Daily AI Skills
Posts
Windsurf Launches it's SWE Model

Windsurf Launches it's SWE Model

PLUS: OpenAI's Agent Building Guide + Google AlphaEvolve

Daily AI Skills
May 18, 2025

Welcome back to Daily AI Skills.

Here’s what we are covering today:
1. Google Debuts AlphaEvolve
2. Windsurf SWE 1
3. LLMs struggle with Multi-Turn Conversations

+ OpenAI Agent Building Guide

Google Debuts AlphaEvolve

Google has introduced AlphaEvolve, a new coding agent that combines Gemini models with evolutionary algorithms to tackle complex scientific and computational problems, enhancing internal efficiency and cracking long-standing math puzzles.

Here’s how it works:

AlphaEvolve leverages different Gemini models—using Flash for generating ideas and Pro for in-depth analysis—to write code, which is then tested and refined through iterative evolution.
It has already made breakthroughs in mathematics, including the first advancement on Strassen’s 1969 matrix multiplication algorithm.
The system is also being used internally at Google to optimise data centre scheduling, accelerate AI training (even for its own models), and assist with chip design.
In tests on over 50 unsolved math problems, AlphaEvolve matched state-of-the-art solutions in 75% of cases and found entirely new, superior ones in another 20%.

Read the full research paper here

Windsurf Unveils SWE-1: AI for Full-Stack Software Engineering

AI coding platform Windsurf has unveiled SWE-1, its first proprietary family of AI models tailored to support the entire software engineering lifecycle, not just code generation.

Here’s what’s new:

The SWE-1 lineup includes three models: SWE-1 (a full-size version for paid users), SWE-1-lite (now replacing Cascade Base for all users), and SWE-1-mini.
Internal benchmarks show SWE-1 outperforms all non-frontier and open-source models, ranking just below top-tier systems like Claude 3.7 Sonnet.
Unlike traditional code-centric models, SWE-1 is designed to operate across multiple interfaces—editors, terminals, and browsers.
A key innovation is its “flow awareness” system, which maintains a shared timeline between the user and AI, enabling smooth collaboration and task handoffs throughout development.

This is crazy...
Windsurf just dropped SWE-1, a new AI model specifically built for software engineering.
It has the performance of frontier models at a fraction of the cost.
Here's everything you need to know:
— Angry Tom (@AngryTomtweets)
10:57 PM • May 15, 2025

Check out the full announcement here: https://windsurf.com/blog/windsurf-wave-9-swe-1

LLMs Struggle with Multi-Turn Conversations

A new study by researchers from Microsoft and Salesforce reveals that large language models (LLMs) struggle in multi-turn conversations where instructions are revealed gradually, often losing track and failing to recover effectively.

Key findings:

The team evaluated 15 top LLMs—including GPT-4.1, Claude 3.7 Sonnet, and Gemini 2.5 Pro—across six generative tasks.
While models performed well in single-turn interactions (around 90% success), their accuracy dropped to about 60% in longer, multi-step conversations.
LLMs frequently “got lost” by making premature assumptions, acting before collecting enough information, or compounding early mistakes.
Adjusting temperature settings or using different reasoning methods didn’t significantly improve performance—volatility remained high, even among the best models.

Read the full research paper here: https://arxiv.org/pdf/2505.06120

A practical guide to building agents by OpenAI

A Practical Guide to Building Agents

cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf

📩Forward it to people you know who are keeping pace with the changing AI world, and stay tuned for the next edition to stay ahead of the curve!

Maybe you missed out on these: