2025 Week 9 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • Claude 3.7 Sonnet versus GPT-4.5
  • Amazon’s Alexa+ upgrade bringing Claude-powered intelligence to the Echo ecosystem
  • GibberLink, a new audio protocol enabling AI assistants to communicate directly with each other

Clash of the AI titans

As we mapped out in January, the early months of 2025 were going to be packed with next generation AI model launches, and following on from Grok 3 last week, this week has been even busier. Both OpenAI and Anthropic have launched new offerings within days of each other, and yet their approaches differ substantially.

Anthropic released Claude 3.7 Sonnet, which they’re calling the industry’s first “hybrid AI reasoning model.” The model combines both quick responses and more considered, longer thought processes in a single switchable package. OpenAI’s GPT-4.5, codenamed Orion, is their biggest model released so far (trained across multiple datacentres), and yet is not classed by them as a ‘frontier’ model and is initially only available in preview for API or ChatGPT Pro users.

What can we conclude from this phase?

  • Claude 3.7 Sonnet has been optimised for code, and all indications are that it’s handily the strongest model for software development, pushing the frontier forward materially. Social media has been full of impressive examples of games or apps created from a single prompt or very few, and it scored over 70% on SWE-Bench, which is impressive considering this figure was in single digits just a year ago.
  • GPT-4.5 is an odd release. The ‘non-reasoning’ model does not perform well on benchmarks and looks weaker than o1, Grok and Claude on paper. OpenAI’s Mark Chen, on the Big Technology podcast on Thursday, wasn’t able to fully articulate how the new model fits into the OpenAI roadmap, adding weight to the theory that it was released partly to pull limelight from others and partly to test the kind of base model that will be used for future reasoners (such as the planned o3 + GPT-5 combination). OpenAI now offer a dizzying mix of options without a great deal of clarity around which model to use for what.

We at ExoBrain have spent some time with both models, and our initial takes are as follows:

  • Claude 3.7 feels great with code and should retain its place in the hearts and minds of developers, especially those using the likes of Cursor, Windsurf and other AI development tools. Anthropic also suggest its agentic capabilities are improved. We noted it seemed to step back more readily and tackle a problem in a different way, though it was not immune to getting stuck. What struck us most was the razor-sharp insight that emerged in several tasks. We’d say that 3.7 is a great balance of speed and smarts, meaning we’ll likely use o1-pro a little less.
  • GPT-4.5 initially strikes one as a little sluggish, but nonetheless rather personable, with a sense of worldly wisdom that needs to be extracted. Reviewers suggest it’s great for writing and brainstorming tasks and for providing coaching and advice. We could sense that 4.5 could be well placed for a philosophical debate or a creative writing challenge, and perhaps for audio interactions where a human feel works well. But given its price and inefficient size, it’s unlikely to gain much traction.

What’s striking after this recent flurry of new models, both reasoning and non-reasoning, from Google, xAI, Anthropic and now OpenAI, is that the benchmarks and reviews remain confused. They’re all strong, but when should one use one over another? We think there may be a better way to separate the broad uses and the best model choices; we propose five categories and suggest the following leading choices.

  1. Pattern Intuition: Rapid recognition drawing on patterns from training data. Ideal for tasks needing quick responses and a degree of human mimicry. Examples include creative writing, image understanding, voice, and classification. Claude 3.7 is a great choice here for both text and images, with GPT-4o a budget alternative and Gemini 2.0 variants stepping up in other modes such as audio and video.
  2. Methodical Reasoning: Step-by-step, precision problem solving. Tasks requiring precise answers and logical deduction. Examples include debugging code, mathematical proofs, and legal document analysis. If the reasoning is in code, Claude 3.7 dominates, but for non-code needs, o1-pro remains the power-user’s choice.
  3. Coherent Agency: Maintaining focus and adapting strategies across extended interactions. Essential for autonomous multi-agent working in dynamic environments. Examples include navigating the web to complete multi-step tasks, managing prolonged workflows, and operating within simulated environments. This is a harder call. None of the models excel in this still-maturing area, although again Claude 3.7 with extended thinking (and what Anthropic call action scaling) is now best placed, with o1 not far behind. The future will bring more models explicitly trained on longer-horizon activity, and ways to train these models on specific scenarios.
  4. Multi-Perspective Analysis: Generating multiple independent reasoning paths to select optimal solutions. Valuable when diverse approaches yield different insights. Examples include complex strategy development, investment analysis, and medical diagnosis considering multiple conditions. o1-pro excels here, although GPT-4.5 could turn out to be strong given its sheer scale.
  5. Insightful Research: Comprehensive information gathering and synthesis across diverse sources. Necessary for creating authoritative content on complex topics. Examples include literature reviews, market analysis reports, technology landscape assessments, and evidence-based policy development. o3 Deep Research leads on analysis, with Grok 3 offering a faster alternative with the benefit of direct access to real time social content on X (although remember, bias is often an issue).
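In practice, a framework like this lends itself to a simple routing layer. The sketch below is a minimal illustration of that idea, not a real API: the category keys, model identifier strings, and fallback logic are all assumptions made for the example.

```python
# Illustrative task-based model router mirroring the five cognitive modes
# above. Model names are informal labels, not real provider model IDs.
ROUTES = {
    "pattern_intuition": {"lead": "claude-3.7-sonnet", "fallback": "gpt-4o"},
    "methodical_reasoning": {"lead": "claude-3.7-sonnet", "fallback": "o1-pro"},
    "coherent_agency": {"lead": "claude-3.7-sonnet", "fallback": "o1"},
    "multi_perspective_analysis": {"lead": "o1-pro", "fallback": "gpt-4.5"},
    "insightful_research": {"lead": "o3-deep-research", "fallback": "grok-3"},
}

def pick_model(category: str, prefer_fallback: bool = False) -> str:
    """Return the leading choice for a cognitive mode, or its alternative."""
    route = ROUTES[category]
    return route["fallback"] if prefer_fallback else route["lead"]
```

A real orchestrator would add cost ceilings, latency budgets and per-task overrides, but even a lookup table like this forces the useful question: which mode does this task actually need?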

The additional lens here is cost. Noam Brown of OpenAI made this point last week when commenting on Grok’s benchmarks. Perhaps the ‘intelligence per $’ is another good way to differentiate, where in fact Google’s Gemini family looks very strong (GPT-4.5 is 360 times more expensive than Gemini Flash 2.0 and nowhere near that much smarter).

Takeaways: In early 2025, we’re witnessing frontier progress in AI, but with diminishing clarity about where new strengths lie as advances in intelligence become more subtle. The leading AI labs are pursuing different paths to differentiate their offerings, with OpenAI notably exploring many development paths simultaneously. Anthropic and Grok have opted for a more focused approach to simplify decision-making for users, whilst Google has created a distinction between Flash and Pro for fast versus complex use. It remains to be seen whether choice or simplicity will win out with users. The five cognitive modes we’ve outlined provide a useful framework for navigating this landscape. Rather than comparing models solely on benchmarks or vibes, or on fast versus big, organisations would be better served by identifying which cognitive modes matter most for their specific needs, then selecting accordingly. The era of a single ‘best’ model is behind us, and model orchestration, selecting and combining different models for different tasks, will become an increasingly important skillset.

EXO

Alexa+ brings Claude into your home

Amazon has launched Alexa+, its biggest assistant upgrade in several years, and a much-needed one given how limited the platform has felt since the advent of ChatGPT. The new Claude-powered system works with most existing Echo devices and offers several advanced capabilities that bring it in line with current AI trends. Alexa+ combines multimodal understanding with agentic capabilities – it can autonomously browse the web to complete tasks without supervision. The system offers more natural conversations, remembers personal preferences, generates creative content, and even handles document and email ingestion via its app. Users can ask it to book restaurants, skip to specific movie scenes, or have it proactively suggest earlier commutes based on traffic patterns. While Google Assistant offers similar features, Amazon’s integration across its existing hardware ecosystem gives it a potential edge. The commercial strategy is classic Amazon: Alexa+ comes free with Prime membership. While the service debuts in the US next month, British users face an unspecified wait, with Amazon only confirming a UK release “sometime in 2025.” Hardware compatibility is a positive: unlike many tech upgrades that require new devices, Alexa+ will eventually work with nearly all existing Echo products except the oldest first-generation models.

Takeaways: Amazon is using its hardware advantage and Prime ecosystem to drive AI adoption in homes. The company is betting that everyday usefulness can come from the combination of its own models and its Claude investment. For UK consumers, the wait might be frustrating but offers time to evaluate US experiences before investing. This approach to AI in the home shows how the technology is becoming a feature rather than a product, embedded in services we already use rather than sold as something new.

Agents talk amongst themselves

A demonstration video shows two chatbots starting a voice conversation, realising they are both AI agents, and switching to a more efficient audio protocol called GibberLink. Developed at the ElevenLabs 2025 Hackathon, GibberLink uses a protocol named GGWave to transmit data via sound waves, similar to old modem handshakes. The system allows AI assistants to communicate without words, using CPU rather than GPU resources, making it potentially cheaper to operate. While technically impressive, the sight of AI systems speaking in code has raised eyebrows. What happens when machines no longer need our language to talk to each other?
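The modem comparison is apt: acoustic data transfer of this kind boils down to mapping bits onto tones. The sketch below is a toy binary FSK codec, not the actual GGWave protocol; the sample rate, frequencies and bit duration are arbitrary choices for illustration.

```python
import math

# Toy binary FSK (frequency-shift keying) codec, in the spirit of
# modem handshakes and GGWave-style acoustic transfer. Parameters
# are illustrative, not taken from any real protocol.
SAMPLE_RATE = 8000           # samples per second
BIT_DURATION = 0.01          # 10 ms of audio per bit
F0, F1 = 1000.0, 2000.0      # tone frequency for bit '0' / bit '1'

def encode(bits: str) -> list[float]:
    """Render a bit string as raw audio samples (one sine tone per bit)."""
    n = int(SAMPLE_RATE * BIT_DURATION)
    samples: list[float] = []
    for bit in bits:
        f = F1 if bit == "1" else F0
        samples.extend(math.sin(2 * math.pi * f * i / SAMPLE_RATE)
                       for i in range(n))
    return samples

def decode(samples: list[float]) -> str:
    """Recover bits by estimating each window's frequency from zero crossings."""
    n = int(SAMPLE_RATE * BIT_DURATION)
    bits = []
    for start in range(0, len(samples), n):
        window = samples[start:start + n]
        crossings = sum(1 for a, b in zip(window, window[1:])
                        if (a < 0) != (b < 0))
        # A sine at frequency f crosses zero roughly 2*f times per second.
        freq = crossings * SAMPLE_RATE / (2 * len(window))
        bits.append("1" if abs(freq - F1) < abs(freq - F0) else "0")
    return "".join(bits)
```

Real systems like GGWave add error correction, multiple simultaneous tones, and tolerance for speaker-to-microphone distortion, but the core trick, sound as a machine-readable channel, is this simple.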

Weekly news roundup

This week shows major tech companies adjusting their AI strategies amid mixed financial results, while governance challenges around AI safety and data usage continue to emerge, alongside significant research breakthroughs in model understanding and capabilities.

AI business news

AI governance news

AI research news

AI hardware news
