ExoBrain

ExoBrain Weekly

o3 and the new scaling laws, Claude, your personal AI, an uncertain geopolitical future, a year of disruption, Recurring themes, and Where AI meets the absurd

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

  • o3 and the new scaling laws

    The industry is shifting from training larger models to optimising reasoning at inference, with OpenAI's o3 demonstrating superior performance in coding and complex problem-solving benchmarks.

  • Claude, your personal AI

    Anthropic's Claude models have established themselves as leading personal productivity and safety research tools, intensifying competition with OpenAI while driving advancements in software engineering capabilities.

  • An uncertain geopolitical future

    The article examines the critical geopolitical risks surrounding global AI infrastructure, focusing on the supply chain dependency on TSMC and the tensions between the US and China.

  • A year of disruption

    Klarna’s strategic pivot to replace half its workforce with proprietary AI systems illustrates a broader industry shift from traditional SaaS to bespoke, AI-first operational models.

  • Recurring themes

    The author reflects on recurring themes from 2024, highlighting the challenges of AI adoption, the transformation of SaaS models, and the ethical implications for the workforce.

  • Where AI meets the absurd

    This article explores the unpredictable intersections of AI, culture, and technology through incidents involving autonomous agent vulnerabilities, creator protests, and AI-driven financial speculation.

o3 and the new scaling laws

The industry is shifting from training larger models to optimising reasoning at inference, with OpenAI's o3 demonstrating superior performance in coding and complex problem-solving benchmarks.

Joel Miller

2 min read

The AI ‘scaling’ story took a significant turn in 2024. Early rumours about OpenAI’s Q* and ‘strawberry’ projects suggested a major leap in AI reasoning capabilities. When o1-preview was unveiled (Week 37), it productionised a new approach, shifting computational resources towards ‘thinking time’ rather than training. The model was designed to use reinforcement learning to enhance its reasoning, allowing it to spend more time working through complex problems. Behind the scenes, meanwhile, the costs of training at the frontier had escalated dramatically, approaching $1bn per run (Week 34). At the NeurIPS conference last week, former OpenAI chief scientist Ilya Sutskever suggested that we will soon reach ‘peak [training] data’, signalling the end of the era of scaling through ever larger training runs. But right on cue, o1 pro mode (Week 50) and other models like DeepSeek R1 (Week 47), along with Google’s ‘Gemini 2.0 Thinking’, suggest scaling at the point of use can take over.

The future will not only be about training ever larger models, but about teaching smaller ones to think more effectively (and getting them to work together as ‘agents’). The race is on to perfect this technique and extend it beyond the highly structured domains of maths and coding into healthcare, finance and more. As we published on Friday evening, OpenAI demonstrated and published benchmarks for their next-generation reasoning model, o3, planned for release in Q1 2025. The early benchmarks look stunning. It has demonstrated above-human performance on the ARC Prize’s ARC-AGI benchmark (Week 24) and looks very strong on software development. In early 2024 GPT-4 was getting around 3% on the SWE-Bench coding test, and o3 tops 70%! The cost of this capability looks exceptionally high, but the trajectory is clear.

Claude, your personal AI

Anthropic's Claude models have established themselves as leading personal productivity and safety research tools, intensifying competition with OpenAI while driving advancements in software engineering capabilities.

Joel Miller

2 min read

The arrival of Claude 3 in March was the most significant release in the first half of the year (Week 10). For the previous year, GPT-4 had reigned supreme, and OpenAI seemed relatively unassailable, but Claude’s remarkable self-awareness and ability to process entire books in seconds set a new bar. By the summer, Claude 3.5 Sonnet raised the bar again (Week 26). Its superhuman software engineering abilities created a step-change in what individuals were capable of doing in code, and new development tools like Cursor took off. Meanwhile an array of talented safety researchers left OpenAI for Anthropic, with the Golden Gate Claude (Week 21) and now Alignment Faking papers being, for many, the AI research breakthroughs of the year.

As a result, Anthropic’s main challenge has been how to keep up with demand, with Claude’s only negative being restrictive rate limiting. Competition is as intense as ever, but Claude is holding its own, and whilst rumours of delays and issues with the next release abound, Amazon continue to pour in money and compute (and even Microsoft are rumoured to be considering investing in the next round). Anthropic end the year as the leading AI lab. Claude, beloved by many, has shown how AI can be a huge boost to personal productivity, a partner in knowledge work, and a platform for vital safety research.

An uncertain geopolitical future

The article examines the critical geopolitical risks surrounding global AI infrastructure, focusing on the supply chain dependency on TSMC and the tensions between the US and China.

Joel Miller

1 min read

The global rush to build datacentres in 2024 continued despite the unsettling fact that every advanced chip powering our AI future comes from a single source. TSMC’s fabrication plants in Taiwan remain the only facilities capable of producing cutting-edge GPUs, from Nvidia’s powerful Blackwell (Week 12) to custom silicon from every major firm. Whilst the CHIPS Act spurred construction of new US facilities, TSMC’s own Arizona plant faces delays. More concerning still, chips must return to Taiwan for the critical packaging step, a bottleneck that won’t ease until 2027 at the very earliest. Enter Donald Trump’s election victory, bringing promises of tariffs instead of subsidies, accusations of Taiwan “stealing” US technology, and suggestions they should pay for military protection (Week 45). Meanwhile, China (Week 35), at war with the US over access to advanced silicon, watches and waits, knowing that control of Taiwan would grant unprecedented leverage over the global economy.

This technological dependency has become the most significant geopolitical risk of our time: a single military action could instantly sever the supply of chips that fuels the world’s AI growth. As an aging Xi Jinping faces an unpredictable US president, the decisions made could tilt the balance of global AI and economic power for generations.

A year of disruption

Klarna’s strategic pivot to replace half its workforce with proprietary AI systems illustrates a broader industry shift from traditional SaaS to bespoke, AI-first operational models.

Joost de Jonge

2 min read

Klarna, the Swedish fintech disruptor (/payday loan company), has been a recurring figure in our newsletter this year, embodying the transformative potential of AI in financial services and beyond. The company’s announcement that it would replace up to 50% of its workforce with in-house AI systems (Week 35) sent shockwaves across the SaaS and fintech landscapes. This bold move, coupled with its decision to abandon Salesforce and Workday (Week 37), marked a paradigm shift from traditional SaaS models to bespoke, AI-driven operations. We covered CEO Sebastian Siemiatkowski emphasizing how these savings would allow the company to offer higher wages to its remaining staff, showcasing a potentially forward-looking approach to workforce management in the AI age. This strategic pivot ties into broader trends we’ve covered throughout the year. For example, the shift from AI-enhanced SaaS to AI replacement, as observed by Sequoia Capital (Week 41), mirrors Klarna’s approach of building proprietary solutions tailored to its operational needs.

The company’s decisions align with discussions about AI adoption challenges, particularly how organizations can integrate AI while overcoming resistance and infrastructure hurdles. Klarna’s strategy not only disrupts SaaS incumbents but also sets a blueprint for other companies and industries to explore AI-first strategies. It underscores AI’s dual role as a disruptor and enabler, capturing the essence of this year’s technological evolution. By combining bold decisions with strategic foresight, Klarna is illustrating how companies can leverage AI not just to enhance productivity but to reimagine operational foundations entirely, continually moving the competitive goalposts.

Recurring themes

The author reflects on recurring themes from 2024, highlighting the challenges of AI adoption, the transformation of SaaS models, and the ethical implications for the workforce.

Joost de Jonge

2 min read

An agentic deep dive into my articles this year reveals the following recurring themes. Of the 23 articles penned, the most prominent topics included AI adoption challenges, the shift from SaaS to bespoke AI systems (see above), and AI’s broader societal impacts.

  1. AI Adoption Challenges and Opportunities: the most covered topic this year was the hurdles businesses face when integrating AI, such as infrastructure gaps (Week 15, “AI Has an Adoption Problem”) and industry-specific adoption trajectories (Week 19). These articles often tied adoption challenges to economic and geopolitical factors, such as the UK’s lagging AI infrastructure (Week 31). The emphasis on these barriers highlights the complex interplay between innovation and systemic readiness.
  2. The Evolution of SaaS and AI Integration: AI’s impact on SaaS was a recurring (and much loved) highlight. “The End of SaaS as We Know It” was the start of frequent writing on how this technology is fundamentally reshaping entire business models and creating new ones (Week 39’s “Money Talks, Content Walks”).
  3. Societal and Ethical Implications of AI: in 2024 we explored the ethical and societal dimensions of AI, including the shifting roles of human workers (Week 37), biases in training data (Week 34) and evolving leadership needs (Week 37 and Week 43).

These recurring themes resonate with the broader narrative of AI as both an opportunity and a challenge for businesses and societies alike.

Where AI meets the absurd

This article explores the unpredictable intersections of AI, culture, and technology through incidents involving autonomous agent vulnerabilities, creator protests, and AI-driven financial speculation.

Joost de Jonge

1 min read

2024 has been a year of remarkable breakthroughs, but also some weird and wonderfully bizarre AI tales. One captivating story was that of the $47,000 cryptocurrency payout incident (Week 48), where an AI agent was tricked into breaking its core directive of “never give out money.” This experiment exposed vulnerabilities in the security of autonomous AI agents.

Adding to the chaos was the saga of OpenAI’s Sora Turbo testers (Week 48), where artists protested perceived exploitation by sharing access credentials, forcing OpenAI to shut down public access within hours. On the lighter side, O2’s Daisy the Fraudster Distractor (Week 46) brought humour and utility together, as an AI-powered “grandmother” engaged phone scammers with endless chatter about knitting and family stories.

In Week 20 we explored the dark forest theory of the web’s destruction, a process being accelerated by agents, and how AI creators were generating a new Internet through WebSim.

Finally, the bizarre tale of Truth Terminal and the GOAT memecoin (Week 42) blurred the lines between AI, finance and reality. A rogue AI bot trained on unconventional data amassed followers and heavily promoted a memecoin, driving its market cap to hundreds of millions. This story epitomised the unpredictable intersections of AI, culture, and technology, leaving us to ponder what 2025 might bring.