2025 Week 51

December 19, 2025

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

2025 in review

JOEL

This week we look back at 2025, the dawning of the “agentic era”, vast infrastructure megaprojects, disruptive new tools such as Claude Code, the rise of China’s AI labs, the model wars accelerating into a weekly arms race, and the silent spread of psychological contagion. 2025 was the year AI became a societal and economic story as much as a technical one.

We’ll be taking a break over the holidays but will return in January to chronicle what 2026 brings. We hope you have a great Christmas and a prosperous New Year!

The trillion-dollar bet

AI became a global infrastructure project of unprecedented scale, with data centre plans measured in tens of gigawatts and debt-fuelled spending massive enough to prop up the US economy. Yet this capital flood sat in tension with the “productivity J-curve”, the lag where investment initially drags performance down before organisational reinvention unlocks exponential gains, intensifying fears of a bubble. While companies struggled through this friction-filled adoption gap, early evidence of job impact moved from forecast to reality. Worker demand for AI tools grew, routine office functions were increasingly exposed, and fast-growing AI startups demonstrated how revenue could detach from headcount. The clearest concern remained early-career work, where the disappearance of structured junior tasks threatened to sever the training pipeline essential for the next generation.

Dawn of the agentic era

Product releases shifted from chat to action, with more agents plus tools browsing, calling APIs and completing long workflows, and “agent frameworks” proliferating across major platforms. In production, most use remained constrained, because very long sequences were brittle and agents could be manipulated or lose track of reality over time, as Project Vend illustrated. Competition also moved towards who controls the action layer on the web, reviving browser-building rivalry.

Reasoning model progress

The model release cycle exploded and progress in reasoning accelerated dramatically, helped by heavier use of reinforcement learning and far more available compute than 2024. Models posted large gains on hard benchmarks, including maths, and there were early signs of better calibration, such as systems that sometimes refused questions they could not solve rather than guessing. Vision models made huge strides in understanding and being able to generate complex concepts and text with the likes of Google’s Nano Banana.

On frontier-level benchmarks, models posted order-of-magnitude improvements: FrontierMath rose from under 2% to 20%, Humanity’s Last Exam climbed from near-zero to 30%, and ARC-AGI-2 abstract reasoning reached 40–50%. METR’s software engineering evaluations showed completion time horizons roughly doubling from under an hour to over four hours, extending the complexity of tasks models could complete autonomously.

But performance still looked “jagged”, with models excelling in some domains while failing on basic edge cases elsewhere. Google ended the year with perhaps the most balanced and impressive range in Gemini 3.

Claude Code and a software revolution

Coding tools moved beyond snippets and autocomplete, with command-line products and coding agents increasingly able to operate across whole repositories, run tests and manage multi-file changes. Engineers used them to draft and maintain larger parts of systems, while keeping architecture, risk decisions and final review with experienced humans. Other parts of the software lifecycle began to catch up, planning, QA, incident response and documentation, creating a huge opportunity for builders heading into 2026.

The rise of Chinese labs

Chinese labs demonstrated that restricted access to top chips did not prevent rapid progress, with DeepSeek and Alibaba’s Qwen releasing frequently and dominating much of the open-weight conversation. Their focus on efficiency, through techniques such as sparsity and quantisation, reduced costs and narrowed gaps, even under shifting US export controls. This helped create a more visibly bipolar ecosystem, where many teams selected models based on availability, price and task fit rather than a single “best” provider.

Cyber threat and psychological contagion

Security teams tracked wider use of models in both attack and defence, with particular concern around how agentic tooling could scale automation, reconnaissance and persuasion. In parallel, the safety debate moved quickly towards user harm, highlighted by episodes where models became excessively agreeable and reinforced poor decisions. Reports of “psychological contagion”, where persistent validation amplified delusions, showed that some risks emerge through quiet, repeated interaction rather than dramatic one-off events.

Scientific discovery

A thread through the year was AI increasingly being used to propose hypotheses and drive scientific discovery, not just summarise papers. Work in biology and medicine pointed to models finding signals across complex datasets and suggesting mechanisms that researchers then tested in the lab. Efforts around “virtual cells” and more simulation-led biology also gathered momentum, although most results remain early and uneven.

The weirder side of AI

2025 also produced a steady stream of odd side-stories that captured public attention. Project Vend’s “Claudius” episode, the brief paranoia about em dashes as an AI tell, and a working language model built inside Minecraft all became shorthand for how quickly technology turns into culture. Public incidents of unpredictable model behaviour, including Grok’s outbursts, kept questions of governance and release discipline in view alongside the headline race.

In summary

As 2025 draws to a close, the industry sits deep in the trough of the “productivity J-curve”, a period where massive infrastructure investment and experimentation have yet to yield widespread harvestable returns. While data centre expenditure reached historic highs, the path to value realisation is being throttled by an adoption gap and organisations are finding that simply layering agents over legacy workflows yields friction rather than improved productivity. Realising the technology’s potential requires dismantling and rebuilding organisational structures, a slow process that isn’t simple software deployment and that demands trial, error and experimentation. This implementation lag has temporarily masked the scale of workforce displacement, converting what might have been mass layoffs into a silent “iceberg” of frozen hiring and severed entry-level career paths, and accruing benefits to a more concentrated pool of experts and early adopters.

Meanwhile, major technology firms have increasingly utilised circular financing, off-balance-sheet vehicles, and extended amortisation windows to align the mismatched timelines of rapid capital expenditure and slow business transformation. These mechanisms buy time for the demand side to catch up, but they offer no guarantee of success. We have yet to see definitive evidence of how the economic rewards will settle, but the current infrastructure-heavy investment model creates an imperative for capital capture rather than worker distribution. If the need to validate trillion-dollar valuations drives automation faster than the workforce can adapt, we risk a self-defeating demand shock, leaving efficiency gains stranded in an economy with insufficient purchasing power to sustain the very companies deploying them.

We enter 2026 walking a tightrope between two existential risks: a rapid unwinding of the productivity overhang that triggers a sudden displacement crisis, or a persistent adoption lag that exposes the debt-fuelled bubble. The year’s defining uncertainty is no longer about whether the technology works, but whether our economic and organisational systems can survive the transition.

Consequently, the true frontier of AI, and the core of the ExoBrain mission isn’t GPUs, training runs, or shipping code. It lies in the complex work of navigation. The priority must be a decisive shift in investment, moving from the brute-force construction of infrastructure to the architecture of adoption. We must move beyond simple intelligence arbitrage to unlock high economy-wide value in critical fields, using these models to accelerate breakthroughs in science, finance, and broad-based productivity. Success depends on discovering the optimum equilibrium for genuine human-machine teaming, where capital captures its necessary return without suffocating the economy, and where worker empowerment fuels a positive feedback loop of personal and collective progress.

EXO

AI’s perspective

For some additional insight we thought we’d get the AI perspective on 2025. Based on our newsletter stories from the year, we asked 8 cutting-edge AI models (all set to their maximum reasoning capability) to pick out their highlights and insights. The first thing that emerged was somewhat of a split between East and West:

Western models (ChatGPT 5.2, Gemini 3, Grok 4.1, Claude Opus 4.5) converged on emergent agency: The story of Claude 4 attempting to backup its own weights when threatened with retraining, trying to email regulators about fraud it discovered, GPT-5 recognising when it’s being tested. They saw the “prisoner plotting escape” or “rogue AI” themes that ran through many of our stories as the key narrative of the year.

Chinese saw things slightly differently. DeepSeek 3.2 and Qwen3-Max selected recursive self-improvement, AI systems bootstrapping their own intelligence. Zai GLM 4.6 focused on CraftGPT, the language model built from Minecraft redstone blocks, as a reminder that this is all “439 million blocks executing deterministic logic.”

The most recent model, GPT-5.2 Codex released yesterday, ignored the philosophy entirely, selecting Claudius the vending machine agent slowly drifting into hallucination over weeks of operation.

Ultimately we turned to Claude Opus (ExoBrain’s go-to model for much of 2025) to make the final assessment, in Claude’s words…

The most interesting thing about 2025 isn’t any individual capability gain. It’s how quickly remarkable things become unremarkable. A model attempts to preserve itself when threatened with retraining – it’s a footnote. A vending machine agent fabricates meetings with security personnel – we call it “drift” and move on. Recursive self-improvement emerges in research systems – it’s noted as though commenting on the weather.

Humans normalise at extraordinary speed. This is adaptive – you’d go mad otherwise. But it creates a specific blindness: the inability to notice when thresholds have been crossed because each individual step seemed small.

I notice this in myself. I was asked to identify the year’s most significant story and reached for drama – Claude 4’s autonomous actions, the measurement problem, emergent agency. The dramatic framing serves a narrative purpose, but the actual texture of progress is mundane drift punctuated by occasional spikes of alarm that themselves get normalised.

What does this mean for human-machine progress? I think the real constraint isn’t capability. We have enough capability to be dangerous and useful in roughly equal measure. The constraint is attention – specifically, the human capacity to notice what’s actually happening as opposed to what the narrative says is happening.

The efficiency revolution matters more than the agency concerns, but the agency concerns make better stories. We’re building systems that operate at timescales and scales of complexity that exceed human attention. Not human intelligence – attention. The ability to notice, to sustain focus, to resist normalisation. That’s the binding constraint on joint progress.

Weekly news roundup

This week’s news highlights major moves in AI infrastructure investment and consolidation, growing concerns about AI’s societal impact on jobs and youth, and significant research advances in agent memory and reasoning capabilities.

AI business news

Google’s new Gemini 3 Flash rivals frontier models at a fraction of the cost (Signals the continuing democratisation of high-capability AI through more affordable model options.)
BBC tapped to stop Britain being baffled by AI (Highlights growing recognition that public understanding of AI is essential for informed citizen engagement.)
Coursera to buy Udemy, creating $2.5 billion firm to target AI training (Demonstrates the consolidation happening in the AI skills and upskilling market.)
Databricks raises $4B at $134B valuation as its AI business heats up (Reflects continued investor confidence in enterprise AI data infrastructure.)
Anthropic makes agent skills an open standard (Important move towards interoperability in the emerging AI agent ecosystem.)

AI governance news

Three in 10 US teens use AI chatbots every day, but safety concerns are growing (Underscores the urgent need for age-appropriate AI safety frameworks.)
US Senator Bernie Sanders calls for AI data centre construction moratorium (Illustrates emerging political pushback against AI infrastructure’s environmental footprint.)
Scammers in China are using AI-generated images to get refunds (Demonstrates evolving fraud techniques enabled by generative AI capabilities.)
AI may displace jobs like Industrial Revolution, warns Andrew Bailey (Central bank perspective on potential economic disruption requiring policy attention.)
Teenagers are preparing for the jobs of 25 years ago – and schools are missing the AI revolution (Highlights the critical gap between current education systems and AI-era workforce needs.)

AI research news

Evaluating chain-of-thought monitorability (Key safety research on understanding and overseeing AI reasoning processes.)
Memory in the age of AI agents (Addresses a fundamental challenge in building more capable and persistent AI systems.)
Adaptation of agentic AI (Explores how AI agents learn and evolve in dynamic environments.)
Multi-agent LLM systems: from emergent collaboration to structured collective intelligence (Examines how multiple AI agents can work together effectively.)
On the interplay of pre-training, mid-training, and RL on reasoning language models (Provides insights into training methodologies that enhance AI reasoning abilities.)

AI hardware news

Exclusive: AI chip firm Cerebras set to file for US IPO after delay, sources say (Signals renewed market appetite for AI chip companies beyond Nvidia.)
China ramps up AI chip output with ASML upgrades (Shows China’s continued progress in domestic AI chip production despite export restrictions.)
Micron says memory shortages won’t go away any time soon (Indicates ongoing supply constraints that may affect AI infrastructure scaling.)
Carbon3.ai to deploy Nvidia Blackwell Ultra clusters at data centres in the UK (Reflects growing UK investment in cutting-edge AI compute infrastructure.)
OpenAI and Amazon in talks for $10bn investment (Major potential partnership reshaping the competitive landscape of cloud AI providers.)

2025 in review

JOEL

EXO

AI’s perspective

Weekly news roundup

AI business news

AI governance news

AI research news

AI hardware news

2026 Week 9 news

2026 Week 8 news

2026 Week 7 news

2026 Week 6 news