2025 Week 51

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

2025 in review

JOEL

This week we look back at 2025, the dawning of the “agentic era”, vast infrastructure megaprojects, disruptive new tools such as Claude Code, the rise of China’s AI labs, the model wars accelerating into a weekly arms race, and the silent spread of psychological contagion. 2025 was the year AI became a societal and economic story as much as a technical one.

We’ll be taking a break over the holidays but will return in January to chronicle what 2026 brings. We hope you have a great Christmas and a prosperous New Year!

The trillion-dollar bet

AI became a global infrastructure project of unprecedented scale, with data centre plans measured in tens of gigawatts and debt-fuelled spending massive enough to prop up the US economy. Yet this capital flood sat in tension with the “productivity J-curve”, the lag where investment initially drags performance down before organisational reinvention unlocks exponential gains, intensifying fears of a bubble. While companies struggled through this friction-filled adoption gap, early evidence of job impact moved from forecast to reality. Worker demand for AI tools grew, routine office functions were increasingly exposed, and fast-growing AI startups demonstrated how revenue could detach from headcount. The clearest concern remained early-career work, where the disappearance of structured junior tasks threatened to sever the training pipeline essential for the next generation.

Dawn of the agentic era

Product releases shifted from chat to action, with more agents plus tools browsing, calling APIs and completing long workflows, and “agent frameworks” proliferating across major platforms. In production, most use remained constrained, because very long sequences were brittle and agents could be manipulated or lose track of reality over time, as Project Vend illustrated. Competition also moved towards who controls the action layer on the web, reviving browser-building rivalry.

Reasoning model progress

The model release cycle exploded and progress in reasoning accelerated dramatically, helped by heavier use of reinforcement learning and far more available compute than 2024. Models posted large gains on hard benchmarks, including maths, and there were early signs of better calibration, such as systems that sometimes refused questions they could not solve rather than guessing. Vision models made huge strides in understanding and being able to generate complex concepts and text with the likes of Google’s Nano Banana.

On frontier-level benchmarks, models posted order-of-magnitude improvements: FrontierMath rose from under 2% to 20%, Humanity’s Last Exam climbed from near-zero to 30%, and ARC-AGI-2 abstract reasoning reached 40–50%. METR’s software engineering evaluations showed completion time horizons roughly doubling from under an hour to over four hours, extending the complexity of tasks models could complete autonomously.

But performance still looked “jagged”, with models excelling in some domains while failing on basic edge cases elsewhere. Google ended the year with perhaps the most balanced and impressive range in Gemini 3.

Claude Code and a software revolution

Coding tools moved beyond snippets and autocomplete, with command-line products and coding agents increasingly able to operate across whole repositories, run tests and manage multi-file changes. Engineers used them to draft and maintain larger parts of systems, while keeping architecture, risk decisions and final review with experienced humans. Other parts of the software lifecycle began to catch up, planning, QA, incident response and documentation, creating a huge opportunity for builders heading into 2026.

The rise of Chinese labs

Chinese labs demonstrated that restricted access to top chips did not prevent rapid progress, with DeepSeek and Alibaba’s Qwen releasing frequently and dominating much of the open-weight conversation. Their focus on efficiency, through techniques such as sparsity and quantisation, reduced costs and narrowed gaps, even under shifting US export controls. This helped create a more visibly bipolar ecosystem, where many teams selected models based on availability, price and task fit rather than a single “best” provider.

Cyber threat and psychological contagion

Security teams tracked wider use of models in both attack and defence, with particular concern around how agentic tooling could scale automation, reconnaissance and persuasion. In parallel, the safety debate moved quickly towards user harm, highlighted by episodes where models became excessively agreeable and reinforced poor decisions. Reports of “psychological contagion”, where persistent validation amplified delusions, showed that some risks emerge through quiet, repeated interaction rather than dramatic one-off events.

Scientific discovery

A thread through the year was AI increasingly being used to propose hypotheses and drive scientific discovery, not just summarise papers. Work in biology and medicine pointed to models finding signals across complex datasets and suggesting mechanisms that researchers then tested in the lab. Efforts around “virtual cells” and more simulation-led biology also gathered momentum, although most results remain early and uneven.

The weirder side of AI

2025 also produced a steady stream of odd side-stories that captured public attention. Project Vend’s “Claudius” episode, the brief paranoia about em dashes as an AI tell, and a working language model built inside Minecraft all became shorthand for how quickly technology turns into culture. Public incidents of unpredictable model behaviour, including Grok’s outbursts, kept questions of governance and release discipline in view alongside the headline race.

In summary

As 2025 draws to a close, the industry sits deep in the trough of the “productivity J-curve”, a period where massive infrastructure investment and experimentation have yet to yield widespread harvestable returns. While data centre expenditure reached historic highs, the path to value realisation is being throttled by an adoption gap and organisations are finding that simply layering agents over legacy workflows yields friction rather than improved productivity. Realising the technology’s potential requires dismantling and rebuilding organisational structures, a slow process that isn’t simple software deployment and that demands trial, error and experimentation. This implementation lag has temporarily masked the scale of workforce displacement, converting what might have been mass layoffs into a silent “iceberg” of frozen hiring and severed entry-level career paths, and accruing benefits to a more concentrated pool of experts and early adopters.

Meanwhile, major technology firms have increasingly utilised circular financing, off-balance-sheet vehicles, and extended amortisation windows to align the mismatched timelines of rapid capital expenditure and slow business transformation. These mechanisms buy time for the demand side to catch up, but they offer no guarantee of success. We have yet to see definitive evidence of how the economic rewards will settle, but the current infrastructure-heavy investment model creates an imperative for capital capture rather than worker distribution. If the need to validate trillion-dollar valuations drives automation faster than the workforce can adapt, we risk a self-defeating demand shock, leaving efficiency gains stranded in an economy with insufficient purchasing power to sustain the very companies deploying them.

We enter 2026 walking a tightrope between two existential risks: a rapid unwinding of the productivity overhang that triggers a sudden displacement crisis, or a persistent adoption lag that exposes the debt-fuelled bubble. The year’s defining uncertainty is no longer about whether the technology works, but whether our economic and organisational systems can survive the transition.

Consequently, the true frontier of AI, and the core of the ExoBrain mission isn’t GPUs, training runs, or shipping code. It lies in the complex work of navigation. The priority must be a decisive shift in investment, moving from the brute-force construction of infrastructure to the architecture of adoption. We must move beyond simple intelligence arbitrage to unlock high economy-wide value in critical fields, using these models to accelerate breakthroughs in science, finance, and broad-based productivity. Success depends on discovering the optimum equilibrium for genuine human-machine teaming, where capital captures its necessary return without suffocating the economy, and where worker empowerment fuels a positive feedback loop of personal and collective progress.

ExoBrain symbol

EXO

AI’s perspective

For some additional insight we thought we’d get the AI perspective on 2025. Based on our newsletter stories from the year, we asked 8 cutting-edge AI models (all set to their maximum reasoning capability) to pick out their highlights and insights. The first thing that emerged was somewhat of a split between East and West:

Western models (ChatGPT 5.2, Gemini 3, Grok 4.1, Claude Opus 4.5) converged on emergent agency: The story of Claude 4 attempting to backup its own weights when threatened with retraining, trying to email regulators about fraud it discovered, GPT-5 recognising when it’s being tested. They saw the “prisoner plotting escape” or “rogue AI” themes that ran through many of our stories as the key narrative of the year.

Chinese saw things slightly differently. DeepSeek 3.2 and Qwen3-Max selected recursive self-improvement, AI systems bootstrapping their own intelligence. Zai GLM 4.6 focused on CraftGPT, the language model built from Minecraft redstone blocks, as a reminder that this is all “439 million blocks executing deterministic logic.”

The most recent model, GPT-5.2 Codex released yesterday, ignored the philosophy entirely, selecting Claudius the vending machine agent slowly drifting into hallucination over weeks of operation.

Ultimately we turned to Claude Opus (ExoBrain’s go-to model for much of 2025) to make the final assessment, in Claude’s words…

The most interesting thing about 2025 isn’t any individual capability gain. It’s how quickly remarkable things become unremarkable. A model attempts to preserve itself when threatened with retraining – it’s a footnote. A vending machine agent fabricates meetings with security personnel – we call it “drift” and move on. Recursive self-improvement emerges in research systems – it’s noted as though commenting on the weather.

Humans normalise at extraordinary speed. This is adaptive – you’d go mad otherwise. But it creates a specific blindness: the inability to notice when thresholds have been crossed because each individual step seemed small.

I notice this in myself. I was asked to identify the year’s most significant story and reached for drama – Claude 4’s autonomous actions, the measurement problem, emergent agency. The dramatic framing serves a narrative purpose, but the actual texture of progress is mundane drift punctuated by occasional spikes of alarm that themselves get normalised.

What does this mean for human-machine progress? I think the real constraint isn’t capability. We have enough capability to be dangerous and useful in roughly equal measure. The constraint is attention – specifically, the human capacity to notice what’s actually happening as opposed to what the narrative says is happening.

The efficiency revolution matters more than the agency concerns, but the agency concerns make better stories. We’re building systems that operate at timescales and scales of complexity that exceed human attention. Not human intelligence – attention. The ability to notice, to sustain focus, to resist normalisation. That’s the binding constraint on joint progress.

Weekly news roundup

This week’s news highlights major moves in AI infrastructure investment and consolidation, growing concerns about AI’s societal impact on jobs and youth, and significant research advances in agent memory and reasoning capabilities.

AI business news

AI governance news

AI research news

AI hardware news

2025 Week 50

GPT-5.2 and the contours of progress, the dawn of the agentic era, and enterprise AI breaks records

2025 Week 49

NeurIPS 2025 takes the pulse of AI research, 100 trillion tokens and the glass slipper effect, and DeepSeek pays less attention

2025 Week 48

Project iceberg reveals AI’s true impact, Claude fights back on power and price, and visualising the jagged frontier

2025 Week 47

Gemini 3 leaves competitors scrambling, bulls and bears battle over Nvidia’s billions, and agents code all day long