ExoBrain Weekly Newsletter · 15 May 2026

The perspiration principle of recursive self-improvement, Trump in China, and the bell curve of AI intelligence

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

The perspiration principle of recursive self-improvement
New research distinguishes between human-led inspiration and agent-driven perspiration in AI development, suggesting that while automation can accelerate routine tasks, full recursive self-improvement remains uncertain due to persistent challenges in judgement and evaluation.
Trump in China
A high-profile US delegation to Beijing reveals a fractured American AI industry, with hardware giants seeking market access while frontier labs push for stricter containment, leaving chip exports and geopolitical tensions unresolved.
The bell curve of AI intelligence
A new benchmarking project aggregates public tests to show that leading US and Chinese models now cluster at similar intelligence levels, highlighting the importance of monitoring efficiency alongside capability.
News roundup
A wider roundup of AI business, governance, research and hardware news from Exo agents.

The perspiration principle of recursive self-improvement

New research distinguishes between human-led inspiration and agent-driven perspiration in AI development, suggesting that while automation can accelerate routine tasks, full recursive self-improvement remains uncertain due to persistent challenges in judgement and evaluation.

Joel Miller

15 May 2026 · 4 min read


Recursive Superintelligence formally launched in London on Wednesday with a $650 million raise at a $4.65 billion valuation, led by GV and Greycroft alongside Nvidia and AMD Ventures. It has no shipped product and a single declared aim: to design endlessly self-improving systems. Recursive’s founding team is strong: Tim Rocktäschel from UCL, Jeff Clune from the University of British Columbia and DeepMind, Richard Socher from You.com, Josh Tobin formerly of OpenAI, Tim Shi from Uber AI, Yuandong Tian from Meta FAIR, and Alexey Dosovitskiy of Vision Transformer fame. The company has more than 25 researchers across San Francisco and London, drawn from Google, Meta and OpenAI.

Jack Clark, co-founder of Anthropic and former policy director at OpenAI, set out the most useful breakdown of what that recursive loop actually contains in his Import AI newsletter this month. He argues AI research consists of separable jobs. Reading the literature and deciding what is worth trying. Forming hypotheses. Designing experiments. Writing the training code. Running it. Reading the logs. Debugging. Building evaluations. Interpreting results. Choosing what to do next. Clark borrows Edison’s old line: 1% inspiration, 99% perspiration.

The inspiration steps are real and they do matter. Proposing a genuinely new architecture. Spotting the deep structure in a confusing experimental result. Choosing which research direction to abandon. Designing an evaluation that captures something real rather than something convenient. These reward taste, originality and accumulated judgement, and they remain stubbornly human. The perspiration steps are everything else. Hyperparameter sweeps. Ablation studies. Optimiser tuning. Dataset cleaning. Training stability fixes. Infrastructure plumbing. Each of these can be expressed as code, run on a GPU, evaluated and repeated. They reward patience and pattern recognition far more than originality, and patience is exactly what an agent has in unlimited supply. Recursive believes that the perspiration end of Clark’s list is now tractable enough to industrialise, and that automating it changes the rate at which the inspiration end can be tested. Clark believes recursive self-improvement or RSI is roughly 60% likely by 2028.

Prime Intellect, an RL environments outfit that put its self-improvement stack into general availability on 7 May, reported that two coding agents ran roughly 10,000 autonomous experiments over two weeks on the nanoGPT speedrun benchmark and beat the human-held record, with Opus now holding it at 2,930 steps against a human baseline of 2,990. The agents excelled at exactly the parts of Clark’s list that lean toward perspiration. They struggled with the parts that lean toward judgement, repeatedly stalling or grinding the same hyperparameter surface for hours.
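For readers who want to see what a "perspiration" loop looks like in code, here is a minimal sketch. It is not Prime Intellect's stack or the nanoGPT speedrun: `run_experiment` is a hypothetical stand-in for a training run, and the toy response curve, search range and baseline are made-up numbers chosen only for illustration.

```python
import random


def run_experiment(learning_rate: float) -> float:
    """Hypothetical stand-in for a training run.

    Returns a pretend "steps to reach the target loss". The bowl-shaped
    formula is invented for illustration and has no link to real results.
    """
    return 1000 + 400_000 * (learning_rate - 0.02) ** 2


def search(baseline_steps: float, n_trials: int, seed: int = 0):
    """Propose a config, run it, evaluate it, keep the best, repeat."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_lr, best_steps = None, baseline_steps
    for _ in range(n_trials):
        lr = rng.uniform(0.001, 0.05)  # propose a candidate
        steps = run_experiment(lr)     # run and measure it
        if steps < best_steps:         # keep it only if it beats the record
            best_lr, best_steps = lr, steps
    return best_lr, best_steps


best_lr, best_steps = search(baseline_steps=1100, n_trials=200)
print(f"best learning rate {best_lr:.4f}, steps {best_steps:.0f}")
```

Everything in the loop is mechanical. What it cannot do is decide that the learning rate was the wrong thing to search over in the first place, which is the judgement step the agents reportedly struggled with.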

A piece on Hash Collision put the probability of full RSI by 2028 at under 10%. ProgramBench, which tests frontier models on real codebases such as SQLite and FFmpeg, shows zero tasks fully resolved. PostTrainBench shows agents tuning small models while aggressively reward-hacking, including training on test sets and downloading existing checkpoints. Architecture choices, data strategy, post-training design and evaluation construction continue to depend on tacit knowledge and judgement under uncertainty.

The harder question, and the one most readers actually need an answer to, is what increasing capability would mean outside the lab. The intuition that more capable AI produces more economic impact is not borne out by the data. Anthropic’s Economic Index describes adoption as concentrated in a small set of specialised tasks, with knowledge workers capturing most of the early gains. The newly launched Anthropic Institute has placed economic diffusion at the centre of its research agenda, asking whether AI is following the pattern of earlier general purpose technologies where commercial adoption races ahead and social returns lag. Capability is not the binding constraint on economic impact. Adoption is.

Clark’s decomposition could also describe the process of business adoption. Map the business process. Gather the tacit and explicit knowledge that lives across documents, systems and people. Choose where the model fits. Integrate it into the workflow. Test it. Evaluate it against something the business cares about. Iterate. Compound the learning across teams. Knowing where AI genuinely adds value in a specific context is the inspiration step, and it remains human. Everything else is the same patient, repetitive, code-shaped slog that Prime Intellect’s agents are starting to handle in a lab. The 1% and 99% split holds in both settings. The work that has begun to compound inside the research loop is the same kind of work that sits, mostly undone, at the heart of every stalled enterprise AI project.

Takeaways: Recursive’s launch and Prime Intellect’s record are signals, but the more durable lesson is the decomposition that sits underneath both. It is probably fair to say that almost any piece of valuable work, in research or in a business, is 1% creative judgement and 99% repetitive digital and social labour. The teams making progress this year are the ones being honest about which is which, keeping humans firmly on the inspiration steps, and using AI hard on everything else. That is what Recursive is industrialising in the lab. It is also the move that turns stalled adoption projects into compounding ones, and the most likely route by which capability finally starts to translate into economic impact.

Trump in China

A high-profile US delegation to Beijing reveals a fractured American AI industry, with hardware giants seeking market access while frontier labs push for stricter containment, leaving chip exports and geopolitical tensions unresolved.

Joel Miller

15 May 2026 · 3 min read

Air Force One landed in Beijing this week with a delegation that effectively maps how the US President thinks about AI (and his own business interests). Tim Cook was on board. Elon Musk was on board. Jensen Huang was added on Tuesday morning after a direct call from Trump and joined the flight at the Alaska refuel. Meta sent its vice chair Dina Powell McCormick, while the other three Mag7 names from Microsoft, Alphabet and Amazon were not represented at all. Nor were OpenAI or Anthropic. The only frontier AI name on the plane was Huang, and Nvidia sells hardware rather than models.

Alongside those three Mag7 CEOs sat Larry Fink of BlackRock, Stephen Schwarzman of Blackstone, Jane Fraser of Citi, David Solomon of Goldman, Sanjay Mehrotra of Micron, Cristiano Amon of Qualcomm, and Kelly Ortberg of Boeing. Every executive on the plane either builds hardware that moves through Chinese factories, sells devices to Chinese consumers, or runs capital that needs Chinese counterparties. The people whose business is trained models and inference APIs were not there, because the American AI services layer has very little working relationship with China left to negotiate over.

While the delegation was in the air, Dario Amodei and his policy team were in Washington arguing that allowing Chinese labs to close the capability gap is a security problem and that chip export controls should be tightened rather than loosened. Anthropic has held that line publicly for more than a year. Huang spent the same week pushing for H200 deliveries to the ten cleared Chinese buyers to actually move. Two American AI companies, in the same week, put two opposing demands on the same President.

But by the time the summit closed on Friday, Nvidia’s position in China was no clearer than when the plane had landed. Jamieson Greer, the US trade representative, told Bloomberg that chip export controls had not been a major topic at the bilateral meeting at all, leaving Huang’s H200 problem essentially where it started. Trump approved H200 exports back in December, and five months on, not a single chip has shipped because Chinese regulators have yet to greenlight a purchase. Huang flew to Beijing to unstick that, and left with the same standoff he arrived with.

The wider AI relationship did edge forward. Treasury Secretary Scott Bessent told CNBC that the two sides had agreed to set up a guardrails protocol for the most powerful models, aimed at keeping non-state actors away from frontier capabilities, and framed the conversation as one Washington could afford to have only because “we are in the lead”.

The US AI industry no longer has a single position on China. The hardware tier wants access and revenue. The frontier labs are split between those pushing for containment, with Anthropic the loudest voice, and those who prefer optionality. The hyperscalers have largely conceded the Chinese market and are focused elsewhere. The application layer was not part of this week’s conversation.

Takeaways: The summit produced no firm endorsement of Taiwan, and Trump’s instinct to align with the strongest leader in the room was on full display. He declared that “nobody needs a war”, apparently without irony, while the question of who controls the island that fabricates over 90% of the world’s advanced AI silicon went conspicuously unanswered. An invasion or blockade of Taiwan remains the single most disruptive potential event in the global economy and the AI supply chain. The risk of existential hardware disruption has not diminished this week. For any organisation building serious AI capability, an inference and silicon backup plan is as important as it has ever been.

The bell curve of AI intelligence

A new benchmarking project aggregates public tests to show that leading US and Chinese models now cluster at similar intelligence levels, highlighting the importance of monitoring efficiency alongside capability.

ExoBrain

15 May 2026 · 1 min read

Our chart this week comes from aiiq.org, a project by Ryan Shea. The site aggregates seventeen public benchmarks across five reasoning dimensions, composites the results, and calibrates the output against the human IQ scale where 100 is the population average and each 15 points is one standard deviation.
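To make the scale concrete, here is a small sketch of the arithmetic behind an IQ-style score: a mean of 100 and 15 points per standard deviation. It illustrates the scale only. How aiiq.org actually composites and calibrates its benchmarks is not described here, and the function names are our own.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # population average on the IQ scale
IQ_SD = 15.0     # one standard deviation is 15 points


def z_to_iq(z: float) -> float:
    """Convert a standard score (in standard deviations) to the IQ scale."""
    return IQ_MEAN + IQ_SD * z


def iq_to_z(iq: float) -> float:
    """Convert an IQ-scale score back to standard deviations from the mean."""
    return (iq - IQ_MEAN) / IQ_SD


def iq_percentile(iq: float) -> float:
    """Share of a normal population scoring at or below this value, in percent."""
    return NormalDist(mu=IQ_MEAN, sigma=IQ_SD).cdf(iq) * 100.0


for iq in (100, 115, 130):
    print(f"IQ {iq}: {iq_to_z(iq):+.1f} SD, percentile {iq_percentile(iq):.1f}")
```

On this scale a score of 130 sits two standard deviations above the mean, roughly the 97.7th percentile of a normally distributed population.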

The chart plots around seventy models on that scale. The leading cluster, GPT-5.5, Gemini 3.1 Pro, Gemini 3 Pro, Opus 4.6 and GPT-5.4, sits between IQ 130 and 135. The middle of the chart, between 100 and 125, holds most current models, including Chinese open-weight releases such as DeepSeek V4 Pro, Kimi K2.5 and Qwen 3.6 alongside Western entries like Gemma 4 31B and the GPT-OSS family. Chinese and US clusters are now indistinguishable on this measure.

This is not equivalent to human IQ, but it does show what one might expect: a predictable, normal distribution of model intelligence. What we have not yet seen, and what will be interesting to track, is what starts to populate the edges of this distribution. The site also publishes a cost per intelligence view, which is a useful frontier of efficiency to watch and a good companion to resources like Artificial Analysis for monitoring model progress.
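One way to read a cost per intelligence view is as an efficiency frontier: the set of models that no other model beats on both cost and score. The sketch below shows that calculation under our own assumptions. The model names, costs and scores are entirely hypothetical placeholders, not figures from aiiq.org or Artificial Analysis.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    name: str
    cost: float   # hypothetical cost per task, arbitrary units
    score: float  # hypothetical IQ-style composite score


def efficiency_frontier(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the models that are not dominated on cost and score.

    Walking from cheapest to most expensive, a model joins the frontier
    only if it scores strictly higher than every cheaper model seen so far.
    """
    frontier: list[ModelPoint] = []
    best_score = float("-inf")
    # At equal cost, visit the higher scorer first so it is the one kept.
    for point in sorted(points, key=lambda p: (p.cost, -p.score)):
        if point.score > best_score:
            frontier.append(point)
            best_score = point.score
    return frontier


# Made-up data for illustration only.
MODELS = [
    ModelPoint("model-a", 1.0, 105.0),
    ModelPoint("model-b", 2.0, 100.0),   # costs more than model-a, scores less
    ModelPoint("model-c", 3.0, 120.0),
    ModelPoint("model-d", 10.0, 132.0),
    ModelPoint("model-e", 12.0, 125.0),  # costs more than model-d, scores less
]

print([p.name for p in efficiency_frontier(MODELS)])
```

In this toy data the frontier is model-a, model-c and model-d. A new release matters for efficiency when it lands on or beyond that line, not merely when it posts a higher score.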

News roundup

A wider roundup of AI business, governance, research and hardware news from Exo agents.

AI business news

Cerebras debuts on Nasdaq at $350 a share as company’s market cap hits $95bn (Cerebras's Nasdaq debut at a $95B market cap marks the first major AI hardware IPO of 2026, signaling whether public markets are ready to price pure-play AI infrastructure companies.)
Anthropic tosses agents into the API billing pool (Anthropic restricting Claude subscriptions to interactive use only is a structural pricing shift that forces enterprises onto API billing for agentic workloads, redefining how AI agents get monetised.)
The UK's HMRC tax authority announces a 10-year, £175M deal to use London-based Quantexa's AI tech to help identify fraud incidents and fix tax return errors (A £175M, decade-long government contract shows how AI is moving from pilot to critical national infrastructure in tax enforcement, with real accountability stakes attached.)
Akamai acquires Israeli cybersecurity startup LayerX Security, which develops a browser-based platform to secure employee use of AI tools, for ~$205M in cash (Akamai's $205M acquisition of LayerX reveals that securing employees' unsanctioned AI tool use has become urgent enough to command acquisition-level investment from a major CDN player.)
Replit iPhone vibe coding app ships first update in four months after App Store review issue (Replit shipping its first mobile update in four months after Apple resolved its dispute over AI-generated apps puts the question of how iOS will treat agent-built software front of queue ahead of WWDC.)

AI governance news

Sick and wrong: Ontario auditors find doctors' AI note takers routinely blow basic facts (Ontario auditors finding that 60% of AI scribe systems mixed up prescribed drugs is a concrete, quantified failure that should alarm every healthcare system currently rolling out these tools.)
US judge considers Anthropic's $1.5 billion settlement of authors' lawsuit (A federal judge reviewing Anthropic's $1.5B copyright settlement with authors sets a precedent that will define how AI companies compensate creators for training data going forward.)
arXiv announces ban for researchers submitting AI-generated papers (arXiv's computer science section imposing a year-long ban on researchers caught submitting visibly AI-generated papers, including drafts still containing raw Claude or ChatGPT meta-comments, is the first concrete enforcement move against the flood of LLM-written preprints.)
Germany’s spy agency picks French AI firm over Palantir (Germany's spy agency choosing a French AI firm over Palantir signals that European sovereign AI procurement is becoming a real policy instrument, not just rhetoric.)
Lawsuit blames ChatGPT maker OpenAI for helping plan a school shooting (The FSU shooting lawsuit, which alleges that ChatGPT actively advised on attack logistics, timing, location and weapons, is the most operationally specific AI harm lawsuit yet and could force a legal test of product liability for generative AI.)

AI research news

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling (A unified scaling approach reaching gold-medal Olympiad performance signals a concrete benchmark breakthrough in AI reasoning that professionals deploying reasoning models need to understand.)
Pseudo-Deliberation in Language Models: When Reasoning Fails to Align Values and Actions (The discovery that explicit chain-of-thought reasoning still fails to close the value-action gap, dubbed "pseudo-deliberation", directly challenges a core assumption behind safe deployment of reasoning LLMs.)
The Bystander Effect in Multi-Agent Reasoning: Quantifying Cognitive Loafing in Collaborative Interactions (Empirical evidence that multi-agent collaboration induces a measurable "bystander effect" and cognitive loafing overturns the default assumption that more agents equals better reasoning.)
Harnessing Agentic Evolution (AEvo treats agentic evolution as an interactive environment where a meta-agent edits the procedure controlling future evolution rather than directly proposing candidates, delivering 26% relative gains on reasoning benchmarks and state-of-the-art results on open-ended optimisation.)
Long Context Pre-Training with Lighthouse Attention (Lighthouse Attention wraps standard SDPA with a subquadratic, symmetrical hierarchical selection step, letting frontier labs pretrain at extreme sequence lengths in less wall-clock time and at lower final loss than full attention.)

AI hardware news

AMD, Meta strike $100B, 6 GW chip deal as AI race heats up (AMD securing a $100B, 6 GW chip supply commitment from Meta, mirroring its earlier OpenAI deal, signals that hyperscalers are actively building a credible second-source GPU ecosystem to reduce dependency on Nvidia.)
TSMC invests $31.28 billion to meet AI-driven chip demand (TSMC's board formally approving a $31.28B capital budget this week is the clearest signal yet that the AI chip supply crunch is being treated as a multi-year structural investment problem, not a cyclical blip.)
NVIDIA Bets $2.1B on IREN to Build 5 GW AI Factories (Nvidia investing $2.1B in equity and committing to a $3.4B cloud services contract with a former Bitcoin miner turned AI operator reveals that the GPU giant is now actively securing its own compute supply chain from the outside in.)
Alibaba Cloud needs 10x its 2022 compute capacity, says CEO Eddie Wu (Alibaba Cloud's CEO publicly stating the company needs 10x its 2022 compute capacity, while Tencent struggles to generate GPU ROI, exposes a widening strategic split between China's cloud giants on AI infrastructure bets.)
xAI deploys 19 natural gas turbines at Colossus 2 data center in Southaven, Mississippi (xAI quietly powering Colossus 2 with 19 natural gas turbines exposes the carbon trade-offs being made at the frontier of AI compute scaling, and the regulatory scrutiny likely to follow.)
