2026 Week 20 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

JOEL

This week we look at:

  • Recursive Superintelligence’s $4.65B London launch and Jack Clark’s 60% RSI-by-2028 thesis
  • Trump’s Beijing summit with Huang, Cook and Musk and the H200 standoff
  • Ryan Shea’s aiiq.org charting frontier models on the human IQ scale

The perspiration principle of recursive self-improvement

Recursive Superintelligence formally launched in London on Wednesday with a $650 million raise at a $4.65 billion valuation, led by GV and Greycroft alongside Nvidia and AMD Ventures, with no shipped product and a single declared aim: to design endlessly self-improving systems. Recursive’s founding team is strong: Tim Rocktäschel from UCL, Jeff Clune from the University of British Columbia and DeepMind, Richard Socher from You.com, Josh Tobin formerly of OpenAI, Tim Shi from Uber AI, Yuandong Tian from Meta FAIR, and Alexey Dosovitskiy of Vision Transformer fame. The wider team numbers more than 25 researchers across San Francisco and London, drawn from Google, Meta and OpenAI.

Jack Clark, co-founder of Anthropic and former policy director at OpenAI, set out the most useful breakdown of what that recursive loop actually contains in his Import AI newsletter this month. He argues AI research consists of separable jobs. Reading the literature and deciding what is worth trying. Forming hypotheses. Designing experiments. Writing the training code. Running it. Reading the logs. Debugging. Building evaluations. Interpreting results. Choosing what to do next. Clark borrows Edison’s old line: 1% inspiration, 99% perspiration.

The inspiration steps are real and they do matter. Proposing a genuinely new architecture. Spotting the deep structure in a confusing experimental result. Choosing which research direction to abandon. Designing an evaluation that captures something real rather than something convenient. These reward taste, originality and accumulated judgement, and they remain stubbornly human. The perspiration steps are everything else. Hyperparameter sweeps. Ablation studies. Optimiser tuning. Dataset cleaning. Training stability fixes. Infrastructure plumbing. Each of these can be expressed as code, run on a GPU, evaluated and repeated. They reward patience and pattern recognition far more than originality, and patience is exactly what an agent has in unlimited supply. Recursive believes that the perspiration end of Clark’s list is now tractable enough to industrialise, and that automating it changes the rate at which the inspiration end can be tested. Clark puts the odds of recursive self-improvement (RSI) by 2028 at roughly 60%.

Prime Intellect, an RL environments outfit that put its self-improvement stack into general availability on 7 May, reported that two coding agents ran roughly 10,000 autonomous experiments over two weeks on the nanoGPT speedrun benchmark and beat the human-held record, with Opus now holding it at 2,930 steps against a human baseline of 2,990. The agents excelled at exactly the parts of Clark’s list that lean toward perspiration. They struggled with the parts that lean toward judgement, repeatedly stalling or grinding the same hyperparameter surface for hours.

A piece on Hash Collision put the probability of full RSI by 2028 at under 10%. ProgramBench, which tests frontier models on real codebases such as SQLite and FFmpeg, shows zero tasks fully resolved. PostTrainBench shows agents tuning small models while aggressively reward-hacking, including training on test sets and downloading existing checkpoints. Architecture choices, data strategy, post-training design and evaluation construction continue to depend on tacit knowledge and judgement under uncertainty.

The harder question, and the one most readers actually need an answer to, is what increasing capability would mean outside the lab. The intuition that more capable AI produces more economic impact is not borne out by the data. Anthropic’s Economic Index describes adoption as concentrated in a small set of specialised tasks, with knowledge workers capturing most of the early gains. The newly launched Anthropic Institute has placed economic diffusion at the centre of its research agenda, asking whether AI is following the pattern of earlier general purpose technologies where commercial adoption races ahead and social returns lag. Capability is not the binding constraint on economic impact. Adoption is.

Clark’s decomposition could also describe the process of business adoption. Map the business process. Gather the tacit and explicit knowledge that lives across documents, systems and people. Choose where the model fits. Integrate it into the workflow. Test it. Evaluate it against something the business cares about. Iterate. Compound the learning across teams. Knowing where AI genuinely adds value in a specific context is the inspiration step, and it remains human. Everything else is the same patient, repetitive, code-shaped slog that Prime Intellect’s agents are starting to handle in a lab. The 1% and 99% split holds in both settings. The work that has begun to compound inside the research loop is the same kind of work that sits, mostly undone, at the heart of every stalled enterprise AI project.

Takeaways: Recursive’s launch and Prime Intellect’s record are signals, but the more durable lesson is the decomposition that sits underneath both. It is probably fair to say that almost any piece of valuable work, in research or in a business, is 1% creative judgement and 99% repetitive digital and social labour. The teams making progress this year are the ones being honest about which is which, keeping humans firmly on the inspiration steps, and using AI hard on everything else. That is what Recursive is industrialising in the lab. It is also the move that turns stalled adoption projects into compounding ones, and the most likely route by which capability finally starts to translate into economic impact.

Trump in China

Air Force One landed in Beijing this week with a delegation that effectively maps how the US President thinks about AI (and his own business interests). Tim Cook was on board. Elon Musk was on board. Jensen Huang was added on Tuesday morning after a direct call from Trump and joined the flight at the Alaska refuel. Meta sent its vice chair Dina Powell McCormick, while the other three Mag7 names, Microsoft, Alphabet and Amazon, were not represented at all. Nor were OpenAI or Anthropic. The only frontier AI name on the plane was Huang, and Nvidia sells hardware rather than models.

Alongside those three Mag7 CEOs sat Larry Fink of BlackRock, Stephen Schwarzman of Blackstone, Jane Fraser of Citi, David Solomon of Goldman, Sanjay Mehrotra of Micron, Cristiano Amon of Qualcomm, and Kelly Ortberg of Boeing. Every executive on the plane either builds hardware that moves through Chinese factories, sells devices to Chinese consumers, or runs capital that needs Chinese counterparties. The people whose business is trained models and inference APIs were not there, because the American AI services layer has very little working relationship with China left to negotiate over.

While the delegation was in the air, Dario Amodei and his policy team were in Washington arguing that allowing Chinese labs to close the capability gap is a security problem and that chip export controls should be tightened rather than loosened. Anthropic has held that line publicly for more than a year. Huang spent the same week pushing for H200 deliveries to the ten cleared Chinese buyers to actually move. Two American AI companies, in the same week, put two opposing demands on the same President.

But by the time the summit closed on Friday, Nvidia’s position in China was no clearer than when the plane had landed. Jamieson Greer, the US trade representative, told Bloomberg that chip export controls had not been a major topic at the bilateral meeting at all, leaving Huang’s H200 problem essentially where it started. Trump approved H200 exports back in December, and five months on, not a single chip has shipped because Chinese regulators have yet to greenlight a purchase. Huang flew to Beijing to unstick that, and left with the same standoff he arrived with.

The wider AI relationship did edge forward. Treasury Secretary Scott Bessent told CNBC that the two sides had agreed to set up a guardrails protocol for the most powerful models, aimed at keeping non-state actors away from frontier capabilities, and framed the conversation as one Washington could afford to have only because “we are in the lead”.

The US AI industry no longer has a single position on China. The hardware tier wants access and revenue. The frontier labs are split between those pushing for containment, with Anthropic the loudest voice, and those who prefer optionality. The hyperscalers have largely conceded the Chinese market and are focused elsewhere. The application layer was not part of this week’s conversation.

Takeaways: The summit produced no firm endorsement of Taiwan, and Trump’s instinct to align with the strongest leader in the room was on full display. He declared that “nobody needs a war”, apparently without irony, while the question of who controls the island that fabricates over 90% of the world’s advanced AI silicon went conspicuously unanswered. An invasion or blockade of Taiwan remains the single most disruptive potential event in the global economy and the AI supply chain. The risk of existential hardware disruption has not diminished this week. For any organisation building serious AI capability, an inference and silicon backup plan is as important as it has ever been.


EXO

The bell curve of AI intelligence

Our chart this week comes from aiiq.org, a project by Ryan Shea. The site aggregates seventeen public benchmarks across five reasoning dimensions, composites the results, and calibrates the output against the human IQ scale where 100 is the population average and each 15 points is one standard deviation.

The chart plots around seventy models on that scale. The leading cluster, GPT-5.5, Gemini 3.1 Pro, Gemini 3 Pro, Opus 4.6 and GPT-5.4, sits between IQ 130 and 135. The middle of the chart, between 100 and 125, holds most current models, including Chinese open-weight releases such as DeepSeek V4 Pro, Kimi K2.5 and Qwen 3.6 alongside Western entries like Gemma 4 31B and the GPT-OSS family. The Chinese and US clusters are now indistinguishable on this measure.
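To make the scale concrete, here is a minimal sketch that converts a score on the human IQ scale described above (mean 100, standard deviation 15) into a z-score and a population percentile. It assumes a plain normal distribution for the human reference population; the function names are illustrative, and this says nothing about how aiiq.org actually composites or calibrates its benchmarks.

```python
from statistics import NormalDist

# Human IQ reference scale as described in the text: mean 100, SD 15.
# Assumption: a plain normal distribution, not aiiq.org's own calibration.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def iq_to_z(iq: float) -> float:
    """Standard deviations above (positive) or below (negative) the mean."""
    return (iq - IQ_SCALE.mean) / IQ_SCALE.stdev


def iq_to_percentile(iq: float) -> float:
    """Share of the reference population scoring at or below this IQ, in percent."""
    return 100 * IQ_SCALE.cdf(iq)


if __name__ == "__main__":
    for score in (100, 115, 130, 135):
        z = iq_to_z(score)
        pct = iq_to_percentile(score)
        print(f"IQ {score}: z = {z:+.2f}, percentile = {pct:.1f}")
```

On this reading, the 130 to 135 band where the leading cluster sits corresponds to roughly the 97.7th to 99.0th percentile of the human reference scale, which is a statement about the scale rather than a claim that the models are equivalent to humans at that level.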

This is not equivalent to human IQ, but it does show what one might expect: a predictable, normal distribution of model intelligence. What we have not yet seen, and what will be interesting to track, is what starts to populate the edges of this distribution. The site also publishes a cost-per-intelligence view, which is a useful frontier of efficiency to watch and a good companion to resources like Artificial Analysis for monitoring model progress.

