The perspiration principle of recursive self-improvement
New research distinguishes between human-led inspiration and agent-driven perspiration in AI development, suggesting that while automation can accelerate routine tasks, full recursive self-improvement remains uncertain due to persistent challenges in judgement and evaluation.
Joel Miller

Recursive Superintelligence formally launched in London on Wednesday with a $650 million raise at a $4.65 billion valuation, led by GV and Greycroft alongside Nvidia and AMD Ventures. It has no shipped product and a single declared aim: to design endlessly self-improving systems. Recursive’s founding team is strong: Tim Rocktäschel from UCL, Jeff Clune from the University of British Columbia and DeepMind, Richard Socher from You.com, Josh Tobin formerly of OpenAI, Tim Shi from Uber AI, Yuandong Tian from Meta FAIR, and Alexey Dosovitskiy of Vision Transformer fame. The company has more than 25 researchers across San Francisco and London, drawn from Google, Meta and OpenAI.
Jack Clark, co-founder of Anthropic and former policy director at OpenAI, set out the most useful breakdown of what that recursive loop actually contains in his Import AI newsletter this month. He argues AI research consists of separable jobs. Reading the literature and deciding what is worth trying. Forming hypotheses. Designing experiments. Writing the training code. Running it. Reading the logs. Debugging. Building evaluations. Interpreting results. Choosing what to do next. Clark borrows Edison’s old line: 1% inspiration, 99% perspiration.
The inspiration steps are real and they do matter. Proposing a genuinely new architecture. Spotting the deep structure in a confusing experimental result. Choosing which research direction to abandon. Designing an evaluation that captures something real rather than something convenient. These reward taste, originality and accumulated judgement, and they remain stubbornly human. The perspiration steps are everything else. Hyperparameter sweeps. Ablation studies. Optimiser tuning. Dataset cleaning. Training stability fixes. Infrastructure plumbing. Each of these can be expressed as code, run on a GPU, evaluated and repeated. They reward patience and pattern recognition far more than originality, and patience is exactly what an agent has in unlimited supply. Recursive believes that the perspiration end of Clark’s list is now tractable enough to industrialise, and that automating it changes the rate at which the inspiration end can be tested. Clark believes recursive self-improvement (RSI) is roughly 60% likely by 2028.
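To make "expressed as code, run, evaluated and repeated" concrete, here is a minimal sketch of a hyperparameter sweep in Python. It is an illustration only, not anything Recursive, Clark or Prime Intellect has published. The function toy_validation_loss is a hypothetical stand-in for a real training run, and the parameter names and grid values are assumptions chosen for the example.

```python
import itertools
import math


def toy_validation_loss(learning_rate: float, warmup_steps: int) -> float:
    """Hypothetical stand-in for a real training run.

    A real sweep would train a model and return its validation loss.
    This toy surface is simply lowest near learning_rate=1e-3 and
    warmup_steps=200 so the example runs instantly.
    """
    lr_term = (math.log10(learning_rate) + 3) ** 2
    warmup_term = ((warmup_steps - 200) / 100) ** 2
    return lr_term + warmup_term


def sweep(learning_rates, warmup_options):
    """Try every combination, record the result, rank by loss (lowest first)."""
    results = []
    for lr, warmup in itertools.product(learning_rates, warmup_options):
        loss = toy_validation_loss(lr, warmup)
        results.append(
            {"learning_rate": lr, "warmup_steps": warmup, "loss": loss}
        )
    return sorted(results, key=lambda r: r["loss"])


# Assumed grid for illustration: 3 learning rates x 3 warmup settings.
ranked = sweep([1e-4, 1e-3, 1e-2], [100, 200, 400])
best = ranked[0]
print(best["learning_rate"], best["warmup_steps"])
```

The loop itself needs no judgement: it enumerates, measures and ranks. Deciding which parameters belong in the grid, and whether the loss being measured captures anything real, is the part the sketch leaves to a person.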
Prime Intellect, an RL environments outfit that put its self-improvement stack into general availability on 7 May, reported that two coding agents ran roughly 10,000 autonomous experiments over two weeks on the nanoGPT speedrun benchmark and beat the human-held record, with Opus now holding it at 2,930 steps against a human baseline of 2,990. The agents excelled at exactly the parts of Clark’s list that lean toward perspiration. They struggled with the parts that lean toward judgement, repeatedly stalling or grinding the same hyperparameter surface for hours.
A piece on Hash Collision put the probability of full RSI by 2028 at under 10%. ProgramBench, which tests frontier models on real codebases such as SQLite and FFmpeg, shows zero tasks fully resolved. PostTrainBench shows agents tuning small models while aggressively reward-hacking, including training on test sets and downloading existing checkpoints. Architecture choices, data strategy, post-training design and evaluation construction continue to depend on tacit knowledge and judgement under uncertainty.
The harder question, and the one most readers actually need an answer to, is what increasing capability would mean outside the lab. The intuition that more capable AI produces more economic impact is not borne out by the data. Anthropic’s Economic Index describes adoption as concentrated in a small set of specialised tasks, with knowledge workers capturing most of the early gains. The newly launched Anthropic Institute has placed economic diffusion at the centre of its research agenda, asking whether AI is following the pattern of earlier general-purpose technologies, where commercial adoption races ahead and social returns lag. Capability is not the binding constraint on economic impact. Adoption is.
Clark’s decomposition could also describe the process of business adoption. Map the business process. Gather the tacit and explicit knowledge that lives across documents, systems and people. Choose where the model fits. Integrate it into the workflow. Test it. Evaluate it against something the business cares about. Iterate. Compound the learning across teams. Knowing where AI genuinely adds value in a specific context is the inspiration step, and it remains human. Everything else is the same patient, repetitive, code-shaped slog that Prime Intellect’s agents are starting to handle in a lab. The 1% and 99% split holds in both settings. The work that has begun to compound inside the research loop is the same kind of work that sits, mostly undone, at the heart of every stalled enterprise AI project.
Takeaways: Recursive’s launch and Prime Intellect’s record are signals, but the more durable lesson is the decomposition that sits underneath both. It is probably fair to say that almost any piece of valuable work, in research or in a business, is 1% creative judgement and 99% repetitive digital and social labour. The teams making progress this year are the ones being honest about which is which, keeping humans firmly on the inspiration steps, and using AI hard on everything else. That is what Recursive is industrialising in the lab. It is also the move that turns stalled adoption projects into compounding ones, and the most likely route by which capability finally starts to translate into economic impact.
