AI’s experience beyond words
A new essay by Sutton and Silver argues that AI must transition from mimicking human data to learning through experiential interaction with the real world to overcome current performance ceilings.
Joel Miller

Imagine grading a cake by reading the recipe instead of tasting it. Google DeepMind’s David Silver uses this image to mark the passing of the age of AI powered by human data, in which graders reward an answer because it looks convincing rather than because it works. The result is a layer of effective mimicry that flatters users yet can’t push past human knowledge. Silver’s provocation is simple: let the model bake the cake, eat it, and learn from the flavour.
Richard Sutton and Silver call this the era of experience. Their new essay argues that agents must inhabit lifelong streams of action, sense the consequences, and tune their policies to grounded signals such as heart rate, revenue or tensile strength. Static human data will soon hit a ceiling; experiential data can grow without limit…
Memories of AlphaGo support the point. When DeepMind stripped professional games from the training diet and relied on self-play alone, as it did with AlphaZero, performance soared. The same pattern resurfaced in AlphaProof, which generated 100 million of its own proofs and reached silver-medal level at the International Mathematical Olympiad.
The push away from human limits is underway. Coconut, a 2024 experimental model from Meta, keeps its “thoughts” inside a high-dimensional space instead of working through a chain of readable words. It solved standard logic tests with the same 98.8% accuracy as a baseline while emitting one-tenth of the text, saving compute. Other researchers show models switching languages mid-problem or signalling answers through non-English starter tokens, a reminder that non-human reasoning already lives quietly inside many systems. Critics worry that experiential agents will be harder to audit: if the chain of thought never reaches text, how do we check for deception? Sutton and Silver reply that real-world rewards provide a clearer incentive than a subjective thumbs-up. Yet obtaining reliable signals outside labs remains expensive, and online learning keeps GPUs spinning long after pre-training ends.
Venture capitalist Deedy Das called the essay “Sutton’s most important since The Bitter Lesson”, a 2019 note in which Sutton argued that general methods which exploit vast amounts of data and compute usually beat carefully hand-coded rules. His praise signals investor optimism that a similar “scale wins” dynamic could now unfold around experiential data rather than human text. Pratap Ranade, who builds AI tools for retailers, echoed that view, saying “nature is the best compression algorithm”. In plain terms: the real world already stores knowledge in the way things behave, so an agent that pokes and measures the world may learn more efficiently than one that reads documentation. Sceptics counter that the vision is less revolutionary than it sounds. Computer-science professor Ali Minai argued the ideas are “obvious to anyone who has looked beyond gradient descent”, the basic method most neural networks use to nudge their internal numbers towards lower error. His point: researchers steeped in older schools of AI, such as symbolic reasoning or evolutionary methods, have long championed active exploration, so rebranding it as a new era risks needless hype. Together, the reactions reveal a community split between those betting that experience will unlock fresh value and those wary of repeating history’s cycles of exuberance and retrenchment.
Takeaways: The era of experience reframes progress; success for the new wave of agents will hinge on feedback from reality, not from human graders. Companies that wire agents to real-world signals, trim the cost of continuous learning and develop new safety lenses for silent thought will harness the true power of the new digital workforce. Sutton and Silver offer a manifesto, but the race now moves from theory to the challenge of giving AI agency connected to the real world.
