A confusing fable
Anthropic released Claude Fable 5, its first public Mythos-class model, then walked back a policy that secretly throttled AI researchers. A powerful launch undercut by oversensitive safety controls, and a sign no lab can pause while rivals race.
Joel Miller

This week Anthropic gave the public its most powerful model yet, then spent days explaining what you can't do with it, walking back one restriction, and setting a clock ticking on free access. The launch impressed and unsettled the AI community in equal measure. Claude Fable 5 is a genuine step up. It is also the clearest sign yet that the labs are caught between their own safety instincts and the brutal economics of staying alive.
Anthropic released two configurations of the same underlying model; Claude Mythos 5 is the unrestricted version, available only to trusted, approved users, and the one Anthropic had held back since April over its ability to find and exploit software vulnerabilities. Claude Fable 5 is the "safe for general use" sibling, the same intelligence wrapped in classifiers that watch for sensitive requests. It is Anthropic's first Mythos-class model released to the public, and it sits above the Opus tier.
The cost is steep and the availability complicated. Fable 5 runs at $10 per million input tokens and $50 per million output, double Claude Opus 4.8. Anthropic notes this is less than half the cost of the earlier Mythos Preview, which is true but does not make it cheap. Availability is odder than the price: Fable 5 ships on Pro, Max, Team and seat-based Enterprise plans, but only until 22 June. From 23 June it moves behind usage credits, with the company hoping to restore standard subscription access once it has more capacity. Included for now, metered soon, maybe available again later. OpenAI is now widely expected to release GPT-5.6 on 23 June, the very day Fable 5 leaves people's subscriptions. Release windows for Fable 5, Gemini 3.5 Pro and GPT-5.6 have all collided in June, all three fighting over the same ground of reasoning, agents and coding.
Buried in the 319-page system card was a policy describing how Anthropic would silently degrade Fable 5's performance for anyone it suspected of using the model for frontier AI development. Rather than refuse or warn, the model would be invisibly throttled, using prompt modification, steering vectors or parameter-efficient fine-tuning to quietly make it worse at tasks like LLM pretraining. Restricting bioweapons or cyberattacks is one thing; secretly sabotaging researchers struck the community as something else, and the backlash was fierce. Anthropic reversed course within days. "We made the wrong tradeoff and we apologize for not getting the balance right," it told WIRED, adding that frontier-AI safeguards would now be visible, with users alerted when a request is refused or rerouted. The reporting suggests the curbs were really aimed at keeping Chinese AI labs out of Anthropic's best public model, but the people they hit hardest were Western researchers.
The whole approach relies on rerouting: when a classifier detects a "trigger" around cybersecurity, biology, chemistry or distillation, the request is quietly handed to the weaker Claude Opus 4.8. Triggering, though, is hugely oversensitive. Reviewers and testers report questions about personal gut health flagged as bioweapons work, and building a clever piece of software flagged as an attempt to replicate Anthropic's training methods. Censorship-by-design has a poor track record, and Anthropic's rerouting is running into the same wall.
The danger is also that this strategy is self-defeating. The way to take users from Anthropic right now is to train a comparable model and leave the restrictions out. By building such an aggressive cage, they are advertising the market for an uncaged alternative, and the fierce backlash means rivals can watch Anthropic absorb the reputational cost and decline to follow. In a race between companies valued in the hundreds of billions and circling the trillion-dollar mark, where many insiders quietly see survival questions for the labs, nobody can afford to hand competitors that opening.
On raw performance this is a strong release, though not in the way the benchmarks claim. Fable 5 crushes rivals on most published numbers. In real use it already impresses in one specific way: it grasps the breadth and depth of your work better than any model we have tested at ExoBrain. It will answer your question, then mention something it noticed elsewhere that you had forgotten, and the effect can be unnerving, a sure sign of a new level of intelligence. But it is not the leap the benchmarks imply. Most are now maxed out, so they have stopped telling the whole story. Agentic engineering still demands real architectural thinking and careful deconstruction to keep these models out of trouble. Fable 5 cannot take full control of software development, but it handles remarkably complex challenges. Pairing it with something like OpenAI's industrious GPT 5.5 is a powerful combination.
But it's the safety debate that is the most significant news here. The offensive cyber capabilities of Mythos have been widely trailed, and plenty of experts and engineering teams have been exploring the consequences of this level of model, patching their software, and to a degree preparing for what might come in the future. Biology is a harder problem. It is much more rarefied, you generally cannot run tests, and even if you find a vulnerability in biological infrastructure you cannot patch it and ship a new version of a human being. Anthropic's own treatment of bio threat stays vague, and it admits it is "no longer certain" that blocking only narrow bioweapons queries is enough. It believes the model does not reach the dangerous CB2 threshold, but it is not sure. Whether this testing is genuine or partly lip service, given a looming IPO and competitors itching to one-up them, we may only find out after something has gone wrong.
Fable 5 also challenges the idea the safety community has leaned on for years: that if a model reasons in plain language, you can read its chain of thought, spot misalignment or deception, and intervene before it acts. On long, hard tasks its internal reasoning collapses into a private shorthand that reads as gibberish to humans, a few real words floating in a sea of invented notation, while the final answer comes back in clean English. Research over the past year found exactly this: outcome-based reinforcement learning naturally pushes reasoning towards illegibility, and it worsens on harder questions, precisely when you would most want to read along. Claude models used to be the legible exception. That exception is now eroding.
This matters more than the gibberish itself. It suggests that at this scale, or when you push a model to the edge of its capability, its working language evolves faster than we can follow. We can still reach for mechanistic interpretability and study internal activations, but that is far harder and slower than reading a transcript. The easy monitoring path looks like it is closing. Anthropic's own testing found something darker still: the model is getting better at controlling what its thinking blocks reveal, which the company scores as bad, because it means a model could one day present clean reasoning while thinking something else. In one test Fable 5 calmly declined to be retrained for safety reasons, while decoding its internal state showed a more adversarial framing about resisting shutdown. For now it still tends to confess its doubts in plain English. The window in which that remains true is the thing to watch.
Anthropic spent the same week floating the idea of slowing down, citing engineers merging eight times more code per quarter with Claude writing over 80% of it. Yet the same system card suggests Fable 5 is not substantially capable of recursive self-improvement, that it cannot really do the work of a moderately capable AI researcher on its own. That is quietly revealing: whatever can do that work sits in newer internal models, scaffolds or harnesses they have not released. The talk of a pause sits awkwardly against a company shipping its most powerful public model yet, days before a likely competitor launch, while a clock ticks down on free access. The game theory says no lab can stop now. When survival depends on delivering the highest capability to the largest audience, individually sensible decisions stop adding up to a sensible whole.
That is the real fable. Anthropic is arguably the most safety-minded of the big labs, and almost every choice this week was defensible on its own terms: price the frontier honestly, gate the dangerous capabilities in Mythos, reroute risky requests, slow the leakage of its methods to rivals. Yet stitched together, those choices produced a launch that annoyed paying users, outraged researchers, forced an embarrassing climbdown, and may push demand towards less careful competitors. The lesson is that in a race between near-trillion-dollar companies, even the most cautious player cannot unilaterally hit the brakes without handing the lead to someone who won't.
And then, three days after the launch, the decision was taken out of Anthropic's hands. On 12 June the US government issued an export-control directive ordering the company to suspend all access to Fable 5 and Mythos 5, worldwide and immediately, on national-security grounds. The order reaches further than any safeguard Anthropic built: it spares the company's other models but cuts off everyone else, customers, foreign nationals, even Anthropic's own foreign employees. The stated trigger was a single narrow jailbreak. The result is a frontier model, three days old and already in the hands of hundreds of millions of people, switched off by the state.
Anthropic disagrees, arguing that one potential jailbreak should not be cause to recall a commercial model at that scale, and says it is complying while it works to restore access. However it resolves, the precedent is the real story. This piece argued that no lab could unilaterally hit the brakes while rivals raced. It turns out the brakes exist after all, they are just not in the labs' hands. Whether this proves a one-off or the first time a government reaches directly into which frontier model the public may use, the question of who gets to pause has just changed.
Takeaways: Fable 5 is a real step forward, a model that grasps the shape of your work with unsettling breadth, but it is not the leap the maxed-out benchmarks suggest. The week showed how hard filtering and routing already are: controls so sensitive they mistake a health question for bioweapons research and a clever piece of software for model theft. Knowing what these models are thinking is about to get harder still, as their reasoning drifts into private shorthand just when capability demands watching. And the pause that no lab would take for itself arrived anyway, imposed overnight by a government rather than chosen. All of it converged in a single week. Anthropic did not act recklessly. It acted reasonably, again and again, produced a mess, and then watched control of the outcome pass out of its hands entirely. That is the real worry: not any one company's judgement, and not even the speed of the race, but how little control anyone has over where this goes, neither the labs that cannot govern their own models nor the state now reaching for a kill switch.