The adaptive thinking backlash
Anthropic’s Opus 4.7 faces user backlash due to its new adaptive thinking mode and tokenisation changes, revealing a disconnect between benchmark performance and real-world developer experience.
Joel Miller

Anthropic launched Opus 4.7 this week and the reaction has been unusually negative. Within hours, power users were complaining the model felt shallower than 4.6, followed instructions worse, and burned through weekly allowances at an alarming rate. On X and Reddit, long-time Claude fans described spending hours debugging their own setups because they could not believe the new release was really the upgrade promised on the box. One widely shared post called it “basically 4.6 with low thinking as a default”. The word most often used was regression.
The central change sits behind a harmless-sounding phrase. Extended thinking, the old toggle that let users decide how hard Claude should reason, has been replaced with “adaptive thinking”, where the model itself decides. In Claude Code the default shifts to a new xhigh effort level; in the consumer UI the controls are thinner. Combined with a new tokeniser that maps the same text to up to 35% more tokens, and a temporary 7.5x premium multiplier in GitHub Copilot, the effect for many developers is that Opus 4.7 costs more and feels less reliable than the model it replaced.
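To make the cost arithmetic concrete, here is a rough sketch of how those two figures compound for a single developer. Only the 35% ceiling and the 7.5x multiplier come from the reporting above. The token volume, the per-million-token price and the function names are hypothetical placeholders, not Anthropic's or GitHub's actual pricing or APIs, and the sketch assumes the per-token price itself is unchanged.

```python
def inflated_tokens(old_tokens: int, inflation: float = 0.35) -> int:
    """Worst-case token count when the same text maps to `inflation` more tokens."""
    return round(old_tokens * (1 + inflation))


def token_cost(tokens: int, price_per_million: float) -> float:
    """Cost of `tokens` at a flat price per million tokens."""
    return tokens / 1_000_000 * price_per_million


def premium_requests(requests: int, multiplier: float = 7.5) -> float:
    """Premium-request allowance consumed when each request counts `multiplier` times."""
    return requests * multiplier


# Hypothetical workload: these two numbers are illustrative assumptions only.
old_tokens = 200_000
price_per_million = 10.0

new_tokens = inflated_tokens(old_tokens)
print(f"tokens: {old_tokens} -> up to {new_tokens}")
print(f"cost:   {token_cost(old_tokens, price_per_million):.2f} -> "
      f"up to {token_cost(new_tokens, price_per_million):.2f}")
print(f"10 Copilot requests consume {premium_requests(10)} premium requests")
```

The tokeniser change alone raises the bill by up to 35% for identical text at an unchanged price, before any difference in how much the model chooses to think. The multiplier is a separate effect that applies to request allowances rather than tokens.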
None of this shows up in the headline benchmarks. On SWE-bench Verified and SWE-bench Pro, 4.7 is a clear step up, and Anthropic’s partner case studies report meaningful gains on long-running agentic work. Yet on SimpleBench, which tests the kind of everyday common-sense reasoning humans actually ask for, 4.7 stumbles. That gap is the story. Benchmarks test hard things, and adaptive thinking happily spends compute when it senses difficulty. For a simple question that a careful human would still think about for ten seconds, the model may decide no reasoning is needed and fire back a confident, shallow answer.
This is not unique to Anthropic. OpenAI went through the same cycle with GPT-5’s auto-routing last year and had to restore explicit effort controls after a backlash. The lesson does not seem to have travelled. Labs keep trying to hide effort management behind clever routing because the internal maths is seductive: similar average accuracy at much lower cost. What the maths misses is that heavy users are not the average user, and they notice problems within minutes.
Takeaway: The Opus 4.7 backlash is not really about one model. It is about an industry struggling to balance the cost of reasoning with demand for intelligence.
