Just six days after Google unveiled Gemini 3 and claimed the benchmark crown, Anthropic answered with Claude Opus 4.5. We were probably too quick, like everyone else, to suggest the race was over.
On SWE-Bench Verified, the industry standard for coding, Claude Opus 4.5 scored 80.9%, edging past both OpenAI’s GPT-5.1-Codex-Max (77.9%) and Gemini 3 Pro (76.2%). It also leads on Terminal-Bench, which tests a model’s ability to operate a Linux command line autonomously: Claude scored 59.3% against Gemini’s 54.2%. Claude also excels at sustained work, maintaining its reasoning across long sessions and preserving its chain of thought where earlier models would lose the thread.
That said, Gemini 3 Pro still dominates on raw reasoning challenges. On Humanity’s Last Exam, Gemini scored 37.5% compared to Claude’s 13.7%. On ARC-AGI, another reasoning benchmark, Gemini reached 31% in standard mode and 45% with its Deep Think feature enabled. Claude trails on both. The picture that emerges is one of specialisation: Claude is the model you want writing and debugging your code, while Gemini currently handles the hardest abstract reasoning tasks. Neither has pulled decisively ahead across the board.
But the real story is efficiency. Claude Opus 4.5 can deliver its performance while using 76% fewer tokens. That is superior quality for roughly a quarter of the tokens. Anthropic paired this with a dramatic price cut: from $15/$75 per million input/output tokens to just $5/$25 (much closer to, though still above, Gemini 3’s $2/$12). The previous Opus models were powerful but prohibitively expensive. This one brings Claude-style frontier capability within reach of far more teams and use cases.
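To see how the price cut and the token savings compound, here is a rough back-of-the-envelope sketch. The prices are the ones quoted above; the workload size is made up for illustration, and applying the 76% reduction to output tokens only, measured against the same job on the old pricing, is our assumption rather than something Anthropic has spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def job_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a job with the given token counts."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Prices quoted in the article.
OLD_OPUS = Pricing(input_per_million=15.0, output_per_million=75.0)
OPUS_4_5 = Pricing(input_per_million=5.0, output_per_million=25.0)

# Hypothetical workload, chosen only to make the arithmetic easy to follow.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 1_000_000

# Assumption: the 76% saving applies to output tokens only.
TOKEN_REDUCTION = 0.76
reduced_output = round(OUTPUT_TOKENS * (1 - TOKEN_REDUCTION))

old_cost = job_cost(OLD_OPUS, INPUT_TOKENS, OUTPUT_TOKENS)
new_cost_same_tokens = job_cost(OPUS_4_5, INPUT_TOKENS, OUTPUT_TOKENS)
new_cost_fewer_tokens = job_cost(OPUS_4_5, INPUT_TOKENS, reduced_output)

print(f"Old pricing:                  ${old_cost:.2f}")
print(f"New pricing, same tokens:     ${new_cost_same_tokens:.2f}")
print(f"New pricing, 76% fewer output: ${new_cost_fewer_tokens:.2f}")
```

Under these assumptions the same job drops from $105 to $35 on price alone, and to $16 once the output-token saving is included. Real workloads will differ, since the saving depends on the task and on how much of the bill is input rather than output.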
November has been a remarkable month. OpenAI, Google, xAI, and now Anthropic have all shipped major upgrades. The progress curve continues upward, with no sign of the ceiling that some predicted. In the domains where these models excel (coding, analysis, and tool use), the gains keep coming.
Takeaways: At this price point, with this efficiency, Claude becomes a serious contender for business deployment. Whether it’s writing and reviewing code, browsing the web for research, or working through spreadsheets with the new Excel plugin (we’ve been beta testing this and it’s incredible for this kind of desktop work), organisations now have a balanced frontier model that won’t burn through budgets. For teams weighing their AI options, Opus 4.5 makes the decision harder in the best possible way.
