OpenAI launched GPT-5.5 this week along with a raft of other product features that continue to shift its focus to complex business work. It is built to operate inside Codex, spreadsheets, documents, browsers and research workflows, taking a loose instruction and carrying it through many steps. The benchmark numbers point in that direction: stronger terminal use, better professional task performance, improved coding, and more reliable tool use. The pricing points in the same direction. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens, with the Pro tier far higher. This is an increase on the previous GPT range.
Meanwhile, after a long wait, DeepSeek released V4 Pro, a 1M context model also aimed at agentic coding, STEM work, tool use and large document tasks. Its pricing is much lower: $1.74 per million input tokens and $3.48 per million output tokens, with the Flash version cheaper again. It is also adapted for Huawei Ascend chips, and rumours suggest the lab needed extra time to get stable training runs without Nvidia chips. That it succeeded is a significant step for China’s AI independence.
These two releases show the same problem from different angles. AI labs are no longer only competing to produce the smartest model. They are competing to turn increasingly limited compute into useful tokens, and then to turn those tokens into economic value.
OpenAI’s approach is to push the frontier on capability, but also to use fewer tokens to get there. Much of the GPT-5.5 launch material emphasises efficiency: better performance with fewer tokens than earlier models. The high price per token is paired with the argument that you should need fewer of them to complete the same job. This is not an all-out push for the most capable model OpenAI could ship right now. DeepSeek’s approach is the opposite end of the same problem: make capable tokens cheaper to produce and easier to deploy at scale. It does not need to win every benchmark if it can handle a large share of work at a fraction of the cost.
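A rough sketch of that arithmetic, using the per-million-token prices quoted above. The token counts are hypothetical, chosen only to show how the "fewer tokens" argument interacts with a higher price; they are not measurements of either model.

```python
# Prices in USD per million tokens, as quoted in this article.
PRICES = {
    "gpt-5.5": {"input": 5.00, "output": 30.00},
    "deepseek-v4-pro": {"input": 1.74, "output": 3.48},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the token bill in USD for one job on the given model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical job sizes (assumptions for illustration only).
cheap = job_cost("deepseek-v4-pro", 200_000, 50_000)
frontier_same = job_cost("gpt-5.5", 200_000, 50_000)
# Assume the frontier model finishes the same job with far fewer tokens.
frontier_lean = job_cost("gpt-5.5", 100_000, 20_000)

print(f"DeepSeek V4 Pro:         ${cheap:.3f}")
print(f"GPT-5.5, same tokens:    ${frontier_same:.3f}")
print(f"GPT-5.5, fewer tokens:   ${frontier_lean:.3f}")
```

Under these assumed numbers, token efficiency narrows the gap considerably but does not close it on token spend alone. The case for the pricier model has to rest on what the job is worth and how often it gets finished correctly.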

Artificial Analysis’s latest chart (25 April) maps the trade-off precisely. GPT-5.5 (xhigh) sits at the highest point on the intelligence axis, but also the furthest right on cost. DeepSeek V4 Pro is in the same upper intelligence band but considerably cheaper, while DeepSeek V4 Flash drops into the “most attractive quadrant” alongside Gemini 3 Flash, MiniMax-M2.7 and GLM-5.1 — capable intelligence at a fraction of the cost.
We’re essentially now entering compute crunch 2.0. The first crunch in 2023-24 was about training. Labs needed GPUs, data centres, power contracts, memory, capital and enough political permission to build the next model. That race continues. But the next constraint is inference: serving enough high-quality tokens for people and businesses to use these systems all day.
Anthropic is a case study in compute crunch 2.0 in real time. After publicly questioning OpenAI for overbuying future compute capacity, the company is reportedly admitting privately that the crunch is hitting it harder than most. Part of this is growth running ahead of any plausible forecast, a victim-of-its-own-success problem. The signs are showing up in the product. Last week’s Opus 4.7 launch baked in “adaptive thinking” as a compute-saving default, which we covered at the time and which produced an unusually negative user reaction. There are now reports that Claude Code may be pared back at certain account tiers. Mythos, Anthropic’s most capable model to date, is still being held back from general release, partly because serving it widely would be too expensive at current inference economics. Even the labs that thought they had this figured out are finding the inference race harder than the training race they just won.
Once mainstream AI use moves from chat to agents, the economics change radically. A chatbot gives an answer. An agent runs a loop. It searches, reads files, writes code, calls tools, runs tests, hits errors, revises, summarises and tries again. Useful work consumes tokens repeatedly, and at scale. The token bill becomes less like a software subscription and more like an energy bill for digital labour.
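To make the loop concrete, here is a toy model of input-token consumption. It assumes, as a simplification, that each agent step re-reads the whole conversation so far and that nothing is cached; the step sizes are hypothetical and real agent frameworks will differ.

```python
def agent_input_tokens(base_context: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens across a loop where every step re-reads all prior context.

    Simplifying assumptions: no prompt caching, no context trimming.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context                   # the model reads everything so far
        context += tokens_added_per_step   # tool output and replies extend the history
    return total


# Hypothetical sizes: a 2,000-token prompt, 1,500 new tokens per step.
chat_tokens = agent_input_tokens(2_000, 1_500, steps=1)
agent_tokens = agent_input_tokens(2_000, 1_500, steps=20)

print(chat_tokens)   # 2000
print(agent_tokens)  # 325000
```

In this sketch, twenty steps cost about 160 times the input tokens of a single answer, because the history is paid for again on every pass.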
Dylan Patel’s recent comments illustrate the shift inside forward-looking companies. His firm SemiAnalysis is reportedly spending $7 million a year on Claude Code against a salary base of about $25 million. That is already more than a quarter of payroll. He describes chip reverse-engineering tools, economics work and energy-grid modelling being produced with token spend that would once have required entire teams. The spend looks extreme until you ask what it replaces, accelerates or creates.
This also exposes a measurement problem. If AI makes a research task, dataset, software tool or analysis 100 times cheaper, the amount of useful work may rise while the price of that work collapses. Standard GDP statistics may see the lower price more clearly than the new abundance. Patel calls this “phantom GDP”. It is an awkward phrase, but a useful one. AI can create value before our accounting systems know where to put it.
For businesses, the lesson is not simply to pick the cheapest model. Price per token is only the start. A more expensive model may be cheaper per completed task if it makes fewer mistakes and needs less supervision. A cheaper model may be better for bulk work where the frontier premium adds little. The relevant unit is not the token in isolation. It is the valuable outcome bought with the token.
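One way to sketch that unit is expected cost per completed task. The formula below assumes failed attempts are simply retried until one succeeds, and that every attempt carries a fixed human review cost. All the figures are hypothetical and are there only to show how the ranking can flip.

```python
def cost_per_completed_task(attempt_cost: float, success_rate: float,
                            review_cost: float = 0.0) -> float:
    """Expected USD spent per successful outcome.

    Assumes independent retries until success, so the expected number of
    attempts is 1 / success_rate. review_cost is human time per attempt.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return (attempt_cost + review_cost) / success_rate


# Hypothetical models: pricier but reliable versus cheap but error-prone.
frontier_bulk = cost_per_completed_task(1.10, 0.90)
budget_bulk = cost_per_completed_task(0.52, 0.50)

# Same models when a person spends $2.00 of time checking each attempt.
frontier_reviewed = cost_per_completed_task(1.10, 0.90, review_cost=2.00)
budget_reviewed = cost_per_completed_task(0.52, 0.50, review_cost=2.00)

print(f"No review:   frontier ${frontier_bulk:.2f}  budget ${budget_bulk:.2f}")
print(f"With review: frontier ${frontier_reviewed:.2f}  budget ${budget_reviewed:.2f}")
```

With these assumed numbers the budget model wins on unsupervised bulk work, and the frontier model wins as soon as each attempt needs human checking, because supervision is paid on failures too.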
Takeaways: The AI economy is becoming a discipline of token allocation. GPT-5.5 shows why the best frontier tokens will remain expensive: they can unlock work that cheaper models still struggle to finish. DeepSeek V4 Pro shows why capable intelligence will keep falling in price as labs optimise around different hardware, cost and geopolitical constraints. Compute crunch 2.0 is the pressure between those forces. The winners will not be the firms that simply spend the most on AI. They will be the firms that learn which problems deserve expensive cognition, which can be solved with cheaper models, and how to optimise continuously.
