2026 Week 17 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

JOEL

This week we look at:

  • How OpenAI and DeepSeek are answering the inference compute crunch hitting frontier labs
  • Why ChatGPT Images 2.0 illustrates a deeper architectural shift now flowing into text models
  • Sundar Pichai’s 75% AI-code claim and the inside-Google pushback via Steve Yegge

Compute crunch 2.0 arrives

OpenAI launched GPT-5.5 this week along with a raft of other product features that continue to shift their focus to complex business work. It is built to operate inside Codex, spreadsheets, documents, browsers and research workflows, taking a loose instruction and carrying through many steps. The benchmark numbers point in that direction: stronger terminal use, better professional task performance, improved coding, and more reliable tool use. The pricing points in the same direction. GPT-5.5 costs $5 per million input tokens and $30 per million output tokens, with the Pro tier far higher. This is an increase on the previous GPT range.

Meanwhile, after a long wait, DeepSeek released V4 Pro, a 1M context model also aimed at agentic coding, STEM work, tool use and large document tasks. Its pricing is much lower: $1.74 per million input tokens and $3.48 per million output tokens, with the Flash version cheaper again. It is also adapted for Huawei Ascend chips, and rumours suggest it took the lab some time to achieve a stable training run without Nvidia chips. That they succeeded is a significant step for China’s AI independence.

These two releases show the same problem from different angles. AI labs are no longer only competing to produce the smartest model. They are competing to turn increasingly limited compute into useful tokens, and then to turn those tokens into economic value.

OpenAI’s approach is to push the frontier on capability, but also to use fewer tokens to get there. Much of the GPT-5.5 launch material emphasises efficiency: better performance with fewer tokens than earlier models. The high price per token is paired with the argument that you should need fewer of them to complete the same job. This is not an all-out push for the most capable model OpenAI could ship right now. DeepSeek’s approach is the opposite end of the same problem: make capable tokens cheaper to produce and easier to deploy at scale. It does not need to win every benchmark if it can handle a large share of work at a fraction of the cost.

Artificial Analysis’s latest chart (25 April) maps the trade-off precisely. GPT-5.5 (xhigh) sits at the highest point on the intelligence axis, but also the furthest right on cost. DeepSeek V4 Pro is in the same upper intelligence band but considerably cheaper, while DeepSeek V4 Flash drops into the “most attractive quadrant” alongside Gemini 3 Flash, MiniMax-M2.7 and GLM-5.1 — capable intelligence at a fraction of the cost.

We are now entering compute crunch 2.0. The first crunch in 2023-24 was about training. Labs needed GPUs, data centres, power contracts, memory, capital and enough political permission to build the next model. That race continues. But the next constraint is inference: serving enough high-quality tokens for people and businesses to use these systems all day.

Anthropic is a real-time case study in compute crunch 2.0. After publicly questioning OpenAI for overbuying future compute capacity, the company is reportedly admitting privately that the crunch is hitting it harder than most. Part of this is growth running ahead of any plausible forecast, a victim-of-its-own-success problem. The signs are showing up in the product. Last week’s Opus 4.7 launch baked in “adaptive thinking” as a compute-saving default, a change we covered at the time and one that produced an unusually negative user reaction, and there are now reports that Claude Code may be pared back at certain account tiers. Mythos, Anthropic’s most capable model to date, is still being held back from general release, partly because serving it widely would be too expensive at current inference economics. Even the labs that thought they had this figured out are finding the inference race harder than the training race they just won.

Once mainstream AI use moves from chat to agents, the economics change radically. A chatbot gives an answer. An agent runs a loop. It searches, reads files, writes code, calls tools, runs tests, hits errors, revises, summarises and tries again. Useful work consumes tokens repeatedly, and at scale. The token bill becomes less like a software subscription and more like an energy bill for digital labour.
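
To make the chat-versus-agent difference concrete, here is a rough sketch of how the bill grows when a model runs a loop. Only the two prices come from the GPT-5.5 figures quoted earlier. The token counts, the step count and the function names are hypothetical, and the sketch assumes the whole context is re-sent as input on every step with no prompt caching or discounting.

```python
# Illustrative only: prices are the GPT-5.5 list prices quoted above;
# every token count and step count below is a made-up assumption.
INPUT_PRICE_PER_M = 5.00    # dollars per million input tokens
OUTPUT_PRICE_PER_M = 30.00  # dollars per million output tokens


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single model call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def chat_cost(prompt_tokens: int, answer_tokens: int) -> float:
    """A chatbot: one prompt in, one answer out."""
    return call_cost(prompt_tokens, answer_tokens)


def agent_cost(prompt_tokens: int, steps: int,
               output_per_step: int, tool_result_per_step: int) -> float:
    """An agent loop: each step re-reads the growing context.

    Assumes (hypothetically) that the full context is billed as input
    on every step, with no caching.
    """
    context = prompt_tokens
    total = 0.0
    for _ in range(steps):
        total += call_cost(context, output_per_step)
        # The model's own output and the tool result both join the context.
        context += output_per_step + tool_result_per_step
    return total


if __name__ == "__main__":
    print(f"chat:  ${chat_cost(2_000, 500):.3f}")
    print(f"agent: ${agent_cost(2_000, 50, 500, 1_500):.2f}")
```

Under these assumptions the input side of the bill grows with every step, because earlier tool results and drafts are read again each time round the loop. Real deployments often soften this with caching, so treat the shape rather than the numbers as the point.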

Dylan Patel’s recent comments illustrate the shift inside forward-looking companies. His firm SemiAnalysis is reportedly spending $7 million a year on Claude Code against a salary base of about $25 million. That is already more than a quarter of payroll. He describes chip reverse-engineering tools, economics work and energy-grid modelling being produced with token spend that would once have required entire teams. The spend looks extreme until you ask what it replaces, accelerates or creates.

This also exposes a measurement problem. If AI makes a research task, dataset, software tool or analysis 100 times cheaper, the amount of useful work may rise while the price of that work collapses. Standard GDP statistics may see the lower price more clearly than the new abundance. Patel calls this “phantom GDP”. It is an awkward phrase, but a useful one. AI can create value before our accounting systems know where to put it.

For businesses, the lesson is not simply to pick the cheapest model. Price per token is only the start. A more expensive model may be cheaper per completed task if it makes fewer mistakes and needs less supervision. A cheaper model may be better for bulk work where the frontier premium adds little. The relevant unit is not the token in isolation. It is the valuable outcome bought with the token.
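
A back-of-envelope sketch of that "cost per completed task" idea follows. The per-token prices are the ones quoted earlier for GPT-5.5 and DeepSeek V4 Pro. Everything else (tokens per attempt, success rates, the review cost, and the names Model and cost_per_completed_task) is hypothetical and chosen only to show how the comparison can flip. It also assumes failed attempts are simply retried independently, so the expected number of attempts is one divided by the success rate.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float    # dollars per million input tokens (from the text)
    output_price: float   # dollars per million output tokens (from the text)
    input_tokens: int     # hypothetical tokens read per attempt
    output_tokens: int    # hypothetical tokens written per attempt
    success_rate: float   # hypothetical share of attempts that finish the task


def cost_per_attempt(m: Model) -> float:
    """Token cost in dollars of one attempt at the task."""
    return (m.input_tokens * m.input_price
            + m.output_tokens * m.output_price) / 1_000_000


def cost_per_completed_task(m: Model, review_cost_per_attempt: float = 0.0) -> float:
    """Expected dollars per finished task, assuming independent retries.

    review_cost_per_attempt stands in for human supervision: the time
    someone spends checking each attempt, successful or not.
    """
    return (cost_per_attempt(m) + review_cost_per_attempt) / m.success_rate


# Prices from the article; token counts and success rates are invented.
frontier = Model("GPT-5.5", 5.00, 30.00, 200_000, 20_000, 0.9)
cheap = Model("DeepSeek V4 Pro", 1.74, 3.48, 300_000, 40_000, 0.5)

if __name__ == "__main__":
    for review in (0.0, 2.0):
        for m in (frontier, cheap):
            cost = cost_per_completed_task(m, review)
            print(f"review ${review:.2f}  {m.name}: ${cost:.2f} per completed task")
```

With these invented inputs the cheaper model wins when nobody has to check the work, and the more expensive model wins once each attempt carries a supervision cost. Change the success rates or the review cost and the answer moves, which is the reason to measure outcomes rather than tokens.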

Takeaways: The AI economy is becoming a discipline of token allocation. GPT-5.5 shows why the best frontier tokens will remain expensive: they can unlock work that cheaper models still struggle to finish. DeepSeek V4 Pro shows why capable intelligence will keep falling in price as labs optimise around different hardware, cost and geopolitical constraints. Compute crunch 2.0 is the pressure between those forces. The winners will not be the firms that simply spend the most on AI. They will be the firms that learn which problems deserve expensive cognition, which can be solved with cheaper models, and how to optimise continuously.

Visual thinking points to the next wave

OpenAI released ChatGPT Images 2.0 this week, and the launch poster is worth a close look. In a single generated frame it carries a working QR code, sharp multilingual headlines, and a detailed product still life with consistent lighting across a row of branded objects. Four years ago a diffusion model could barely spell a shop sign. Today one prompt produces a scannable code, readable typography, and a self-critique step where the model checks its own draft before handing the file over. That is a dramatic improvement, and it is worth asking how it happened.

The easy answer is that the models got bigger. The more useful answer is that the architecture changed underneath. Early image models were classical diffusion, guided by a text encoder and steered from the outside with tools like ControlNet. Images 2.0, along with Google’s Nano Banana line, belongs to a newer family where a single transformer handles reasoning, web search, layout planning and image generation in one shared context. The closest public description is Meta’s Transfusion paper, which interleaves text tokens with continuous image latents and hands the final pixel rendering to a small diffusion decoder sitting on the back of the transformer. Leaks of gpt-image-2 from LMArena testers point to the same recipe, with a coarse-to-fine planning phase that lays out composition before any detail is drawn. The reasoning process and the pixel generation are no longer two models connected by a prompt. They are the same model producing different kinds of token.

That explains why a QR code works. The code has to be mathematically correct or it fails to scan, which no amount of prompt engineering can guarantee in a traditional diffusion pipeline. Consistent characters across eight frames used to require seed wrangling and adapter models. Multilingual typography used to be a lottery. Once the reasoning trace sits in the same attention window as the image latents, every patch of the output is generated while the model is still thinking about the brief, the search results and the prior drafts.

The direction of travel is what makes this week’s news worth a pause. Text models have looked like a monoculture for three years, mostly dense or mixture-of-experts transformers with a few inference tricks bolted on. Image work has been the opposite, a zoo of competing designs, because pixels exposed problems that pure autoregression could not solve. The techniques developed in that zoo are now flowing back the other way. Inception Labs’ Mercury 2 is a commercial diffusion language model running at over 1,000 tokens per second. LLaDA 2 applies masked diffusion to text and claims to fix the reversal curse. Google has shown Gemini Diffusion at similar speeds. More than 50 diffusion language model papers landed in 2025.

Takeaways: Images 2.0 is a fine product in its own right, but for anyone planning AI investments this year the more important signal is what it tells you about the architectural settlement we all assumed in 2024. That settlement is loosening. Image work is where you see it first, and the same ideas are now pushing into the text stack that most businesses actually run on.


EXO

Google’s 75%

Our chart this week shows the share of new code at Google written by AI jumping from 25% in October 2024 to 75% today, as disclosed by Sundar Pichai at Cloud Next in Las Vegas this week. Four data points, with the share tripling in roughly eighteen months. The more useful number buried in Pichai’s remarks is six: a recent internal migration finished six times faster than the same job a year ago.

Anthropic was also at Cloud Next in force. Claude runs on Vertex, and Claude Code is now reportedly on a $2.5 billion annual run-rate. Customers are increasingly mixing OpenAI, Claude and, to a lesser extent, Gemini agents inside the same workflow, including, awkwardly for Google, inside Google itself.

Which brings us to Steve Yegge. Over the past fortnight, the former Google engineer has posted a series of claims, sourced to current Googlers across multiple orgs, that paint a rather different picture from Pichai’s victory lap. Yegge alleges a two-tier system inside Google: DeepMind engineers use Claude Code daily, while most of the rest of the company is pushed onto internal Gemini variants that, in his sources’ view, are not yet as effective for agentic coding. When someone internally proposed equalising access by removing Claude for everyone, DeepMind reportedly pushed back so hard that several engineers threatened to leave.

Google’s leadership have disputed this in forthright terms. Demis Hassabis called the original post “absolute nonsense”. Addy Osmani, a director at Google Cloud, said more than 40,000 Google engineers now use agentic coding weekly, and Paige Bailey at DeepMind noted that teams have agents “running 24/7”.

