ExoBrain

ExoBrain Weekly Newsletter

Sol eclipsed by government permits, what price business outcomes, and a preview of the agentic curve

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

  • Sol eclipsed by government permits

    OpenAI's GPT-5.6 Sol did not launch publicly. It reached twenty government-approved US firms after a call from the Commerce Secretary, who then lifted a worldwide block on a rival model. A discretionary permit regime for the frontier has formed.

  • What price business outcomes?

    We costed one real knowledge-work task, an end-to-end RFP response, across every way you can buy AI today. The same output ranged from sixteen cents to nearly twenty-eight dollars, a 170-fold spread that turns on procurement, not intelligence.

  • A preview of the agentic curve

    An internal OpenAI study of its own staff shows median output per researcher up 56-fold since November, with Codex now 99.8% of weekly output tokens. Treat it as a preview of the curves the rest of us will soon draw.

  • News roundup

    This week: market jitters from Oracle to DeepSeek, regulators circling Microsoft and OpenAI, fresh research on model ensembles and alignment, and a hardware race that now includes OpenAI's own inference chip.

Sol eclipsed by government permits


Joel Miller



Last week we suggested that access to US frontier models had become a political decision. This week GPT-5.6 did not launch to the public. It arrived as a US-only preview to around 20 American companies, each preapproved by the government, after Commerce Secretary Howard Lutnick rang Sam Altman to confirm every relevant agency had signed off.

OpenAI calls GPT-5.6 Sol a next-generation model with a broad step change in capability, and it spent over 700,000 GPU hours on automated red-teaming, hunting universal jailbreaks, before showing it to anyone. The specific theme at release is cyber. In testing, Sol found real software vulnerabilities and built exploitation primitives against Chromium and Firefox, the engines behind most of the world's browsers. OpenAI's own line is that the model is better at helping people find and fix vulnerabilities than at reliably carrying out end-to-end attacks, and that it does not cross the cyber critical threshold.

But that dual-use quality is exactly why the argument is futile. As we said last week, the capability that patches a vulnerability is the capability that finds it, and a grading rubric cannot separate them because they are not separate. It now looks as though the government has acquired a taste for inserting itself into the launch process of frontier AI and is not going to let go.

Shortly after the Sol announcement, Lutnick lifted the two-week-old block on Anthropic's Claude Mythos 5, the model his own department had forced offline worldwide in June. The reprieve was not a pardon: access would reach only the "certain trusted partners" he personally judged safe, he wrote to Anthropic. A cabinet secretary now personally determines which model reaches which institutions, by name.

"I have determined that appropriate safeguards are in place to permit certain trusted partners to access the Claude Mythos 5 Model."

Howard Lutnick, US Commerce Secretary

Altman is putting on a diplomatic face, but the discomfort is clear. He told staff this is "not our preferred long term model", and while he conceded publicly that extensive safety testing "is not a bad idea", he drew the line clearly: "I just don't like the idea of the government picking the customers". He has every reason to be nervous, because he is now handing the launch of his most valuable product to an administration that is not afraid to reward its political allies.

Having acquired a taste for controlling the frontier, the administration has an obvious next target: the open models themselves. It cannot delete GLM-5.2 from the internet, but it can forbid US firms, cloud platforms and US inference providers from hosting or serving it, much as it already reached for export-control law to switch Mythos off. That would not dent the model's global availability for a second. It would simply wall American developers off from the cheapest, fastest tools on the market while the rest of the world keeps using them. GLM-5.2 has arrived as the DeepSeek moment of 2026: open-weight, benchmarked above some of Google's best, priced at a fraction of Western tiers, with no annex and no off-switch. The data, charted by Exponential View from OpenRouter, shows where demand has already gone: US models are down from roughly 72% of routed tokens a year ago to around 33% now.

At the moment China hands the world a free frontier, the US is letting a government known for rewarding its friends and punishing its enemies decide who may use the American alternatives.

Takeaways: GPT-5.6 Sol looks powerful, likely a match for Fable, and that has given Washington the excuse to take control. The genuine cyber risk is the cover, not the cause. What began as a one-off vendetta over a jailbreak has hardened in a fortnight into a permit regime, where a cabinet secretary decides by name which model reaches which firm, and two of the three leading labs already operate inside it. We have watched this administration treat power as a personal lever before, wielding tariffs deal by deal until the Supreme Court took the tool away. A discretionary gate over AI access invites exactly the same abuse from a government practised at rewarding friends. But, as with tariffs, it may be the US that ends up paying the price.

What price business outcomes?


Joel Miller



A few weeks ago we argued that AI compute had started to behave like a commodity, with a cheap open tier splitting away from an expensive frontier. That piece explained the mechanics. The obvious next question was the practical one: what does a real piece of knowledge work actually cost once you stop looking solely at tokens and start measuring business outcomes, and is the cheaper tier competitive?

So we measured one. We took a single complex task, an RFP response pipeline that reads a tender, selects credentials, drafts a reply and renders a branded deck, and ran it end to end. Then we costed that exact task across the main ways you can currently buy AI: the frontier APIs, open-weight models, self-hosted clusters, local machines, the flagship subscriptions, and Microsoft's Copilot Cowork.

The same work came out at 16 cents at the cheap end, on a well-used self-hosted cluster, and $27.77 at the expensive end. That is a spread of more than 170x for identical output. The price you pay now depends far less on the intelligence applied than on the purchasing model wrapped around it.

Two findings stand out, and we flagged both last week before we had the numbers. The first is Copilot Cowork, which we said looked prohibitively expensive. It is. Microsoft prices Cowork in credits at one cent each, grading tasks light, medium or heavy, which suggests a ceiling around $12. Our real RFP burned 2,777 credits, or $27.77, more than double that heavy-task ceiling. The telling detail is that Cowork ran on Sonnet, the same model that costs $8.10 through the raw API. So the gap, roughly 3.4-fold, is a clean reading of the managed-agent wrapper, not a difference in intelligence. That premium buys governance and Microsoft 365 grounding, but anyone budgeting Cowork should price it from measured runs, not from the headline task bands. On heavy work, those bands badly understate what you will actually spend.
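For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the ratios from the figures quoted above. The variable names are our own illustrative choices, and the $12 ceiling is the approximate figure implied by the heavy band, not a published price.

```python
# Figures quoted in the article; variable names are illustrative only.
CREDIT_PRICE_USD = 0.01          # Cowork credits are priced at one cent each
credits_used = 2_777             # credits burned by the measured RFP run
heavy_band_ceiling_usd = 12.00   # approximate ceiling implied by the heavy band
raw_api_cost_usd = 8.10          # the same task on Sonnet through the raw API
cheapest_cost_usd = 0.16         # the same task on a well-used self-hosted cluster

# Convert credits to dollars, then compare against each reference point.
cowork_cost_usd = credits_used * CREDIT_PRICE_USD
overrun_vs_band = cowork_cost_usd / heavy_band_ceiling_usd
wrapper_multiple = cowork_cost_usd / raw_api_cost_usd
spread = cowork_cost_usd / cheapest_cost_usd

print(f"Cowork cost:            ${cowork_cost_usd:.2f}")
print(f"Overrun vs heavy band:  {overrun_vs_band:.1f}x")
print(f"Wrapper vs raw API:     {wrapper_multiple:.1f}x")
print(f"Spread vs cheapest:     {spread:.0f}x")
```

Swap in the credit counts from your own measured runs to see how far they drift from the headline bands.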

The second is GLM-5.2, which we said was bringing near-frontier performance to a reasonable price point. Now quantified, it lands at an intelligence score within touching distance of the best closed models, at barely $2 per million tokens. What makes it different is that it arrives in three forms at once: a cheap metered API, a cheap subscription, and openly licensed weights you can host yourself. A year ago, choosing sovereignty or cost control meant accepting a real capability penalty. That penalty has largely gone. Run GLM-5.2 on a well-used cluster and the same RFP task costs around 16 cents, because once the hardware is paid for the marginal cost is essentially electricity.

The full analysis covers the building blocks of AI pricing, the seven provider tiers, the cache economics behind every agentic task, and the complete cost table for June 2026.

Read the full guide: AI Cost Analysis, June 2026 →

Takeaways: The same task can cost 16 cents or nearly $28 depending entirely on how you buy it. Copilot Cowork sits alone, for now prohibitively expensive, while GLM-5.2 now delivers near-frontier work for the price of a coffee. The organisations that handle the next eighteen months well will not pick a single provider. They will treat AI as procurement, route each task to the cheapest tier that can do it well, and explore new ways to host and run their own inference.
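To make "route each task to the cheapest tier that can do it well" concrete, here is a hedged Python sketch of such a routing rule. The per-task costs are the measured figures from this piece. Everything else is an assumption for illustration: the `Tier` structure, the `quality` scores (placeholders, not our measurements) and the `governed` flag standing in for governance and Microsoft 365 grounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_task_usd: float  # measured RFP cost from the article
    quality: float            # HYPOTHETICAL 0-1 score; use your own evaluations
    governed: bool            # HYPOTHETICAL flag for managed governance/grounding


TIERS = [
    Tier("self-hosted GLM-5.2", 0.16, 0.80, False),
    Tier("Sonnet via raw API", 8.10, 0.90, False),
    Tier("Copilot Cowork", 27.77, 0.90, True),
]


def route(min_quality: float, needs_governance: bool = False) -> Tier:
    """Return the cheapest tier that meets the task's requirements."""
    candidates = [
        t for t in TIERS
        if t.quality >= min_quality and (t.governed or not needs_governance)
    ]
    if not candidates:
        raise ValueError("no tier satisfies the requirements")
    return min(candidates, key=lambda t: t.cost_per_task_usd)


print(route(min_quality=0.7).name)
print(route(min_quality=0.85).name)
print(route(min_quality=0.7, needs_governance=True).name)
```

The point of the design is that quality and governance act as filters, and price only breaks the tie among tiers that pass. The expensive wrapper wins only when a task actually needs what it adds.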

A preview of the agentic curve


Joel Miller



This week's chart comes from a fascinating internal study from OpenAI, tracking how its own staff use AI agents. It shows the change in median output tokens per person by job function, normalised to 1x on 1 November 2025. By June 2026 the median researcher generates 56 times as much output, with customer support at 32x, engineering at 27x and legal at 13x. The engine is Codex, now 99.8% of weekly output tokens generated inside the company, with non-developer use among staff up 12-fold since August.

In our article on AI costs we argued the real measure is the cost of an outcome, not the token. This is borne out by OpenAI's data. The volumes are rising because the unit of work has changed. Nearly a quarter of Codex requests now represent tasks that would take a human over an hour. Agentic work multiplies token use many times over.
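If you want to draw the same kind of curve for your own organisation, the normalisation described above is simple to sketch in Python. The function name and the raw token counts below are invented for illustration, chosen only so that the final point lands on the 56x figure. OpenAI's underlying numbers are not in the source.

```python
from datetime import date


def normalise_to_baseline(series: dict[date, float], baseline: date) -> dict[date, float]:
    """Express every point as a multiple of the value on the baseline date."""
    base = series[baseline]
    if base <= 0:
        raise ValueError("baseline value must be positive")
    return {day: value / base for day, value in series.items()}


# HYPOTHETICAL median output tokens per researcher (illustrative values only).
researcher_tokens = {
    date(2025, 11, 1): 50_000,
    date(2026, 2, 1): 600_000,
    date(2026, 6, 1): 2_800_000,
}

index = normalise_to_baseline(researcher_tokens, date(2025, 11, 1))
for day, multiple in sorted(index.items()):
    print(f"{day.isoformat()}: {multiple:.0f}x")
```

Normalising each job function to its own baseline is what lets research, support, engineering and legal share one chart despite very different absolute volumes.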

News roundup

This week: market jitters from Oracle to DeepSeek, regulators circling Microsoft and OpenAI, fresh research on model ensembles and alignment, and a hardware race that now includes OpenAI's own inference chip.

