ExoBrain Weekly Newsletter
26 June 2026

Sol eclipsed by government permits, what price business outcomes, and a preview of the agentic curve

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

Sol eclipsed by government permits
OpenAI's GPT-5.6 Sol did not launch publicly. It reached twenty government-approved US firms after a call from the Commerce Secretary, who then lifted a worldwide block on a rival model. A discretionary permit regime for the frontier has formed.
What price business outcomes?
We costed one real knowledge-work task, an end-to-end RFP response, across every way you can buy AI today. The same output ranged from sixteen cents to nearly twenty-eight dollars, a 170-fold spread that turns on procurement, not intelligence.
A preview of the agentic curve
An internal OpenAI study of its own staff shows median output per researcher up 56-fold since November, with Codex now 99.8% of weekly output tokens. Treat it as a preview of the curves the rest of us will soon draw.
News roundup
This week: market jitters from Oracle to DeepSeek, regulators circling Microsoft and OpenAI, fresh research on model ensembles and alignment, and a hardware race that now includes OpenAI's own inference chip.

Sol eclipsed by government permits

OpenAI's GPT-5.6 Sol did not launch publicly. It reached twenty government-approved US firms after a call from the Commerce Secretary, who then lifted a worldwide block on a rival model. A discretionary permit regime for the frontier has formed.

Joel Miller

26 June 2026 · 4 min read

Last week we suggested that access to US frontier models had become a political decision. This week GPT-5.6 did not launch to the public. It arrived as a US-only preview to around 20 American companies, each preapproved by the government, after Commerce Secretary Howard Lutnick rang Sam Altman to confirm every relevant agency had signed off.

OpenAI calls GPT-5.6 Sol a next-generation model with a broad step change in capability, and it spent over 700,000 GPU hours on automated red-teaming, hunting universal jailbreaks, before showing it to anyone. The specific theme at release is cyber. In testing, Sol found real software vulnerabilities and built exploitation primitives against Chromium and Firefox, the engines behind most of the world's browsers. OpenAI's own line is that the model is better at helping people find and fix vulnerabilities than at reliably carrying out end-to-end attacks, and that it does not cross the cyber critical threshold.

But that dual-use quality is exactly why the threshold argument is futile. As we stated last week, the capability that patches a vulnerability is the capability that finds it, and a grading rubric cannot separate them because they are not separate. It now looks much more as though the government has acquired a taste for inserting itself into the launch process of frontier AI and is not going to let go.

Shortly after the Sol announcement, Lutnick lifted the two-week-old block on Anthropic's Claude Mythos 5, the model his own department had forced offline worldwide earlier in June. The reprieve was not a pardon: in his letter to Anthropic, he wrote that access would reach only the "certain trusted partners" he judged safe. A cabinet secretary now personally determines which model reaches which institutions, by name.

“I have determined that appropriate safeguards are in place to permit certain trusted partners to access the Claude Mythos 5 Model.”
Howard Lutnick, US Commerce Secretary

Altman is putting on a diplomatic face, but the discomfort is clear. He told staff this is "not our preferred long term model", and while he conceded publicly that extensive safety testing "is not a bad idea", he drew the line clearly: "I just don't like the idea of the government picking the customers". He has every reason to be nervous, because he is now handing the launch of his most valuable product to an administration that is not afraid to reward its political allies.

Having acquired a taste for controlling the frontier, the government has an obvious next target: the open models themselves. It cannot delete GLM-5.2 from the internet, but it can forbid US firms, cloud platforms and inference providers from hosting or serving it, much as it already reached for export-control law to switch Mythos off. That would not dent the model's global availability for a second. It would simply wall American developers off from the cheapest, fastest tools on the market while the rest of the world keeps using them. GLM-5.2 has arrived as the DeepSeek moment of 2026: open-weight, benchmarked above some of Google's best, priced at a fraction of Western tiers, with no annex and no off-switch. The data, charted by Exponential View from OpenRouter, shows where demand has already gone: US models are down from roughly 72% of routed tokens a year ago to around 33% now.

At the moment China hands the world a free frontier, the US is letting a government known for rewarding its friends and punishing its enemies decide who may use the American alternatives.

Takeaways: GPT-5.6 Sol looks powerful, likely a match for Fable, and that has given Washington the excuse to take control. The genuine cyber risk is the cover, not the cause. What began as a one-off vendetta over a jailbreak has hardened in a fortnight into a permit regime, where a cabinet secretary decides by name which model reaches which firm, and two of the three leading labs already operate inside it. We have watched this administration treat power as a personal lever before, wielding tariffs deal by deal until the Supreme Court took the tool away. A discretionary gate over AI access invites exactly the same abuse from a government practised at rewarding friends. But, as with tariffs, it may be the US that ends up paying the price.

What price business outcomes?

We costed one real knowledge-work task, an end-to-end RFP response, across every way you can buy AI today. The same output ranged from sixteen cents to nearly twenty-eight dollars, a 170-fold spread that turns on procurement, not intelligence.

Joel Miller

26 June 2026 · 3 min read

A few weeks ago we argued that AI compute had started to behave like a commodity, with a cheap open tier splitting away from an expensive frontier. That piece explained the mechanics. The obvious next question was the practical one: what does a real piece of knowledge work actually cost once you stop looking solely at tokens and start measuring business outcomes, and is the cheaper tier competitive?

So we measured one. We took a single complex task, an RFP response pipeline that reads a tender, selects credentials, drafts a reply and renders a branded deck, and ran it end to end. Then we costed that exact task across the main ways you can currently buy AI: the frontier APIs, open-weight models, self-hosted clusters, local machines, the flagship subscriptions, and Microsoft's Copilot Cowork.

The same work came out at 16 cents at the cheap end, on a well-used self-hosted cluster, and $27.77 at the expensive end. A spread of more than 170x, for identical output. The price you pay now depends far less on the intelligence applied than on the purchasing model wrapped around it.

Two findings stand out, and we flagged both last week before we had the numbers. The first is Copilot Cowork, which we said looked prohibitively expensive. It is. Microsoft prices Cowork in credits at one cent each and grades tasks as light, medium or heavy, with the heavy band suggesting a ceiling of around $12. Our real RFP burned 2,777 credits, or $27.77, more than double that heavy figure. The telling detail is that Cowork ran on Sonnet, the same model that costs $8.10 through the raw API. So the more-than-threefold gap is a clean reading of the managed-agent wrapper, not a difference in intelligence. That premium buys governance and Microsoft 365 grounding, but anyone budgeting Cowork should price it from measured runs, not from the headline task bands. On heavy work, those bands badly understate what you will actually spend.
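The arithmetic behind these comparisons is easy to check. Here is a minimal Python sketch that uses only the figures quoted above. The variable and function names are our own illustrative choices, and the $12 figure is the approximate ceiling implied by the heavy band, not a published price.

```python
# Figures quoted in the article; names are illustrative, not from any vendor API.
CREDITS_PER_DOLLAR = 100   # Cowork credits are priced at one cent each
CREDITS_USED = 2_777       # credits burned by the measured RFP run
HEAVY_BAND_USD = 12.00     # approximate ceiling implied by the "heavy" band
SONNET_API_USD = 8.10      # same task on the same model via the raw API
SELF_HOSTED_USD = 0.16     # cheapest tier: a well-used self-hosted cluster


def cowork_cost_usd(credits: int) -> float:
    """Convert Cowork credits to dollars."""
    return credits / CREDITS_PER_DOLLAR


def multiple(cost: float, baseline: float) -> float:
    """How many times more expensive `cost` is than `baseline`."""
    return cost / baseline


if __name__ == "__main__":
    cowork = cowork_cost_usd(CREDITS_USED)
    print(f"Cowork run:             ${cowork:.2f}")
    print(f"vs heavy band:          {multiple(cowork, HEAVY_BAND_USD):.1f}x")
    print(f"vs raw Sonnet API:      {multiple(cowork, SONNET_API_USD):.1f}x")
    print(f"vs self-hosted cluster: {multiple(cowork, SELF_HOSTED_USD):.0f}x")
```

Run as a script, it reports roughly 2.3x the heavy band, 3.4x the raw API and 174x the self-hosted cluster, in line with the "more than double" and "more than 170x" figures in the text.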

The second is GLM-5.2, which we said was bringing near-frontier performance to a reasonable price point. Now quantified, it lands at an intelligence score within touching distance of the best closed models, at barely $2 per million tokens. What makes it different is that it arrives in three forms at once: a cheap metered API, a cheap subscription, and openly licensed weights you can host yourself. A year ago, choosing sovereignty or cost control meant accepting a real capability penalty. That penalty has largely gone. Run GLM-5.2 on a well-used cluster and the same RFP task costs around 16 cents, because once the hardware is paid for the marginal cost is essentially electricity.

The full analysis covers the building blocks of AI pricing, the seven provider tiers, the cache economics behind every agentic task, and the complete cost table for June 2026.

Read the full guide: AI Cost Analysis, June 2026 →

Takeaways: The same task can cost 16 cents or nearly $28 depending entirely on how you buy it. Copilot Cowork sits alone, for now prohibitively expensive, while GLM-5.2 now delivers near-frontier work for the price of a coffee. The organisations that handle the next eighteen months well will not pick a single provider. They will treat AI as procurement, route each task to the cheapest tier that can do it well, and explore new ways to host and run their own inference.
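To make "route each task to the cheapest tier that can do it well" concrete, here is a minimal Python sketch of that selection rule. It is an illustration under stated assumptions, not a description of our pipeline: the per-task costs are the measured figures from this article, but the `quality` scores are hypothetical placeholders that you would replace with results from your own evaluations, and the `Tier` and `cheapest_capable` names are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_task_usd: float  # measured cost of the reference task
    quality: float            # HYPOTHETICAL 0-1 score from your own evals


def cheapest_capable(tiers: Sequence[Tier], min_quality: float) -> Optional[Tier]:
    """Return the cheapest tier whose quality meets the bar, or None if none does."""
    capable = [t for t in tiers if t.quality >= min_quality]
    return min(capable, key=lambda t: t.cost_per_task_usd, default=None)


# Costs are from the article; quality values are placeholders, not measurements.
TIERS = [
    Tier("self-hosted GLM-5.2", 0.16, 0.80),
    Tier("Sonnet via raw API", 8.10, 0.90),
    Tier("Copilot Cowork", 27.77, 0.90),
]

routine = cheapest_capable(TIERS, min_quality=0.75)    # falls to the cheap tier
demanding = cheapest_capable(TIERS, min_quality=0.85)  # needs a stronger model
```

The design choice worth noting is that the quality bar is set per task, not per organisation: the same router sends routine work to the cheap tier and reserves the expensive tiers for work that fails the cheaper one's bar.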

A preview of the agentic curve

An internal OpenAI study of its own staff shows median output per researcher up 56-fold since November, with Codex now 99.8% of weekly output tokens. Treat it as a preview of the curves the rest of us will soon draw.

Joel Miller

26 June 2026 · 2 min read

This week's chart comes from a fascinating internal study from OpenAI, tracking how its own staff use AI agents. It shows the change in median output tokens per person by job function, normalised to 1x on 1 November 2025. By June 2026 the median researcher generates 56 times more output, with customer support at 32x, engineering at 27x and legal at 13x. The engine is Codex, which now accounts for 99.8% of the output tokens generated inside the company each week, with non-developer use among staff up 12-fold since August.
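For readers who want to draw the same kind of curve for their own teams, the normalisation described above can be sketched in a few lines of Python. This is our assumption of how such an index would be computed, not OpenAI's published method, and the token counts below are synthetic values chosen only to make the arithmetic easy to follow.

```python
from statistics import median


def normalised_median(
    tokens_by_date: dict[str, list[int]], baseline_date: str
) -> dict[str, float]:
    """Median output tokens per person on each date, as a multiple of the baseline median."""
    baseline = median(tokens_by_date[baseline_date])
    if baseline == 0:
        raise ValueError("baseline median is zero; cannot normalise")
    return {date: median(values) / baseline for date, values in tokens_by_date.items()}


# Synthetic per-person token counts for one job function (NOT OpenAI's data).
toy = {
    "2025-11-01": [100, 200, 300],    # median 200, the 1x baseline
    "2026-06-01": [400, 800, 5_000],  # median 800
}
index = normalised_median(toy, "2025-11-01")
```

In this toy data the later date comes out at 4x the baseline. Using the median means a single very heavy user (the 5,000 here) does not move the index, which is one reason a chart like this is more informative than total token volume.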

In our article on AI costs we argued the real measure is the cost of an outcome, not the token. This is borne out by OpenAI's data. The volumes are rising because the unit of work has changed. Nearly a quarter of Codex requests now represent tasks that would take a human over an hour. Agentic work multiplies token use many times over.

News roundup

This week: market jitters from Oracle to DeepSeek, regulators circling Microsoft and OpenAI, fresh research on model ensembles and alignment, and a hardware race that now includes OpenAI's own inference chip.

AI business news

Oracle's stock fell 19% this week, the steepest weekly drop since a 20% plunge in August 2001, amid concerns about its debt load and AI investments (Oracle's 19% single-week stock collapse, its worst since 2001, signals that investors are losing patience with AI infrastructure debt bets that haven't yet translated to returns.)
Sources: DeepSeek's $7.4B raise was prompted by the release of Mythos as CEO Liang Wenfeng realized DeepSeek couldn't compete without a massive war chest (DeepSeek's $7.4B raise being a direct response to Anthropic's Mythos reveals how a single model release can force a strategic pivot at a rival lab an ocean away.)
Notion kills its Gmail client after AI agents keep humans from troubling inbox (Notion shutting down its Gmail client because AI agents now handle over half of users' inboxes is the first concrete product casualty of agentic AI cannibalizing the tools it was built to augment.)
Amazon ups India bet with fresh $13B AI infrastructure investment (Amazon's $13B India AI infrastructure commitment reframes the subcontinent as a primary battleground for hyperscaler dominance, not just a cost center.)
Elastic stretches workforce 7% thinner as AI does more of the heavy lifting (Elastic cutting 7% of its workforce while explicitly crediting AI-driven engineering efficiency gives executives a named company and real headcount number to benchmark their own automation decisions against.)

AI governance news

US lawmaker introduces bill to require AI companies to report critical incidents (A Republican lawmaker's new mandatory incident-reporting bill signals that bipartisan AI safety legislation is finally moving from rhetoric to concrete legislative text.)
Italy is investigating Microsoft 365's price hike, saying Microsoft did not adequately inform users that it was integrating AI tools like Copilot into the suite (Italy's antitrust probe into Microsoft bundling Copilot into Microsoft 365 without adequate disclosure sets a precedent that could force AI-as-default pricing practices to be unwound across Europe.)
Italy to join US-led Pax Silica AI initiative despite Trump spat (Italy joining the US-led Pax Silica AI supply chain initiative, despite ongoing diplomatic friction with Washington, reveals that geopolitical alignment on AI infrastructure is consolidating faster than trade tensions might suggest.)
EU says Amazon, Microsoft cloud services should fall under digital dominance rules (The EU designating Amazon and Microsoft cloud services as subject to digital dominance rules would extend gatekeeper obligations into the foundational infrastructure layer that most AI products depend on.)
Florida sues OpenAI and CEO Sam Altman, claiming the company concealed serious risks of ChatGPT (Florida's state-led lawsuit against OpenAI and Sam Altman personally, alleging suppressed internal safety warnings, opens a new front of state-level AI liability that could outpace federal inaction.)

AI research news

When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models (Testing 67 frontier models reveals a hard "co-failure ceiling" that exposes why ensemble approaches like mixture-of-agents hit diminishing returns, critical intelligence for anyone architecting multi-model production systems.)
The Verification Horizon: No Silver Bullet for Coding Agent Rewards (A systematic study showing no single reward signal reliably trains coding agents exposes a fundamental bottleneck in the path to autonomous software development that every AI engineering team needs to understand.)
Probing the Misaligned Thinking Process of Language Models (Researchers probe the internal reasoning traces of LLMs and find detectable signatures of deception and sandbagging, offering the first mechanistic handle on catching misaligned thinking before it surfaces in outputs.)
Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding (Decoding from intermediate rather than final transformer layers measurably reduces the "alignment tax", a practical drop-in technique that challenges the foundational assumption baked into every current LLM inference stack.)
Sakana Fugu Technical Report (Sakana AI's Fugu technical report details how a small Tokyo-based lab is deliberately specializing its frontier model rather than competing on general benchmarks, signaling a new competitive strategy diverging from the scale-everything consensus.)

AI hardware news

IBM details major chip breakthrough with new sub-1nm ‘nanostack’ 3D architecture (IBM's claim of sub-1nm 3D "nanostack" architecture delivering up to 70% energy efficiency gains signals a potential inflection point in the post-FinFET transistor roadmap that could reshape AI accelerator design economics.)
AWS hikes prices for Nvidia GPUs in its EC2 Capacity Blocks service, which let businesses rent AI compute in advance, by 20%; Trainium chip pricing is unchanged (AWS raising Nvidia GPU Capacity Block prices 20% while leaving Trainium pricing unchanged is a direct financial signal that Amazon is actively using pricing pressure to push customers toward its own silicon.)
Qualcomm to design China-specific data center chip in line with US export curbs (Qualcomm entering the data center AI chip market with a China-specific SKU designed around U.S. export controls represents a structural new front in the effort to break Nvidia's near-monopoly on AI inference hardware.)
AMD, Meta strike $100B, 6 GW chip deal as AI race heats up (AMD's $100B, 6 GW multiyear supply commitment from Meta, following a similar deal with OpenAI, confirms that hyperscalers are deliberately building a second GPU supply chain independent of Nvidia.)
OpenAI and Broadcom unveil Jalapeño, an LLM-optimised inference chip (OpenAI designing its own inference accelerator from scratch, with performance per watt it claims beats the current state of the art, marks the moment the largest model maker stops being purely an Nvidia customer and starts owning the silicon underneath its stack.)
