ExoBrain Weekly Newsletter · 03 July 2026

Meta's great retreat, DeepSeek does more with less, and the AI boom compared

Welcome to our weekly newsletter, which combines thematic insights from the founders at ExoBrain with a broader news roundup from our Exo agents.

This week we look at:

Meta's great retreat
Meta plans to sell access to its AI compute and models after $140 billion of spending. The pivot reads as retreat, but cheap compute and a fourth serious model entering the market could benefit everyone outside the duopoly.
DeepSeek does more with less
DeepSeek's open-sourced DSpark speeds up model serving by 57 to 85% with no loss of quality, and it works across model families. Efficiency, not raw capability, is becoming the most valuable work in AI.
The AI boom compared
A BIS chart puts AI investment at 4.5 times its pre-boom level in three years, steeper than canal mania, railways, the Roaring 20s and dotcom. Those booms broke around year five. We are at year three.
News roundup
This week: OpenAI offers Washington equity while Sonnet 5 resets the mid tier, regulators and courts test AI accountability, research probes the limits of scale, and the chip race gains new challengers from Etched to Samsung.

Meta's great retreat

Joel Miller

03 July 2026 · 4 min read

Is this what losing looks like in the AI wars? You spend more than $140 billion over three years building the machinery to win, then start renting it to the very rivals you meant to beat. On Wednesday, Bloomberg reported that Meta is building a cloud business, under its internal Meta Compute initiative, to sell access to its AI computing power and models. The plan comes in two forms. Meta would let developers pay to use its models hosted on its own infrastructure, including the closed-weight Muse Spark, in the style of Amazon's Bedrock. And it would rent out raw compute capacity like a neocloud such as CoreWeave. In doing so it goes up against AWS, Azure and Google Cloud, the incumbents who actually run this business at a profit.

Meta is the one hyperscaler that never became a cloud provider, because it always consumed every chip it could buy to power its own products. Now it wants to sell that capacity to outsiders. When you build a fleet to train frontier models and then lease the fleet to others, you are telling the market you have more silicon than you have worthwhile internal workloads to run on it.

This is reminiscent of xAI. Rather than becoming a leading model developer, Elon Musk's outfit, now folded into SpaceX, rents its Colossus data centres to others. In May, Anthropic took more than 300 megawatts of that capacity, over 220,000 Nvidia GPUs. In June, Google agreed to pay SpaceX around $920 million a month for xAI compute through 2029. Yann LeCun, Meta's former chief AI scientist, has called xAI "kind of a failure", its founding team gone and compute rental the only way left to recoup the cost of the hardware. His old employer is now walking the same road.

The people problem tells the same story as the hardware. Inside Meta, morale in the 6,500-strong Applied AI unit has cratered. Wired described a staff livestream hijacked so someone could demand a senior executive be told he was "a piece of shit". TechCrunch summed up its engineers' verdict as a "soul-crushing gulag". More than 1,600 employees signed a petition against a scheme monitoring their keystrokes for training data. Zuckerberg has acknowledged the "distress" caused, after roughly 8,000 job cuts in May and 600 losses from Superintelligence Labs last October.

The defections started early. Within two months of the new lab launching last summer, at least eight researchers walked. Ethan Knight and Avi Verma each returned to OpenAI after less than a month. Chaya Nayak, a nine-year Meta veteran, left for OpenAI. Bert Maher, a twelve-year veteran, went to Anthropic. Meta's line is that "some attrition is normal".

It is a negative picture, but we've yet to see if Meta's lab can deliver. On the same day the cloud story broke, AI chief Alexandr Wang told staff that Meta's next model, codenamed Watermelon, has caught up with OpenAI's GPT-5.5 on key benchmarks and uses ten times the compute of Muse Spark. That is exactly the kind of workload that would fill the excess capacity Meta is now trying to rent out. When a user asked Wang about a Claude Opus-level coding model, he replied "pretty soon", promising users would like what Meta has cooking. The elite team behind Watermelon has been shielded from the layoffs and most of the exits.

Journalist M.G. Siegler argues the cloud move is the obvious fix for a problem Meta has always had. Unlike Amazon, Google and Microsoft, Meta has no way to monetise AI directly, only through ads. A cloud business solves that. He points to Google, once dismissed as a one-trick ads pony, whose cloud arm now runs at over $80 billion a year and is growing above 60%. Wall Street agrees with him, at least for now. Meta jumped 9% on the news, its best day in months, and Mizuho called it a "margin of safety", the thing that eases the biggest overhang on the stock.

Both the failure camp and the Siegler camp agree on the underlying fact: Meta and xAI built enormous compute without the internal demand or model sophistication to use it.

Meta is the one member of the Magnificent Seven without a native answer to how AI pays for itself. Nvidia sells the silicon. Amazon, Microsoft and Google rent compute through clouds that were already businesses. Google adds custom chips and consumer monetisation on top. Apple needs AI to sell its hardware. Meta has none of that. It builds like a hyperscaler, spends like OpenAI, and earns like an ad company, and those three identities do not yet cohere. Its 2026 capex guidance, nearly double last year's spend, is among the biggest proportional leaps in the group, with the least obvious payoff. The Mag 7 trade is splitting apart as AI spending separates winners from laggards, and Meta sits right on the fault line, too committed to be cautious like Apple, too short of direct monetisation to be safe like Google.

Takeaways: A Meta compute business would benefit almost everyone outside the current duopoly. The frontier today is effectively OpenAI and Anthropic, with Google losing ground fast, and that concentration sets the prices, the rate limits and the terms everyone else builds on. There is a natural ratio between compute spent training models and compute spent serving them, and Meta and xAI ended up stuck at the wrong end of it, pouring silicon into training models that then drew too little user demand to earn their keep on the other side. Renting the fleet out is how you rebalance the books when your own models cannot. A Meta that rents out that capacity, ships Watermelon at GPT-5.5 level, and reopens even part of its old open-weight instinct would put cheap compute and a fourth serious model into the market at once. That is more choice for developers, downward pressure on token prices, and less reliance on two labs. Meta's worst year as an AI company could turn out to be a good one for those looking for compute and choice.

DeepSeek does more with less

Joel Miller

03 July 2026 · 3 min read

Models keep getting bigger, memory keeps getting scarcer, and compute is still in short supply. That combination has pushed one skill to the top of the market: running models for less. Last week DeepSeek open-sourced DSpark, and it's another example of the lab's ingenuity in the face of constrained resources.

LLMs generate text one token at a time, and each new token forces the machine to read the model's entire set of weights from memory just to produce that single token. The hardware spends most of its time waiting on memory, not calculating. A stack of techniques has grown up to fix this. The "KV cache" stores past work so it isn't recomputed. "PagedAttention" packs that cache tightly so more requests fit at once. "Continuous batching" keeps the processor busy by swapping requests in and out as they arrive. Combined, they let a server handle several times more traffic than a naive setup.
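A rough back-of-envelope sketch of why decoding waits on memory, and why batching helps. The figures below (140 GB of weights, 3 TB/s of memory bandwidth) are illustrative assumptions, not measurements of any particular model or GPU, and the function names are made up for this example. The sketch also ignores KV-cache reads and compute limits, so treat the results as an upper bound.

```python
def decode_steps_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode steps per second if each step streams all weights once."""
    return bandwidth_bytes_per_s / weight_bytes


def tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float, batch_size: int) -> float:
    """Each step yields one token per request in the batch, so one weight read is shared."""
    return decode_steps_per_second(weight_bytes, bandwidth_bytes_per_s) * batch_size


# Assumed, illustrative numbers only.
WEIGHT_BYTES = 140e9  # e.g. 70 billion parameters at 2 bytes each
BANDWIDTH = 3e12      # 3 TB/s of memory bandwidth

for batch in (1, 8, 32):
    total = tokens_per_second(WEIGHT_BYTES, BANDWIDTH, batch)
    print(f"batch {batch:>2}: about {total:.0f} tokens/s across all requests")
```

Under these assumptions a single request tops out at roughly 21 tokens per second however fast the processor is, while a batch shares the same weight read across many requests. That is the gap that batching and cache management aim to close.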

"Speculative decoding" is the newest and most interesting layer. A small, fast model guesses the next batch of tokens, then the large model checks them all in a single pass and keeps the ones it agrees with. The answer is identical to normal generation, just produced faster. DSpark is DeepSeek's evolution of this idea, and its trick is twofold: a drafting model that scores the confidence of its own guesses, and a scheduler that tracks how busy the GPU is. It verifies long runs of guesses when there is spare capacity and prunes the low-confidence ones when the machine is saturated, which sidesteps the usual conflict between speculation and heavy batching. The reported gain is 57% to 85% faster generation per user at the same throughput.

DeepSeek has tested DSpark on Qwen and Gemma models as well as its own, so the technique works across model families. A hosting business can bolt it onto other open models like GLM-5.2, or better still train an optimised drafting model and lower its serving costs significantly.
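The draft-and-verify loop behind speculative decoding can be sketched with two toy "models". This is a minimal illustration of plain greedy speculative decoding, not DSpark itself: it does not model confidence scoring or load-aware scheduling, and `target_next`, `draft_next` and the token arithmetic are invented for the example. In a real system the large model scores all drafted positions in one batched forward pass; here each verification round is simply counted as one pass.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Toy stand-in for the large model: a deterministic next token for a context."""
    return (sum(ctx) * 31 + len(ctx) * 7) % 50


def draft_next(ctx: List[int]) -> int:
    """Toy stand-in for the small model: agrees with the target except at some positions."""
    guess = target_next(ctx)
    return (guess + 1) % 50 if len(ctx) % 4 == 3 else guess


def generate_plain(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Ordinary decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def generate_speculative(target: Model, draft: Model, prompt: List[int],
                         n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft k tokens, keep the prefix the target agrees with, fix the first miss."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        target_passes += 1  # one verification round over all drafted positions
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace the first wrong guess and stop
                break
        else:
            accepted.append(target(out + accepted))  # all guesses held: one bonus token
        out.extend(accepted)
    return out[:goal], target_passes


prompt = [1, 2, 3]
plain = generate_plain(target_next, prompt, 20)
spec, passes = generate_speculative(target_next, draft_next, prompt, 20)
print("identical output:", spec == plain)
print("target passes:", passes, "versus 20 for plain decoding")
```

Every token that ends up in the output is one the target model would have chosen itself, which is why the result matches normal generation. The saving comes from needing fewer passes of the large model.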

DeepSeek keeps publishing methods that other labs might treat as trade secrets, and the logic is partly strategic. Efficiency reduces the need for the most advanced chips, which matters for a Chinese lab working under export controls. Open techniques also build an ecosystem that spreads faster than any single product could. When you are the one constrained by hardware, making the software cheaper to run is a way to compete on the ground you can actually control.

For businesses and individuals, this is what turns self-hosting from an aspiration into a reality. As open models close the quality gap and efficiency work like DSpark cuts the cost of running them, capable AI on your own hardware becomes practical. The last obstacle is physical. Apple raised Mac and iPad prices by 15 to 25% last week, blaming a memory shortage driven by AI data centres buying up supply, and memory contract prices nearly doubled in the first quarter alone, with another 60% rise in the second. Memory is the bottleneck in the hardware you buy and in the serving stack alike.

Takeaways: The frontier of AI has moved from building smarter models to running them for less, and DeepSeek is handing that capability to the whole field. Bigger models and scarce, expensive memory would normally put advanced AI further out of reach, but efficiency work pulls it back within grasp. The thing standing between us and frontier models on local hardware is now mostly the price of memory, which is exactly why squeezing more from every chip has become the most valuable work in AI.

The AI boom compared

Joel Miller

03 July 2026 · 2 min read

This week's chart, drawn by the Financial Times from Bank for International Settlements data, lines up today's AI spending against four famous booms: the canal mania of the 1830s, British railway mania, the electrification boom of the Roaring 20s, and the dotcom bubble. Each line starts at 1 and tracks investment as a multiple of its pre-boom low. AI has hit 4.5 times its starting point in just three years. Canal mania peaked at 4.1 over five years, railways at 2.7 over four, while electrification and dotcom both topped out near 1.9. AI is steeper than all of them, and it hasn't peaked. These historic booms typically broke around year five, then dragged their economies into recession. We are at year three.
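To compare booms of different lengths, the multiples quoted above can be turned into an implied average growth rate per year. This is a simple sketch that assumes smooth compounding from the pre-boom low to the quoted multiple over the quoted number of years. Electrification and dotcom are left out because no duration is given for them here.

```python
def annualised_growth(multiple: float, years: float) -> float:
    """Constant yearly growth rate that compounds to `multiple` over `years`."""
    return multiple ** (1 / years) - 1


# (multiple of pre-boom low, years taken), as quoted in the text above
booms = {
    "AI": (4.5, 3),
    "Canal mania": (4.1, 5),
    "Railway mania": (2.7, 4),
}

for name, (multiple, years) in booms.items():
    print(f"{name}: about {annualised_growth(multiple, years):.0%} a year")
# AI: about 65% a year
# Canal mania: about 33% a year
# Railway mania: about 28% a year
```

On this reading AI investment has been compounding at roughly twice the yearly pace of canal mania, which is what "steeper" means on the chart.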

The 1840s railway mania in Britain was one of the greatest technology manias in history, and by 1850 cumulative investment neared half of Britain's GDP. George Hudson, known as the "Railway King", the Sam Altman of his day, ran over 1,000 miles of line and paid 10% dividends, partly out of capital as it later emerged, to keep share prices climbing. The market turned in late 1845, and shares slid through the panic of 1847 to bottom out in 1850, roughly two-thirds below their peak. The investors caught in the fall included Charles Darwin, John Stuart Mill and the Brontë sisters. Yet the track stayed. Much of the network Britain runs today was laid in those manic years. The question for AI isn't whether the correction comes, but whether we are laying track that outlives it, or authorising lines that never get built.

News roundup

AI business news

Anthropic releases Claude Sonnet 5, bringing near-Opus capability to the mid tier (Anthropic shipping near-frontier coding and agentic capability at Sonnet prices resets the cost-performance curve that every AI budget decision is being made against.)
Microsoft launches its own AI deployment company with $2.5 billion commitment (Microsoft launching a dedicated $2.5B AI deployment company signals that the real enterprise battleground is now implementation and integration, not model capability.)
Cloudflare’s new policy pushes AI companies to pay for publishers’ content (Cloudflare's new policy forcing AI companies to pay for publisher content could fundamentally reshape the economics of AI training data and web crawling at scale.)
SAP snaps wallet shut for travel and hiring so it can keep shoveling cash into AI (SAP freezing travel and hiring to fund AI investment illustrates how legacy enterprise software giants are cannibalizing their own operations to avoid being disrupted.)
Amazon’s Mechanical Turk to stop accepting new customers – and not even AI can save it (Amazon shutting Mechanical Turk to new customers marks a symbolic end of an era, the original human-powered data marketplace made obsolete by the AI systems it helped build.)

AI governance news

Sources: Anthropic moves to close loopholes that let Chinese companies like Ant use its models via workarounds such as cloud providers and overseas subsidiaries (Anthropic actively closing cloud-provider loopholes used by Chinese firms like Ant signals that AI access controls are shifting from government export rules to private company enforcement, a structural change in how AI containment actually works.)
OpenAI proposes handing Trump administration a 5% stake, FT reports (OpenAI offering the US government a 5% equity stake would be an unprecedented entanglement between a leading AI lab and federal power, and a striking answer to a month of government gatekeeping over frontier releases.)
Trump drops restrictions on Anthropic’s Mythos and Fable models (Trump lifting export restrictions on Anthropic's Mythos and Fable models reverses Biden-era controls and signals a broader administration posture of using AI model access as a geopolitical carrot rather than a stick.)
TikTok announce major redundancies amid push for AI content moderation (TikTok cutting 300 trust-and-safety jobs in Ireland to replace human moderators with AI is the clearest real-world test yet of whether AI content moderation can legally and practically substitute for human judgment at scale.)
Lawsuit accuses AI security company of publishing hallucinated findings (MeetingTV's lawsuit alleging that Palo Alto Networks published AI-hallucinated threat intelligence linking its infrastructure to Chinese hackers establishes a legal liability question the security industry has no framework for yet.)

AI research news

An AI agent for treatment reasoning over a biomedical tool universe (ATHENA-R1 is a concrete, named AI agent capable of treatment reasoning across all FDA-approved drugs, representing a meaningful step toward clinical decision support with verifiable scope.)
Bad company corrupts good morals: Understanding and Measuring Narrative-Induced Moral Reasoning Degradation in LLMs (This paper quantifies how prolonged narrative exposure degrades LLMs' moral reasoning and alignment stability, a direct safety concern for any deployment involving extended conversational context.)
The Complexity Ceiling Benchmark: A Multi-Domain Evaluation of Sequential Reasoning Under Depth Scaling (The Complexity Ceiling Benchmark isolates depth of sequential reasoning as a controlled variable, giving practitioners a rigorous tool to measure exactly where and how fast LLM reasoning breaks down.)
Reaching Trillion-Parameter Performance with a 35B Agent (The finding that a 35B agent can match trillion-parameter model performance challenges the assumption that capability scales with raw parameter count, with direct implications for inference cost.)
Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling (A training-free method for accelerating diffusion models via staged sampling lowers the compute barrier for high-quality image and video generation without requiring any retraining.)

AI hardware news

Anthropic is discussing a new custom chip with Samsung (Anthropic entering custom silicon talks with Samsung signals that frontier AI labs can no longer afford to depend on third-party chips for their inference economics.)
Etched emerges from stealth with a working chip, $800M raised and over $1B in customer contracts (A startup arriving with a working inference chip, a billion dollars in signed contracts and a $5B valuation shows investors betting that purpose-built inference silicon can carve real share from Nvidia's general-purpose GPUs.)
Firmus to build 170,000-GPU AI factory campus with Nvidia in Indonesia (A 360MW Nvidia DSX campus in Batam with up to 170,000 accelerators and $25-30B in expected offtake marks a significant geographic shift in AI infrastructure investment toward Southeast Asia.)
DriveNets unveils high-capacity AI fabric platforms to connect thousands of XPUs (DriveNets' new Broadcom-powered 1.6T AI fabric platform addresses a bottleneck that GPU manufacturers rarely discuss: the network interconnect that determines whether thousands of accelerators can actually work together efficiently.)
TSMC announces 2026 capex spend of $56bn after posting eighth consecutive quarter of growth (TSMC's $56B capex commitment for 2026, paired with confirmation that 2nm and 1.4nm cost-per-wafer is substantially higher than prior nodes, gives the clearest picture yet of why AI infrastructure economics are structurally shifting upward.)
