ExoBrain

ExoBrain Weekly Newsletter

Meta's great retreat, DeepSeek does more with less, and the AI boom compared

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

  • Meta's great retreat

    Meta plans to sell access to its AI compute and models after $140 billion of spending. The pivot reads as retreat, but cheap compute and a fourth serious model entering the market could benefit everyone outside the duopoly.

  • DeepSeek does more with less

    DeepSeek's open-sourced DSpark speeds up model serving by 57 to 85% with no loss of quality, and it works across model families. Efficiency, not raw capability, is becoming the most valuable work in AI.

  • The AI boom compared

    A BIS chart puts AI investment at 4.5 times its pre-boom level in three years, steeper than canal mania, railways, the Roaring 20s and dotcom. Those booms broke around year five. We are at year three.

  • News roundup

    This week: OpenAI offers Washington equity while Sonnet 5 resets the mid tier, regulators and courts test AI accountability, research probes the limits of scale, and the chip race gains new challengers from Etched to Samsung.

Meta's great retreat

Meta plans to sell access to its AI compute and models after $140 billion of spending. The pivot reads as retreat, but cheap compute and a fourth serious model entering the market could benefit everyone outside the duopoly.

Joel Miller


4 min read

Is this what losing looks like in the AI wars? You spend more than $140 billion over three years building the machinery to win, then start renting it to the very rivals you meant to beat. On Wednesday, Bloomberg reported that Meta is building a cloud business, under its internal Meta Compute initiative, to sell access to its AI computing power and models. The plan comes in two forms. Meta would let developers pay to use its models hosted on its own infrastructure, including the closed-weight Muse Spark, in the style of Amazon's Bedrock. And it would rent out raw compute capacity like a neocloud such as CoreWeave. In doing so it goes up against AWS, Azure and Google Cloud, the incumbents who actually run this business at a profit.

Meta is the one hyperscaler that never became a cloud provider, because it always consumed every chip it could buy to power its own products. Now it wants to sell that capacity to outsiders. When you build a fleet to train frontier models and then lease the fleet to others, you are telling the market you have more silicon than you have worthwhile internal workloads to run on it.

This is reminiscent of xAI. Rather than becoming a leading model developer, Elon Musk's outfit, now folded into SpaceX, rents its Colossus data centres to others. In May, Anthropic took more than 300 megawatts of that capacity, over 220,000 Nvidia GPUs. In June, Google agreed to pay SpaceX around $920 million a month for xAI compute through 2029. Yann LeCun, Meta's former chief AI scientist, has called xAI "kind of a failure", its founding team gone and compute rental the only way left to recoup the cost of the hardware. His old employer is now walking the same road.

The people problem tells the same story as the hardware. Inside Meta, morale in the 6,500-strong Applied AI unit has cratered. Wired described a staff livestream hijacked so someone could demand a senior executive be told he was "a piece of shit". TechCrunch summed up its engineers' verdict as a "soul-crushing gulag". More than 1,600 employees signed a petition against a scheme monitoring their keystrokes for training data. Zuckerberg has acknowledged the "distress" caused, after roughly 8,000 job cuts in May and 600 losses from Superintelligence Labs last October.

The defections started early. Within two months of the new lab launching last summer, at least eight researchers walked. Ethan Knight and Avi Verma each returned to OpenAI after less than a month. Chaya Nayak, a nine-year Meta veteran, left for OpenAI. Bert Maher, a twelve-year veteran, went to Anthropic. Meta's line is that "some attrition is normal".

It is a negative picture, but we've yet to see if Meta's lab can deliver. On the same day the cloud story broke, AI chief Alexandr Wang told staff that Meta's next model, codenamed Watermelon, has caught up with OpenAI's GPT-5.5 on key benchmarks and uses ten times the compute of Muse Spark. That is exactly the kind of workload that would fill the excess capacity Meta is now trying to rent out. When a user asked Wang about a Claude Opus-level coding model, he replied "pretty soon", promising users would like what Meta has cooking. The elite team behind Watermelon has been shielded from the layoffs and most of the exits.

Journalist M.G. Siegler argues the cloud move is the obvious fix for a problem Meta has always had. Unlike Amazon, Google and Microsoft, Meta has no way to monetise AI directly, only through ads. A cloud business solves that. He points to Google, once dismissed as a one-trick ads pony, whose cloud arm now runs at over $80 billion a year and is growing above 60%. Wall Street agrees with him, at least for now. Meta jumped 9% on the news, its best day in months, and Mizuho called it a "margin of safety", the thing that eases the biggest overhang on the stock.

Both the failure camp and the Siegler camp agree on the underlying fact: Meta and xAI built enormous compute without the internal demand or model sophistication to use it.

Meta is the one member of the Magnificent Seven without a native answer to how AI pays for itself. Nvidia sells the silicon. Amazon, Microsoft and Google rent compute through clouds that were already businesses. Google adds custom chips and consumer monetisation on top. Apple needs AI to sell its hardware. Meta has none of that. It builds like a hyperscaler, spends like OpenAI, and earns like an ad company, and those three identities do not yet cohere. Its 2026 capex guidance, nearly double last year's spend, is among the biggest proportional leaps in the group, with the least obvious payoff. The Mag 7 trade is splitting apart as AI spending separates winners from laggards, and Meta sits right on the fault line, too committed to be cautious like Apple, too short of direct monetisation to be safe like Google.

Takeaways: A Meta compute business would benefit almost everyone outside the current duopoly. The frontier today is effectively OpenAI and Anthropic, with Google losing ground fast, and that concentration sets the prices, the rate limits and the terms everyone else builds on. There is a natural ratio between compute spent training models and compute spent serving them, and Meta and xAI ended up stuck at the wrong end of it, pouring silicon into training models that then drew too little user demand to earn their keep on the other side. Renting the fleet out is how you rebalance the books when your own models cannot. A Meta that rents out that capacity, ships Watermelon at GPT-5.5 level, and reopens even part of its old open-weight instinct would put cheap compute and a fourth serious model into the market at once. That is more choice for developers, downward pressure on token prices, and less reliance on two labs. Meta's worst year as an AI company could turn out to be a good one for those looking for compute and choice.

DeepSeek does more with less

DeepSeek's open-sourced DSpark speeds up model serving by 57 to 85% with no loss of quality, and it works across model families. Efficiency, not raw capability, is becoming the most valuable work in AI.

Joel Miller


3 min read

Models keep getting bigger, memory keeps getting scarcer, and compute is still in short supply. That combination has pushed one skill to the top of the market: running models for less. Last week DeepSeek open-sourced DSpark, and it's another example of the lab's ingenuity in the face of constrained resources.

LLMs generate text one token at a time, roughly a word or word fragment, and each step forces the machine to read the entire model from memory to produce it. The hardware spends most of its time waiting on memory, not calculating. A stack of techniques has grown up to fix this. The "KV cache" stores past work so it isn't recomputed. "PagedAttention" packs that cache tightly so more requests fit at once. "Continuous batching" keeps the processor busy by swapping requests in and out as they arrive. Combined, they let a server handle several times more traffic than a naive setup.
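A back-of-envelope sketch shows why memory, not arithmetic, sets the pace, and why batching helps. It assumes the simplest possible model: every decode step streams all the weights once, and nothing else costs time. The function names and the example figures (a 70-billion-parameter model at 2 bytes per weight, 2 TB/s of memory bandwidth) are hypothetical illustrations, not numbers from DeepSeek or any vendor.

```python
def decode_steps_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s):
    """Upper bound on decode steps per second if each step reads all weights once.

    Simplification: ignores KV cache traffic, compute time and any overlap.
    """
    model_bytes = param_count * bytes_per_param
    return mem_bandwidth_bytes_per_s / model_bytes


def tokens_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s, batch_size=1):
    """Each step yields one token per request in the batch, from a single read of the weights."""
    steps = decode_steps_per_second(param_count, bytes_per_param, mem_bandwidth_bytes_per_s)
    return steps * batch_size


# Hypothetical figures, chosen only to make the arithmetic concrete.
PARAMS = 70e9          # assumed model size
BYTES_PER_PARAM = 2    # assumed 16-bit weights
BANDWIDTH = 2e12       # assumed 2 TB/s memory bandwidth

single = tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH, batch_size=1)
batched = tokens_per_second(PARAMS, BYTES_PER_PARAM, BANDWIDTH, batch_size=8)
print(f"one request: about {single:.1f} tokens/s in total")
print(f"batch of 8:  about {batched:.1f} tokens/s in total")
```

Under these assumptions a lone request tops out at about 14 tokens a second however fast the chip can calculate, while eight requests sharing each read of the weights get about eight times the total output. Real systems fall short of this bound, since the cache itself has to be read too.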

"Speculative decoding" is the newest and most interesting layer. A small, fast model guesses the next batch of tokens, then the large model checks them all in a single pass and keeps every guess up to the first one it disagrees with. The answer is identical to normal generation, just produced faster. DSpark is DeepSeek's evolution of this idea, and its trick is twofold: a drafting model that scores the confidence of its own guesses, and a scheduler that tracks how busy the GPU is. It verifies long runs of guesses when there is spare capacity and prunes the low-confidence ones when the machine is saturated, which sidesteps the usual conflict between speculation and heavy batching. The reported gain is 57% to 85% faster generation per user at the same throughput.
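To make the basic mechanism concrete, here is a minimal sketch of plain greedy speculative decoding. It is not DSpark: there is no confidence scoring and no load-aware scheduler, and the two "models" are hypothetical toy functions over integers, invented for illustration. A real system would verify all drafted positions in one batched forward pass of the large model; the sketch calls the target once per position but counts each round as one pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # maps a token sequence to the next token


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: the large model produces one token per pass."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Draft up to k tokens, keep the prefix the target agrees with, repeat."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The small model guesses a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The large model checks the run (one batched pass in a real system).
        target_passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # take the target's token and stop
                break
        seq.extend(accepted)
    return seq, target_passes


# Hypothetical toy models: the draft agrees with the target except at every fifth position.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 11


def toy_draft(seq: List[int]) -> int:
    guess = toy_target(seq)
    return guess if len(seq) % 5 else (guess + 1) % 11


out, passes = speculative_decode(toy_target, toy_draft, [2], n_new=20, k=4)
print(out == greedy_decode(toy_target, [2], 20), passes)
```

Every emitted token is either confirmed or supplied by the target, so the output matches ordinary greedy decoding exactly. The saving is in the number of large-model passes, which falls as the draft model's guesses get better.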

DeepSeek has tested DSpark on Qwen and Gemma models as well as its own, so the technique works across model families. A hosting business can bolt it onto other open models like GLM-5.2, or better still train an optimised drafting model and lower its serving costs significantly.

DeepSeek keeps publishing methods that other labs might treat as trade secrets, and the logic is partly strategic. Efficiency reduces the need for the most advanced chips, which matters for a Chinese lab working under export controls. Open techniques also build an ecosystem that spreads faster than any single product could. When you are the one constrained by hardware, making the software cheaper to run is a way to compete on the ground you can actually control.

For businesses and individuals, this is what turns self-hosting from an aspiration into a reality. As open models close the quality gap and efficiency work like DSpark cuts the cost of running them, capable AI on your own hardware becomes practical. The last obstacle is physical. Apple raised Mac and iPad prices by 15 to 25% last week, blaming a memory shortage driven by AI data centres buying up supply, and memory contract prices nearly doubled in the first quarter alone, with another 60% rise in the second. Memory is the bottleneck in the hardware you buy and in the serving stack alike.

Takeaways: The frontier of AI has moved from building smarter models to running them for less, and DeepSeek is handing that capability to the whole field. Bigger models and scarce, expensive memory would normally put advanced AI further out of reach, but efficiency work pulls it back within grasp. The thing standing between us and frontier models on local hardware is now mostly the price of memory, which is exactly why squeezing more from every chip has become the most valuable work in AI.

The AI boom compared

A BIS chart puts AI investment at 4.5 times its pre-boom level in three years, steeper than canal mania, railways, the Roaring 20s and dotcom. Those booms broke around year five. We are at year three.

Joel Miller


2 min read

This week's chart, drawn by the Financial Times from Bank for International Settlements data, lines up today's AI spending against four famous booms: the canal mania of the 1830s, British railway mania, the electrification boom of the Roaring 20s, and the dotcom bubble. Each line starts at 1 and tracks investment as a multiple of its pre-boom low. AI has hit 4.5 times its starting point in just three years. Canal mania peaked at 4.1 over five years, railways at 2.7 over four, while electrification and dotcom both topped out near 1.9. AI is steeper than all of them, and it hasn't peaked. These historic booms typically broke around year five, then dragged their economies into recession. We are at year three.
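One way to put "steeper" on a common footing is to convert each peak multiple into an implied compound annual growth rate. The sketch below uses only the multiples and durations quoted above. It is our own simplification, not the method behind the BIS chart, and it leaves out electrification and dotcom because no durations are given for them here.

```python
def annualised_growth(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by reaching `multiple` times the start in `years`."""
    return multiple ** (1 / years) - 1


# (multiple of the pre-boom low, years taken), as quoted in the text above.
booms = {
    "AI (so far)": (4.5, 3),
    "Canal mania": (4.1, 5),
    "Railway mania": (2.7, 4),
}

rates = {name: annualised_growth(m, y) for name, (m, y) in booms.items()}
for name, rate in rates.items():
    print(f"{name}: about {rate:.0%} a year")
```

On these figures AI investment has compounded at roughly 65% a year, against about 33% for canal mania and 28% for the railways.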

The 1840s railway mania in Britain was one of the greatest technology manias in history, and by 1850 cumulative investment neared half of Britain's GDP. George Hudson, known as the "Railway King", the Sam Altman of his day, ran over 1,000 miles of line and paid 10% dividends, partly out of capital as it later emerged, to keep share prices climbing. The market turned in late 1845, and shares slid through the panic of 1847 to bottom out in 1850, roughly two-thirds below their peak. The investors caught in the fall included Charles Darwin, John Stuart Mill and the Brontë sisters. Yet the track stayed. Much of the network Britain runs today was laid in those manic years. The question for AI isn't whether the correction comes, but whether we are laying track that outlives it, or authorising lines that never get built.

News roundup

This week: OpenAI offers Washington equity while Sonnet 5 resets the mid tier, regulators and courts test AI accountability, research probes the limits of scale, and the chip race gains new challengers from Etched to Samsung.

AI business news

AI governance news

AI research news

AI hardware news
