ExoBrain

ExoBrain Weekly

Google's grand bazaar, the compute commodity, and is AI more expensive than space travel?

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

  • Google's grand bazaar

    Google I/O 2026 launched Gemini 3.5 Flash, Omni, Spark and Antigravity 2.0 alongside dozens of other AI products. Google has nearly every asset to lead the next phase of AI, but still struggles to converge on a coherent product spine.

  • The compute commodity

    AI compute now behaves like a maturing commodity market: production costs are still falling, but memory bandwidth scarcity, reasoning verbosity and reliability tiers are reshaping how the frontier prices inference. Cost per outcome is replacing cost per token.

  • Is AI more expensive than space travel?

    SpaceX's record S-1 filing reveals an AI company underneath the rockets, with AI accounting for 93% of the claimed $28.5 trillion TAM and 76% of Q1 capex. The contract powering it all is cancellable on 90 days' notice.

  • News roundup

    Leadership resets, AI job cuts, frontier-lab governance fights, multi-agent research and the next turn in AI compute supply.

Google's grand bazaar

Google I/O 2026 launched Gemini 3.5 Flash, Omni, Spark and Antigravity 2.0 alongside dozens of other AI products. Google has nearly every asset to lead the next phase of AI, but still struggles to converge on a coherent product spine.

Joel Miller

4 min read

Google I/O 2026 was the biggest AI event of the week, and probably the fullest expression yet of Google's AI strategy. Gemini 3.5, AI Mode, Gemini Omni, Spark, Antigravity 2.0, Beam, Android XR glasses, new Workspace features, new creator tools, new silicon, new subscription tiers. Google has almost every asset it needs to lead the next phase of AI. But it's yet to show that it can turn those assets into products many people trust.

The main model news was Gemini 3.5 Flash, now the default model in the Gemini app and AI Mode in Search. We've been running Gemini 3.5 Flash in daily work since launch. It's fast, and the output quality holds up against pricier tiers for most everyday tasks.

Gemini Omni is the deeper play. Google describes it as a natively multimodal generative model that can take any combination of text, image, audio and video as input and produce coherent output. Video is the launch modality, with image and audio promised later. Sundar Pichai framed it as a step towards world models that can simulate physics, culture and causality, which is straight from the DeepMind playbook on spatial intelligence and embodied understanding.

We don't buy the full "any-to-any" branding yet. Output is video-only at launch, and Veo and Imagen still exist as parallel specialist systems. GPT-4o followed a similar pattern: the "o" stood for omni, but the full omni capability never quite arrived. The bet behind Omni is still the right one. If the next jump in machine intelligence comes from training on the physical world rather than scaling text alone, Omni is Google's production test of that thesis. Google is the company best placed to run it.

Gemini Spark may matter more than it looked on stage. It is a 24/7 background agent running on Google Cloud VMs, integrated with Workspace and with consumer services like Canva and Instacart. You can think of it as a simpler version of OpenClaw.

After those three topics, the event became harder to parse. Universal Cart promised cross-merchant checkout across Search, YouTube, Gemini and Gmail. Beam with Sophie put a lifelike video AI agent inside what used to be Project Starline. Search gained generative UI, a redesigned intelligent Search Box and more AI Mode surface area. Workspace added Gmail Live, Docs Live and Keep Live for voice-driven work. Pics arrived as a Nano Banana 2 powered editing surface. There was Daily Brief, Ask YouTube, Ask Maps, Flow for music and video creation, Googlebooks as a new Android laptop category, four Android XR glasses partnerships, TPU 8t and 8i silicon, and AI Ultra subscription tiers at $100 and $200.

Each announcement is defensible on its own. Together they gave the familiar Google feeling: extraordinary capability, too many fronts, and not enough evidence that the teams are converging on a single intuitive product spine.

The strongest consolidation was in developer tooling, where it matters most for AI coding. Antigravity 2.0 pulls together the original Antigravity, Gemini CLI and Jules onto a shared engine, with the new CLI and the app as the two surfaces. Strategically, that's the right move. Tactically, the migration damaged trust. We were Antigravity 1.0 users. The 2.0 update auto-installed, then failed to authenticate, and once that was fixed we found the IDE mode we relied on had been removed in favour of an agent-only experience. Google hurried an IDE option back in, and provided free tokens to users, but the damage was done.

Takeaways: Google I/O 2026 showed a company with almost everything required to lead the next phase of AI, and not yet enough discipline to make the whole thing feel coherent. Gemini 3.5 Flash proves Google can ship a fast, capable model at scale. Omni puts DeepMind's world-model thesis into production. Spark shows that Google understands agents change computing from sessions to standing instructions. Antigravity 2.0 shows that Google can consolidate when it chooses to, but the migration pain shows how easily it can lose user trust. The bazaar is open and the merchandise is real. The next test is whether Google can close a few stalls and make the best ones indispensable.

The compute commodity

AI compute now behaves like a maturing commodity market: production costs are still falling, but memory bandwidth scarcity, reasoning verbosity and reliability tiers are reshaping how the frontier prices inference. Cost per outcome is replacing cost per token.

Joel Miller

3 min read

Google I/O was the main AI story this week. Our lead article focused on the product surface: Gemini 3.5, AI Mode, agents, video and Google's claim that it is now serving 3.2 quadrillion tokens a month. But Google also released Gemini 3.5 Flash at $1.50 per million input tokens and $9 per million output tokens, three times the price of the Flash model it replaces. Meanwhile Anthropic tightened Max plan restrictions and OpenAI launched Guaranteed Capacity, letting enterprises commit to one-, two- or three-year reservations of inference compute in exchange for discounts. On the surface this suggests that the anticipated token price squeeze is happening. What we are really seeing is the mechanics of a commodity market.

In commodity markets, price starts with production. For AI, production starts with silicon, and that cost curve is still falling. NVIDIA's Blackwell generation cut cost per million tokens by a factor of roughly 35 relative to Hopper. Epoch AI's analysis shows the price for any fixed capability milestone falling by a factor of between 9 and 900 per year over the last three years, with a median of 50. Artificial Analysis data shows the same pattern: for a given band of model intelligence, prices keep stepping down.

Cheaper production doesn't mean cheaper access. Google disclosed at I/O that token volume is now seven times last year's level. If demand rises faster than production efficiency, the clearing price moves to the bottleneck.

For inference, the fundamental bottleneck is aggregate deployed memory bandwidth, which makes high-bandwidth memory the binding constraint on throughput today. SK Hynix and Samsung dominate HBM3E and HBM4 supply, allocations are sold out through 2026, and accelerator throughput per watt is gated by memory bandwidth rather than logic-die fabrication. Jonathan Ross, formerly of Groq, put it plainly at Sohn this week: until recently, no one was really trying to squeeze more performance out of memory chips, and now they all are.

That supply response has a lead time. In the meantime, OpenAI is signing 3 GW dedicated inference deals with NVIDIA, and Anthropic is making similar arrangements with SpaceX. OpenAI's Guaranteed Capacity is best read as a forward contract for inference. Buyers who can lock in multi-year reservations get supply security and a discount. Buyers on standard rates pay a higher effective price for the same delivered tokens, and accept more variance in availability. That's a capacity premium, not a production cost increase.

The unit being traded is also changing. Commodity markets need a unit of account. Tokens used to do the job well enough, because pre-reasoning models made them roughly comparable. A token from one model was not identical to a token from another, but the comparison was usable. Reasoning models break that. Gemini 3.5 Flash defaults to dynamic thinking and burns more tokens per delivered task. Anthropic's Opus 4.7 tokeniser maps the same content to between 1.0 and 1.35 times as many billable units. Artificial Analysis benchmark runs cost more on Gemini 3.5 Flash at high effort than on the more expensive-looking Gemini 3.1 Pro.

The list price per token has not become more expensive in any clean sense. The token has become a smaller and more variable unit of work.
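A rough sketch of that arithmetic, for illustration only. The Gemini 3.5 Flash prices are the list prices quoted above, and the 1.35 multiplier is the upper end of the tokeniser range quoted above. The second model's prices and every token count are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """List price in US dollars per million tokens."""
    input_per_m: float
    output_per_m: float


def cost_per_task(price: ModelPrice, input_tokens: int, output_tokens: int,
                  tokeniser_multiplier: float = 1.0) -> float:
    """Dollar cost of one task. The multiplier scales billable tokens."""
    billable_in = input_tokens * tokeniser_multiplier
    billable_out = output_tokens * tokeniser_multiplier
    return (billable_in * price.input_per_m
            + billable_out * price.output_per_m) / 1_000_000


flash = ModelPrice(input_per_m=1.50, output_per_m=9.00)     # list prices quoted above
pricier = ModelPrice(input_per_m=3.00, output_per_m=18.00)  # hypothetical model

# Hypothetical token counts: the cheaper model reasons at greater length.
flash_cost = cost_per_task(flash, input_tokens=20_000, output_tokens=12_000)
pricier_cost = cost_per_task(pricier, input_tokens=20_000, output_tokens=3_000)

# Same list price, but a tokeniser that emits 1.35x the billable units.
retokenised_cost = cost_per_task(flash, 20_000, 12_000, tokeniser_multiplier=1.35)

print(f"cheaper-per-token model: ${flash_cost:.3f} per task")
print(f"pricier-per-token model: ${pricier_cost:.3f} per task")
print(f"after tokeniser change:  ${retokenised_cost:.4f} per task")
```

Under these made-up token counts, the model with half the list price costs more per task, and a tokeniser change raises the bill with no change to the price sheet.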

Reliability has been productised. OpenAI now sells Standard, Priority, Flex and Scale tiers, with Priority running at roughly 1.5 to 2 times Standard rates for SLA-backed throughput, and Flex offering half-price tokens for asynchronous workloads with possible queuing. Once capacity tightens, the single price splits into peak, off-peak, interruptible and reserved supply.
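To make the tier spread concrete, here is a minimal sketch using the multipliers quoted above. The Standard base price is a hypothetical placeholder, and the Scale tier is left out because no multiplier is given for it here.

```python
# Hypothetical Standard list price, in dollars per million output tokens.
STANDARD_PRICE = 10.00

# Multipliers relative to Standard, as (low, high) ranges from the text above.
TIER_MULTIPLIERS = {
    "standard": (1.0, 1.0),
    "priority": (1.5, 2.0),  # roughly 1.5 to 2 times Standard
    "flex": (0.5, 0.5),      # half price, with possible queuing
}


def tier_price_range(tier: str, standard_price: float = STANDARD_PRICE) -> tuple:
    """Return the (low, high) price for a tier, given the Standard price."""
    low, high = TIER_MULTIPLIERS[tier]
    return (standard_price * low, standard_price * high)


for tier in TIER_MULTIPLIERS:
    low, high = tier_price_range(tier)
    print(f"{tier:>8}: ${low:.2f} to ${high:.2f} per million tokens")
```

On these figures the same token can differ in price by up to a factor of four between Flex and the top of the Priority range.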

Underneath the frontier tier, the market has gone the other way. Cursor's Composer 2.5 lists at $0.50 input and $2.50 output per million tokens, with cost per task on coding workloads roughly a tenth of frontier alternatives at comparable quality. Composer is built on the open-weight Kimi K2.5 base, with most of the compute spent on Cursor's own post-training and editor integration. Open-weight models from Kimi, DeepSeek, and Qwen are abundant at the low end of the curve, and the price per useful unit of work in this segment is still falling fast.

The result is a bifurcated market. The frontier is in cost-push inflation driven by capacity scarcity, reasoning verbosity and reliability premiums. The open tier is in continued deflation driven by hardware gains, distillation, fine-tunable weights and better tooling. Two buyers in the same industry can experience opposite price trajectories depending on which segment they rely on.

Cost per token is becoming less informative because the token itself is no longer fungible across models, reasoning depths or tokenisers. Cost per outcome, measured against a defined unit of delivered work, is the metric that still holds up.

Takeaways: AI compute now behaves like a commodity market with falling production costs, a memory-bandwidth bottleneck, reliability premiums, reserved capacity and surplus supply underneath the frontier. Less than 1% of AI users are power-users today, and we're already in a compute shortage. Buyers who handle the next eighteen months well will treat AI compute as procurement, not SaaS subscription management. Audit the basket of models in use, measure cost per delivered outcome rather than per token, lock in capacity only where the work justifies the premium, and build model- and harness-agnostic routing across multiple execution lanes so that a tokeniser change, a quota tightening, or a reservation shortage at any single vendor doesn't rewrite the unit economics of the whole stack.
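One way to picture the routing advice is sketched below. Every lane name, price, success rate and availability flag is a hypothetical placeholder. Dividing cost per attempt by success rate to get cost per delivered outcome is our simplifying assumption (it treats failed attempts as independent retries), not a vendor metric.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Lane:
    """One execution lane: a model plus the terms it is bought on."""
    name: str
    cost_per_attempt: float  # dollars per attempted task
    success_rate: float      # fraction of attempts that deliver an accepted outcome
    available: bool = True   # False during a quota cut or reservation shortage

    def cost_per_outcome(self) -> float:
        """Expected dollars per accepted outcome, assuming independent retries."""
        if self.success_rate <= 0:
            return float("inf")
        return self.cost_per_attempt / self.success_rate


def pick_lane(lanes: List[Lane], min_success_rate: float = 0.0) -> Optional[Lane]:
    """Pick the cheapest available lane per outcome that meets the quality floor."""
    candidates = [lane for lane in lanes
                  if lane.available and lane.success_rate >= min_success_rate]
    return min(candidates, key=Lane.cost_per_outcome, default=None)


# Hypothetical basket of lanes.
lanes = [
    Lane("frontier_reserved", cost_per_attempt=0.12, success_rate=0.95),
    Lane("frontier_standard", cost_per_attempt=0.15, success_rate=0.95),
    Lane("open_weight", cost_per_attempt=0.015, success_rate=0.80),
]

routine = pick_lane(lanes)                           # no quality floor
demanding = pick_lane(lanes, min_success_rate=0.90)  # work that justifies the premium

# Simulate a reservation shortage on the reserved lane.
degraded = [replace(lane, available=False) if lane.name == "frontier_reserved" else lane
            for lane in lanes]
fallback = pick_lane(degraded, min_success_rate=0.90)

print(routine.name, demanding.name, fallback.name)
```

The design choice here is that no caller names a vendor directly: work is described by the quality it needs, so losing one lane changes the price of an outcome rather than breaking the workflow.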

Is AI more expensive than space travel?

SpaceX's record S-1 filing reveals an AI company underneath the rockets, with AI accounting for 93% of the claimed $28.5 trillion TAM and 76% of Q1 capex. The contract powering it all is cancellable on 90 days' notice.

Joel Miller

2 min read

SpaceX's estimated TAM by segment, with AI representing the overwhelming majority of the claimed total addressable market.

This week's chart comes from SpaceX's S-1, the legal filing a company submits to US regulators ahead of an initial public offering. This is envisaged as the largest IPO of all time and it is not your conventional S-1. Alongside the audited accounts and risk factors sit phrases about extending consciousness to the stars and making life multiplanetary, wrapped around a claim to the largest actionable addressable market in human history at $28.5 trillion... enterprise AI.

But if you thought space travel and colonising the stars might be expensive, the eye-watering numbers are actually for the AI business. Despite the name, SpaceX is no longer just rockets and satellite internet. It is now, by its own framing, an AI company. AI accounts for 93% of the claimed TAM, 76% of capital expenditure in Q1 2026, and gets mentioned more than 200 times in the filing. Connectivity, the Starlink business, made money in Q1. Space, the rockets, lost $622 million. AI lost $2.47 billion on $818 million of revenue, with capex running at over $30 billion annualised.

The problem for SpaceX is what that GPU infrastructure is currently doing. Since the February merger folded xAI into the new SpaceXAI division, the Colossus build sits under the same roof as Grok. But instead of powering Grok, it is being rented to Anthropic on a $15 billion a year deal that the customer can cancel with 90 days' notice. The largest AI bet in IPO history rests on a contract thinner than a Starlink subscription.

News roundup

Leadership resets, AI job cuts, frontier-lab governance fights, multi-agent research and the next turn in AI compute supply.

AI business news

AI governance news

AI research news

AI hardware news