The recipe behind Mythos

For several months in late 2025 and early 2026, somewhere between 2% and 4% of the world’s entire AI compute capacity was pointed at a single training run. One lab assembled an estimated 150,000 of Nvidia’s latest Blackwell GPUs (equivalent to 600,000 of last generation’s H100s) and, as we cover in the lead story this week, produced a model so capable it couldn’t be released. Here’s how.

Start with the silicon. Most leading models in use today, from GPT-5.4 to Claude Opus, were pre-trained on older Nvidia Hopper chips with 80 GB of fast memory per GPU. It’s not widely understood that, until very recently, most of the models we use were still built on those older pre-trains. Mythos is the first frontier model whose entire pipeline (pre-training, post-training and alignment) was built end-to-end on Blackwell GB200 superchips. Each GPU packs 192 GB of HBM3e memory, 2.4x Hopper’s 80 GB, connected via NVLink with double the bandwidth, in purpose-built NVL72 racks of 72 GPUs designed for exactly this kind of work. That memory density is what lets you keep a rumoured 10-trillion-parameter model’s working set in fast memory rather than constantly shuffling data around. The result: low training instability across trillions of tokens.
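To make the memory arithmetic concrete, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (80 GB, 192 GB, 72 GPUs per rack, a rumoured 10 trillion parameters) plus one assumption of ours: 2 bytes per parameter, as with 16-bit weights. The function names are hypothetical, and the result is a lower bound for the weights alone, since training also has to hold gradients, optimizer state and activations.

```python
import math

# Figures quoted in the article
HOPPER_HBM_GB = 80                # fast memory per Hopper GPU
BLACKWELL_HBM_GB = 192            # HBM3e per Blackwell GPU
NVL72_GPUS = 72                   # GPUs per NVL72 rack
RUMOURED_PARAMS = 10 * 10**12     # rumoured parameter count, unconfirmed

# Our assumption, not from the article: 16-bit weights
BYTES_PER_PARAM = 2


def weights_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 10**9


def gpus_to_hold(total_gb: float, hbm_gb_per_gpu: int) -> int:
    """Minimum number of GPUs whose combined HBM can hold total_gb."""
    return math.ceil(total_gb / hbm_gb_per_gpu)


memory_ratio = BLACKWELL_HBM_GB / HOPPER_HBM_GB
total_gb = weights_gb(RUMOURED_PARAMS, BYTES_PER_PARAM)
hopper_gpus = gpus_to_hold(total_gb, HOPPER_HBM_GB)
blackwell_gpus = gpus_to_hold(total_gb, BLACKWELL_HBM_GB)
rack_hbm_tb = NVL72_GPUS * BLACKWELL_HBM_GB / 1000

print(f"Memory per GPU, Blackwell vs Hopper: {memory_ratio:.1f}x")
print(f"Weights alone: {total_gb / 1000:.0f} TB")
print(f"GPUs needed just to hold the weights: {hopper_gpus} Hopper vs {blackwell_gpus} Blackwell")
print(f"HBM in one NVL72 rack: {rack_hbm_tb:.1f} TB")
```

Under these assumptions the weights alone come to about 20 TB, which would span 250 Hopper GPUs but only 105 Blackwell GPUs, or roughly a rack and a half of NVL72. Fewer devices per copy of the model means less data crossing slower links, which is the point the paragraph above is making.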

Then add the data. The public web corpus is effectively exhausted, so Anthropic blended proprietary datasets with massive volumes of synthetic data generated by prior Claude models, a self-improvement loop that every frontier lab is now running. Layer on the largest post-training push we’ve seen: extensive reinforcement learning for code and reasoning, likely drawing on many examples of long-running agentic work produced by all of us busily using Claude Code, followed by automated Constitutional AI alignment that trains the model to internalise ethical principles rather than just follow rules. The recipe reportedly performed roughly twice as well as Anthropic’s own scaling laws predicted. Something qualitatively different happened at this scale.

And Anthropic won’t be alone for long. xAI is scaling Colossus 2 toward a million GPUs, with plans for 10-trillion-parameter models of its own. OpenAI’s next frontier model, codenamed Spud, completed pre-training in late March at the Stargate facility in Abilene, Texas, using over 100,000 GPUs. It’s rumoured to be GPT-5.5 or GPT-6, natively multimodal, and is expected within weeks. OpenAI killed Sora and redirected compute to get it done. Runs of 100,000 to 500,000 GPUs are fast becoming table stakes, and Nvidia’s next-generation Vera Rubin chips ship later this year, promising another 3 to 5x leap.

Takeaways: The AI capability frontier doesn’t advance gradually. It moves in hardware generations. Mythos is the first proof that Blackwell-class compute creates a visible discontinuity in model quality, and every major lab is now racing to finish its own Blackwell-era training runs. The window before those models arrive is roughly 6 months. As we suggest in the lead story, prepare accordingly.
