100 trillion tokens and the glass slipper effect

A massive empirical study of 100 trillion tokens, conducted by a16z and OpenRouter, offers the clearest picture yet of how the world actually uses AI. OpenRouter is an inference routing platform that connects developers to hundreds of different language models through a single API. Because it sits between users and providers, it captures anonymised sample data on which models are chosen, for what tasks, and how usage patterns change over time. The dataset spans billions of prompt-completion pairs from a global user base.

It’s exactly one year today since OpenAI made the full version of o1 available. By late 2025, reasoning models account for over 50% of all traffic. This is a rapid migration toward AI systems that can manage task state, follow multi-step logic, and support agent-style workflows. Overall, the results confirm a move away from simple chatbots toward what the researchers call “agentic inference”. They also bear out the rise in open-source innovation driven by China, and introduce a surprising retention phenomenon dubbed the “glass slipper” effect.

The data suggests we are moving away from asking models to write poems or summarise emails and toward asking them to interact with external tools, debug complex software, and manage multi-step workflows. Consequently, the shape of AI traffic has changed. Input prompts have quadrupled in length since early 2024, driven largely by programming tasks where developers dump entire codebases into the context window. Models are now deployed as components in larger automated loops rather than for single-turn chats.

A fascinating insight in the report is a retention theory the authors call the “glass slipper” effect. In a market flooded with new models every week, one might expect users to constantly switch to the newest option. The data shows the opposite for what the researchers term “foundational cohorts”. When a specific model finally solves a hard, specific workflow for a user, that user stays put. Early adopters of Gemini 2.5 Pro or Claude Sonnet 4 show remarkably flat retention curves. They found a model that worked for their specific architectural or coding problem and locked it into their infrastructure. Conversely, models that launch without being a “frontier” solution see their user base vanish almost immediately. There is no prize for second place in inference. You either solve a new class of problem, or you are ignored.

A year ago, the industry debated whether open models could ever catch up to proprietary giants. The data suggests they have not only caught up but captured specific markets. Open-weight models now represent roughly 30% of all token volume, up from around 1% in late 2024. This appears to be a stable equilibrium. Proprietary models like the Claude and GPT-5 families still dominate high-stakes business tasks where accuracy is paramount. Open-source models have cornered the market on high-volume, cost-sensitive work. As we have reported in recent months, the surge is largely fuelled by Chinese research. Models from DeepSeek and Qwen have normalised a rapid release cycle that keeps them competitive with Western heavyweights. In the programming category, Chinese open models briefly held the majority share of usage in mid-2025 before OpenAI’s GPT-OSS series responded. The competitive dynamic is healthy. No single open model now holds more than 20-25% of the open-source market, compared to the near-monopoly DeepSeek held in early 2025.

The research also documents a quiet extinction event among small models. Open models under 15 billion parameters have seen their share of usage collapse, even as the number of available small models continues to grow. This is likely because small models can now run locally on edge hardware. If you can run a capable 7B model on your laptop or phone, there is little reason to route that traffic through a cloud API. The action has shifted to “medium” models in the 15-70 billion parameter range, which balance reasoning capability with the speed required for agentic loops. These models are too large for most local hardware but efficient enough for real-time inference at scale.

The data also shows a clear segmentation in willingness to pay. Efficient, low-cost models like DeepSeek V3 are consumed in massive quantities for roleplay and drafting. Meanwhile, premium models like Claude Sonnet 4 and GPT-5 Pro command prices nearly 100 times higher, yet usage remains high. Enterprises are happy to pay $30+ per million tokens if it means the code works or the legal analysis is sound. Intelligence is not becoming a commodity. For mission-critical work, the market rejects “cheap and good enough” in favour of “expensive and correct”.
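To make the price gap concrete, here is a rough back-of-the-envelope sketch. It takes the $30 per million tokens and the roughly 100x ratio from the paragraph above; the implied budget price of $0.30 is derived from those two figures rather than quoted from the report, and the 500 million tokens a month workload is purely hypothetical.

```python
def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost in USD of processing `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


# Figures taken from the text above.
PREMIUM_PRICE = 30.0  # USD per million tokens (the "$30+" figure)
PRICE_RATIO = 100     # "nearly 100 times higher", rounded up

# Assumption: derived from the two figures above, not a quoted price.
BUDGET_PRICE = PREMIUM_PRICE / PRICE_RATIO

# Assumption: a hypothetical workload, chosen only for illustration.
MONTHLY_TOKENS = 500_000_000

premium_cost = cost_usd(MONTHLY_TOKENS, PREMIUM_PRICE)
budget_cost = cost_usd(MONTHLY_TOKENS, BUDGET_PRICE)

print(f"Premium model: ${premium_cost:,.2f} per month")
print(f"Budget model:  ${budget_cost:,.2f} per month")
print(f"Monthly premium paid for accuracy: ${premium_cost - budget_cost:,.2f}")
```

On these assumed numbers the gap is a few thousand dollars a month against a few hundred, which is easy to justify if a single wrong answer in production code or legal analysis costs more than the difference.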

The report is packed with fascinating charts…

Takeaways: The early AI chatbot era is passing. More than half of all traffic now flows through reasoning-based models and agentic inference loops, and if your AI strategy relies on single-turn Q&A rather than multi-step workflows, you are in last year’s market. China has emerged as an AI superpower, with Chinese open-weight models driving nearly a third of global open-source usage and leading in coding and creative applications. Retention in this market is governed by what might be called “problem-model fit”. Users do not churn once a model removes a point of friction that nothing else could, and the first model to crack a specific complex task captures a durable user base that is hard to dislodge. The landscape is bifurcating. Small models are migrating to local hardware, medium models are becoming the workhorse of cloud inference, and premium models are holding their pricing power for work that demands accuracy above all else.
