Moonshot challenges the giants

This week’s biggest open-weight model release, from Chinese lab Moonshot, looks to be one of the most significant this year. Kimi K2 Thinking is a trillion-parameter Mixture-of-Experts model that activates about 32B parameters per token and scores 44.9% on Humanity’s Last Exam and 60.2% on BrowseComp, edging past GPT-5 and Claude 4.5. The standout claim is stability across long tool chains: it can run roughly 200–300 sequential calls without falling apart. It looks as though closed and open models are converging at the frontier, which will have the big labs very worried.

Pricing will add to the anxiety. On Moonshot’s own infrastructure, Kimi K2 Thinking comes in at $0.60 per million input tokens and $2.50 per million output tokens, whilst GPT-5 sits at $1.25 / $10. That makes K2 Thinking roughly 2x cheaper on input and 4x cheaper on output.
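The multiples follow directly from dividing the quoted per-million-token prices. The short Python sketch below does that division and prices a sample job; the model keys, the function names and the workload size are illustrative assumptions, not anything published by either vendor.

```python
# Prices in USD per million tokens, as quoted in the paragraph above.
# The dictionary keys are illustrative labels, not official API model names.
PRICES = {
    "kimi-k2-thinking": {"input": 0.60, "output": 2.50},
    "gpt-5": {"input": 1.25, "output": 10.00},
}


def price_ratio(expensive: str, cheap: str, kind: str) -> float:
    """How many times more the `expensive` model costs per token of `kind`."""
    return PRICES[expensive][kind] / PRICES[cheap][kind]


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a job with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    for kind in ("input", "output"):
        ratio = price_ratio("gpt-5", "kimi-k2-thinking", kind)
        print(f"{kind}: GPT-5 costs {ratio:.2f}x as much")

    # Hypothetical agentic workload: 2M input tokens, 0.5M output tokens.
    for model in PRICES:
        print(f"{model}: ${job_cost(model, 2_000_000, 500_000):.2f}")
```

Because agentic runs tend to be input-heavy (tool results are fed back in as context), the blended saving on a real workload will sit somewhere between the input and output multiples.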

Quantisation is the compression of the AI world, and K2 leans on it heavily. It uses “INT4” weights trained with quantisation-aware methods, and Moonshot says this yields about 2x speed-ups while keeping quality close to higher-precision baselines. In plain terms, INT4 means the model stores each weight in 4 bits instead of the usual 16 or 32. That makes the model smaller and faster to move around, but it throws away detail, which can show up as lost accuracy or brittleness in tricky cases. K2’s training process simulates quantisation effects during fine-tuning, allowing the weights to adapt and compensate for the precision loss.
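To make the idea concrete, here is a minimal Python sketch of 4-bit quantisation and of the "simulate it during training" trick. It is not Moonshot's actual recipe: the symmetric scheme, the single scale per group of weights and all function names are assumptions chosen for simplicity.

```python
def quantise_int4(weights):
    """Map floats to signed 4-bit integer codes in [-8, 7].

    Assumes a simple symmetric scheme with one scale for the whole group;
    production schemes differ in grouping and calibration.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantise(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


def fake_quantise(weights):
    """Quantise then immediately dequantise.

    Quantisation-aware training runs the forward pass through a step like
    this, so the model learns weights that still work after rounding.
    """
    codes, scale = quantise_int4(weights)
    return dequantise(codes, scale)


if __name__ == "__main__":
    weights = [0.7, -0.31, 0.12, 0.02, -0.66]
    codes, scale = quantise_int4(weights)
    approx = dequantise(codes, scale)
    print("codes: ", codes)
    print("approx:", [round(a, 3) for a in approx])
    print("errors:", [round(abs(w - a), 3) for w, a in zip(weights, approx)])
```

Each weight ends up as one of only 16 possible values, which is where the lost detail comes from: the rounding error per weight is at most half the scale, and small weights such as 0.02 can collapse to zero entirely.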

Some users have reported that it can get stubborn, locking onto a view and refusing to explore alternatives, which hurts when the initial step is wrong. It also tends to assume facts to support its own arguments, which is grating in creative tasks, where outputs are often thin and need heavy prompting. And local deployment remains tough. Even with INT4, guidance suggests around 512GB system RAM and 32GB VRAM as a floor for smooth use. A 1.8-bit variant of around 245GB has been tested by some, but they still report needing 64GB RAM and an RTX 4090 for slow, basic runs. “Open” doesn’t help much if only a handful of labs can operate it.
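A back-of-the-envelope calculation shows why the hardware floor is so high. The Python sketch below assumes exactly one trillion parameters all stored at the same bit width, and ignores quantisation scales, any layers kept at higher precision, activations and the KV cache, so real files and memory needs will be somewhat larger.

```python
def weight_storage_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumes every parameter uses the same bit width and counts no metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 1e12  # "trillion-parameter", taken as exactly 1e12 here
    for bits in (16, 4, 1.8):
        print(f"{bits:>4} bits: ~{weight_storage_gb(n_params, bits):,.0f} GB")
```

At 4 bits the raw weights alone land at roughly half a terabyte, which is in line with the 512GB RAM guidance, and the 1.8-bit estimate sits a little below the 245GB quoted for that variant, as expected once overheads are added back.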

Results were reported at INT4 precision, and researchers have raised contamination and comparability questions. The long-chain robustness is promising, but any precision drop can compound over many steps. That may dent success rates in messy, real-world workflows even if headline scores look strong. And yet there’s no denying that this is significant. As Clement Delangue, co-founder of Hugging Face, stated: “Kimi K2 Thinking feels like a big milestone for open-source AI. The first time in a while that open-source gets ahead of proprietary APIs on their big area of focus (agents).”
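The compounding point can be illustrated with a toy model. The Python sketch below assumes each tool call succeeds independently with the same probability, which real agents do not satisfy (errors cluster, and models can recover from some of them); the per-step rates are made-up values for illustration, not measurements of K2.

```python
def chain_success(per_step_success: float, n_steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes independent steps with identical success probability.
    """
    return per_step_success ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step success rates, compared at chain lengths
    # in the range the article mentions.
    for p in (0.999, 0.995):
        for n in (200, 300):
            print(f"p={p}, steps={n}: {chain_success(p, n):.1%} end-to-end")
```

Under these assumptions a drop of less than half a percentage point per step is enough to turn a chain that usually completes into one that usually fails, which is why small precision losses matter more for long agentic runs than for single-shot benchmarks.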

There is also the strategic angle: this continues the Chinese software-first response to US chip controls. If access to top silicon is constrained, make the model leaner so it matters less. By pushing INT4-native training and publishing strong numbers, Moonshot is saying efficiency can beat raw compute. That challenges the “more GPUs, bigger models” reflex in the US labs and creates a path where compute-poor teams can still compete, at least on carefully tested tasks.

Takeaways: The aggressive quantisation strategy shows that accepting a controlled loss of quality can unlock cost savings, perhaps without sacrificing extensive agentic tool use. Whilst the model has clear limitations in creativity and flexibility, its economics and long-chain stability suggest we’re entering an era where open offerings genuinely threaten the business models of closed AI labs, not by matching their scale but by making scale itself less relevant.
