DeepSeek pays less attention

The whale is back. Released on Monday, DeepSeek V3.2 is an open frontier model (maths Olympiad gold level) that introduces DeepSeek Sparse Attention (DSA), an architecture that changes how the model processes long sequences. With V3.1-Terminus, the cost of processing each token rises linearly with its position in the sequence; with V3.2, it stays nearly flat. At maximum context length, V3.2 costs roughly 70-90% less!
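
Why does one curve climb while the other flattens? The toy cost model below is a rough illustration. It assumes that producing a token costs some fixed work (feed-forward layers and the like) plus attention work. Dense attention touches every earlier token. A DSA-style model runs a cheap indexer over every earlier token, then applies full attention to at most k of them. Every constant is hypothetical and chosen only to show the shape of the curves. These are not DeepSeek's figures, and the model does not reproduce the 70-90% saving.

```python
# Toy per-token cost model: illustrative only, every constant is hypothetical.
TOP_K = 2048                  # tokens given full attention per query (assumed)
INDEXER_RELATIVE_COST = 0.02  # indexer work per scanned token vs. full attention (hypothetical)
FIXED_COST = 4096.0           # position-independent work, e.g. feed-forward layers (hypothetical)


def dense_cost(t: int) -> float:
    """Token at position t attends to all t earlier tokens: grows linearly with t."""
    return FIXED_COST + t


def sparse_cost(t: int, k: int = TOP_K, c: float = INDEXER_RELATIVE_COST) -> float:
    """A cheap indexer scans all t tokens, but full attention covers at most k of them."""
    return FIXED_COST + c * t + min(t, k)


if __name__ == "__main__":
    for t in (1_024, 8_192, 32_768, 131_072):
        print(f"position {t:>7,}: dense {dense_cost(t):>9,.0f}   sparse {sparse_cost(t):>7,.0f}")
```

Past position k, the sparse cost rises only through the small indexer term. This is why such a curve looks nearly flat rather than perfectly flat.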

DSA targets the same inefficiency as Qwen’s Gated Attention but takes a complementary approach. A “lightning indexer” quickly scores the earlier tokens, and the model then attends only to the most relevant ones. As a result, the cost of the main attention computation no longer grows with the full length of the sequence. Both methods recognise that standard attention is wasteful, and both address it by making the model selective about what it focuses on.
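
The selection step can be sketched in a few lines of Python. This is a minimal, unoptimised sketch for a single query token. It assumes an indexer that scores each earlier token as a weighted sum of ReLU(q · k) over a few small indexer heads, modelled on DeepSeek's published description rather than taken from their code. Ordinary softmax attention then runs over only the top-k tokens. The vector sizes, head counts, weights and k are hypothetical toy values; the real model uses learned projections and optimised kernels.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def index_scores(idx_queries, idx_weights, idx_keys):
    """Indexer-style relevance score for each earlier token s:
    sum over indexer heads j of w_j * ReLU(q_j . k_s).
    idx_queries/idx_weights: one small query and one scalar weight per head.
    idx_keys: one small key per earlier token."""
    return [
        sum(w * max(0.0, dot(q, k)) for q, w in zip(idx_queries, idx_weights))
        for k in idx_keys
    ]


def top_k_indices(scores, k):
    """Positions of the k highest scores, returned in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, selected):
    """Scaled dot-product attention computed over the selected positions only."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[s]) * scale for s in selected])
    dim = len(values[0])
    return [sum(w * values[s][d] for w, s in zip(weights, selected)) for d in range(dim)]


def random_vector(n):
    """Random Gaussian vector used as stand-in data for the demo."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    random.seed(0)
    # Toy sizes (hypothetical): 64 earlier tokens, keep the top 8.
    n_tokens, dim, idx_dim, idx_heads, k = 64, 8, 4, 2, 8
    keys = [random_vector(dim) for _ in range(n_tokens)]
    values = [random_vector(dim) for _ in range(n_tokens)]
    idx_keys = [random_vector(idx_dim) for _ in range(n_tokens)]
    query = random_vector(dim)
    idx_queries = [random_vector(idx_dim) for _ in range(idx_heads)]
    idx_weights = [1.0] * idx_heads  # fixed here; query-dependent in a learned model

    selected = top_k_indices(index_scores(idx_queries, idx_weights, idx_keys), k)
    output = sparse_attention(query, keys, values, selected)
    print(f"attended to {len(selected)} of {n_tokens} tokens at positions {selected}")
    print("output vector:", [round(x, 3) for x in output])
```

The saving comes from the last step. The softmax and the weighted sum over values run over k tokens instead of all of them, while the indexer's per-token work is kept deliberately small.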

DeepSeek is pricing V3.2 API access at a fraction of the cost of GPT-5. Efficiency in attention is the latest weapon in the AI arms race.

Subscribe to the ExoBrain Weekly Newsletter