
Breaking the noise barrier

xAI's Grok 4 breaks the 'noise barrier' on the ARC-AGI-2 benchmark, demonstrating significant progress in fluid intelligence compared to other leading models.

ExoBrain

11 July 2025 · 1 min read

This image reveals most clearly what we might call the ronnaFLOP difference. The ARC-AGI-2 leaderboard shows xAI’s Grok 4 achieving over 15% accuracy, breaking through what researchers call the “noise barrier” at 10%. This benchmark tests fluid intelligence: whether an AI can learn new skills from examples and apply them to novel problems. Whilst other top models (o3, Claude, Gemini etc.) have struggled to get beyond single digits, Grok 4’s performance represents genuine progress. It will be closely compared with Gemini 3.0 Pro, Claude 4.1 and GPT-5, all reportedly waiting in the wings for summer launches.
