ExoBrain
benchmarks and evals, frontier labs, model releases

Breaking the noise barrier

xAI's Grok 4 breaks the 'noise barrier' on the ARC-AGI-2 benchmark, demonstrating significant progress in fluid intelligence compared to other leading models.

ExoBrain


This image reveals most clearly what we might call the ronnaFLOP difference. The ARC-AGI-2 leaderboard shows xAI’s Grok 4 scoring above 15%, breaking through what researchers call the “noise barrier” at 10%. The benchmark tests fluid intelligence: whether an AI can learn new skills from examples and apply them to novel problems. Whilst other top models (o3, Claude, Gemini etc.) have struggled to get beyond single digits, Grok 4’s performance represents genuine progress. It will be closely compared with Gemini 3.0 Pro, Claude 4.1 and GPT-5, all purported to be waiting in the wings for summer launches.