
This image reveals most clearly what we might call the ronnaFLOP difference. The ARC-AGI-2 leaderboard shows xAI’s Grok 4 achieving more than 15% accuracy, breaking through what researchers call the “noise barrier” at 10%. The benchmark tests fluid intelligence: whether an AI can learn new skills from examples and apply them to novel problems. Whilst other top models (o3, Claude, Gemini etc.) have struggled to exceed single digits, Grok 4’s performance represents genuine progress, and it will be closely compared with Gemini 3.0 Pro, Claude 4.1 and GPT-5, all purported to be waiting in the wings for summer launches.
