ExoBrain

The bell curve of AI intelligence

A new benchmarking project aggregates public tests to show that leading US and Chinese models now cluster at similar intelligence levels, highlighting the importance of monitoring efficiency alongside capability.

Our chart this week comes from aiiq.org, a project by Ryan Shea. The site aggregates seventeen public benchmarks across five reasoning dimensions, combines the results into a composite score, and calibrates that score against the human IQ scale, on which 100 is the population average and every 15 points is one standard deviation.
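The scale arithmetic can be illustrated with a short sketch. It does not reproduce how aiiq.org builds or calibrates its composite, which is not described here; it only assumes a score already expressed as a number of standard deviations from the mean (`z`), and the function names are hypothetical.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # population average on the IQ scale
IQ_SD = 15.0     # one standard deviation on the IQ scale


def z_to_iq(z: float) -> float:
    """Map a standard score (standard deviations from the mean) to the IQ scale."""
    return IQ_MEAN + IQ_SD * z


def iq_to_percentile(iq: float) -> float:
    """Percentage of a normally distributed population scoring at or below `iq`."""
    return 100.0 * NormalDist(mu=IQ_MEAN, sigma=IQ_SD).cdf(iq)


# Assumed example inputs: 0, 1 and 2 standard deviations above the mean.
for z in (0.0, 1.0, 2.0):
    iq = z_to_iq(z)
    print(f"z = {z:+.1f} -> IQ {iq:.0f} -> percentile {iq_to_percentile(iq):.1f}")
```

On this scale, a score of 130 is two standard deviations above the average, which in a normal population corresponds to roughly the 97.7th percentile.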

The chart plots around seventy models on that scale. The leading cluster (GPT-5.5, Gemini 3.1 Pro, Gemini 3 Pro, Opus 4.6 and GPT-5.4) sits between IQ 130 and 135. The middle of the chart, between 100 and 125, holds most current models, including Chinese open-weight releases such as DeepSeek V4 Pro, Kimi K2.5 and Qwen 3.6 alongside Western entries like Gemma 4 31B and the GPT-OSS family. On this measure, the Chinese and US clusters are now indistinguishable.

This is not equivalent to human IQ, but it does show what one might expect: a predictable, normal distribution of model intelligence. What we have not yet seen, and what will be interesting to track, is what starts to populate the edges of this distribution. The site also publishes a cost-per-intelligence view, a useful efficiency frontier to watch and a good companion to resources like Artificial Analysis for monitoring model progress.