ExoBrain

The $1 million ARC prize

François Chollet has launched a $1 million prize for the ARC challenge to evaluate AI reasoning capabilities beyond the pattern matching of current large language models.

Joel Miller

French AI researcher François Chollet, known for his critical stance on Large Language Models (LLMs), has launched ARC Prize, a $1 million competition built around his Abstraction and Reasoning Corpus (ARC) benchmark, in collaboration with Mike Knoop, co-founder of Zapier. The prize offers another way to evaluate AIs and aims to encourage the development of systems that can efficiently learn new skills, rather than simply answer questions (see our coverage of standardised testing for AI two weeks ago).

Chollet argues that current LLMs rely heavily on memorisation and pattern matching and lack the ability to adapt to novel situations. He believes the industry's excessive focus on LLMs has set back progress towards AGI by five to ten years, narrowing the exchange of ideas and collaboration among researchers. The cash prize is intended to push researchers towards new ideas and approaches. It will be awarded annually, with the grand prize reserved for a system that reaches 85% accuracy on the benchmark.

The ARC benchmark, designed by Chollet, tests an AI system's ability to learn new skills efficiently. Each task shows a handful of example input and output grids and asks the system to infer the underlying transformation and apply it to a new input, so memorised knowledge is of little help. Chollet believes LLMs solve tasks by retrieving the right 'program template' from their vast memory and applying it, whereas true reasoning involves working out new programs on the fly.
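To make "working out new programs on the fly" more concrete, here is a toy sketch of one classic approach: brute-force search over compositions of simple grid operations until a program reproduces every training example, then applying it to the test input. The task dictionary mimics the public ARC JSON layout ("train" and "test" lists of "input"/"output" grids of colour codes), but the four primitives, the `search` and `solve` helpers and the depth limit are all hypothetical choices made for illustration. This is not Chollet's method or any prize entrant's, and real ARC tasks need a far richer vocabulary of operations.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # rows of colour codes (ARC uses 0-9)


# A tiny, hypothetical "DSL" of grid transformations.
def flip_h(g: Grid) -> Grid:
    # Mirror each row left-to-right.
    return [row[::-1] for row in g]


def flip_v(g: Grid) -> Grid:
    # Mirror the grid top-to-bottom.
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]


def rotate_cw(g: Grid) -> Grid:
    # Rotate 90 degrees clockwise: reverse the row order, then transpose.
    return [list(col) for col in zip(*g[::-1])]


PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
    "rotate_cw": rotate_cw,
}


def run(program: Tuple[str, ...], grid: Grid) -> Grid:
    # Apply each named primitive in sequence.
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def search(train: List[dict], max_depth: int = 2) -> Optional[Tuple[str, ...]]:
    # Enumerate programs shortest-first and return the first one that
    # maps every training input exactly onto its training output.
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, ex["input"]) == ex["output"] for ex in train):
                return program
    return None


def solve(task: dict) -> Optional[List[Grid]]:
    # Synthesise a program from the training pairs, then apply it to the tests.
    program = search(task["train"])
    if program is None:
        return None
    return [run(program, ex["input"]) for ex in task["test"]]


# A made-up task whose hidden rule is "mirror each row".
TASK = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[5, 6, 7]], "output": [[7, 6, 5]]},
    ],
    "test": [{"input": [[4, 0, 9]]}],
}

if __name__ == "__main__":
    print(search(TASK["train"]))  # ('flip_h',)
    print(solve(TASK))            # [[[9, 0, 4]]]
```

Searching shortest programs first is a crude stand-in for a preference for simple explanations; the number of candidates grows exponentially with depth, which is one reason approaches like the one described below pair search with a learned model that proposes promising candidates.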

Jack Cole, a researcher working on the ARC benchmark, has made progress using LLMs, reaching scores of around 35%. His approach combines what current AIs do well (pattern recognition) with a better search over candidate solutions, allowing more deliberate planning and reasoning as well as more dynamic learning. A key limitation of today's LLMs is that they cannot learn in any deep way while they are thinking.

However, some argue that the debate over memorisation versus new program synthesis may be irrelevant if AI systems can effectively get things done. After all, there is no precise definition of intelligence, as Michael Levin's work on diverse intelligence in biological systems highlights. It may be that we humans also draw on a vast array of mini program templates as we solve day-to-day problems.

Takeaways: The ARC prize will have very little impact on the current multi-billion dollar focus on LLMs, but it does serve as an important indicator of AI progress. If someone does win, it would suggest that AI capabilities could prove the sceptics wrong and advance faster than even the scaling-law optimists believe. That could have profound implications for the coming years, so this space is worth watching closely. We at ExoBrain will keep you posted.