The cognitive core model

Andrej Karpathy has an uncanny ability to articulate what the AI community is thinking before they fully realise it themselves. His latest observation about the “cognitive core” captures an ongoing debate around the nature of reasoning and knowledge.

Google just released Gemma 3n, which looks like another capable open-weight model. It achieves high LM Arena scores whilst running on limited hardware, which is impressive. The model is natively multimodal, handling text, images, audio and video, yet fits on a smartphone. It is the kind of model that could orchestrate a more diffuse system of intelligence.

Instead of building ever-larger models that memorise the internet, many labs are creating smaller models that prioritise reasoning capability over encyclopaedic knowledge. These aren’t just the scaled-down LLMs of recent years, the so-called small language models (SLMs), but ultra-efficient reasoning cores. As Karpathy puts it, they don’t know that William the Conqueror died in 1087, but they can look it up when needed.
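The "look it up when needed" pattern can be sketched in a few lines of Python. This is a toy illustration rather than how Gemma 3n or any real model works: `KNOWLEDGE_STORE`, `lookup` and `cognitive_core` are hypothetical names, the store stands in for a search index or database, and the hard-coded query logic stands in for a small reasoning model.

```python
from typing import Callable, Dict, Optional

# Hypothetical external memory: a stand-in for a search index or database.
KNOWLEDGE_STORE: Dict[str, str] = {
    "william the conqueror death year": "1087",
}


def lookup(query: str) -> Optional[str]:
    """Toy retrieval tool: exact-match lookup on a normalised key."""
    return KNOWLEDGE_STORE.get(query.strip().lower())


def cognitive_core(
    question: str,
    tool: Callable[[str], Optional[str]] = lookup,
) -> str:
    """Stand-in for a small reasoning model that stores no facts itself.

    It turns the question into a query, calls the external tool, and
    composes an answer. A real model would generate the query; here the
    step is hard-coded (assumption, for illustration only).
    """
    query = question.rstrip("?").strip().lower()
    fact = tool(query)
    if fact is None:
        return "I don't know, and my lookup found nothing."
    return f"According to my lookup: {fact}"


if __name__ == "__main__":
    print(cognitive_core("William the Conqueror death year?"))
```

The point of the split is that the facts live outside the core, so the store can grow or be swapped out without touching the reasoning component.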

This mirrors, at least loosely, how our brains work. The prefrontal cortex, our biological cognitive core, is not where most facts are stored. It orchestrates reasoning, pulling information from memory systems as needed. It’s small relative to the whole brain but is central to novel problem-solving and abstract thinking.

The approach is already proving itself in robotics. RoboBrain 2.0 demonstrates what happens when you prioritise fluid intelligence over raw knowledge. The system handles multi-agent planning, spatial reasoning and real-time adaptation, all running locally on robots that can’t carry server racks on their backs. These models trade breadth for depth. They employ what François Chollet calls fluid intelligence: the ability to reason through novel situations rather than recall similar ones from training.

A cognitive core on every device means AI that works offline, preserves privacy, and responds instantly. More importantly, it suggests the path to more capable AI isn’t through brute-force scaling but through a purer form of intelligence.

Takeaways: The race for massive AI models may be missing the point. Gemma 3n and similar “cognitive core” models show that small, reasoning-focused architectures can match or exceed larger models’ practical capabilities whilst running on minimal hardware. As embodied AI and edge computing become essential, expect these lean thinking machines to unlock new architectures, model routing, and multi-model systems.
