The next wave begins

Q1 will see a slew of rumoured and highly anticipated ‘next generation’ models and agents emerge from the AI labs and big tech firms, releases that will define frontier capabilities for 2025.

xAI: Elon Musk’s AI outfit, which put itself on the map in 2024 with a rapid supercluster build-out:

Grok 3: Rumours suggest that this model, trained on up to 100,000 Nvidia H100 GPUs, will surpass every currently available model in pre-training compute. Grok 3 could be available in as little as one to two weeks, but without reinforcement learning techniques and reasoning capabilities, we expect it to be strong but not game-changing.

OpenAI: After a busy 12 days of shipping before Christmas, several anticipated products have yet to be released:

Operator: Operator is designed to go beyond the scheduling and reminders offered by the new ChatGPT Tasks feature. It aims to be a full-blown AI agent capable of performing complex tasks autonomously, such as scheduling meetings, writing documents, and automating multi-step processes like booking flights. It is planned for release in January 2025.
o3 (full): Unveiled with benchmark results described as groundbreaking, o3 set new records in reasoning, coding, and mathematical problem-solving, including an 87% score on the ARC-AGI benchmark. It is slated for release shortly after o3-mini, likely in February or March, following safety testing and research access. Cost will be key: we can extrapolate that it is likely to be very pricey, at around $30 per million input tokens and $120 per million output tokens!
o3-mini: The smaller model in the family is designed to offer impressive performance while letting users choose between different reasoning effort levels. Public release is planned for February.
OpenAI GPT-4.5/5/Orion: OpenAI has made no official announcement about the release of either GPT-4.5 or GPT-5. Reports and posts on X suggest that “Orion” is the internal codename for what might be publicly released as GPT-5. The Wall Street Journal and other tech publications have reported that the model is behind schedule due to technical challenges, high computing costs, and a scarcity of high-quality training data. Others, however, have recently suggested that Orion/GPT-5 may be being used to generate high-quality training data for the o-series reasoning models, supporting their rapid progress.
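To put the speculative o3 pricing in context, here is a minimal Python sketch of what a single request might cost. The prices are the extrapolated figures quoted for o3 above, not published rates, and the token counts in the example are hypothetical. It also assumes o3 follows o1 in billing its hidden reasoning tokens as output tokens, which is why output volume tends to dominate the bill.

```python
# Speculative o3 prices extrapolated in this newsletter (USD per million tokens).
# These are assumptions, not published OpenAI rates.
INPUT_PRICE_PER_M = 30.0
OUTPUT_PRICE_PER_M = 120.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the assumed per-million-token prices."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical request: a 2,000-token prompt and 10,000 output tokens
# (assumed to include hidden reasoning tokens, as o1 bills them).
print(f"${request_cost(2_000, 10_000):.2f}")  # prints "$1.26"
```

Even at these rough numbers, a long chain of reasoning costs several times more than the prompt that triggered it, so heavy agentic or coding workloads could add up quickly.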

Google: After a big splash, Gemini 2.0’s roll-out has been somewhat confused, with multiple experimental versions appearing in AI Studio and elsewhere. 1.5 Pro, 1.5 Flash, 1.5 Pro with Deep Research, 2.0 Flash Experimental Preview and 2.0 Experimental Advanced Preview are all available in the Gemini web app. But what of the full 2.0 family?

Gemini 2.0 Pro / Pro Thinking: A more advanced version is reportedly already powering Google’s NotebookLM research tool but has yet to appear in the wider Google ecosystem. We can expect to see the fully functional 2.0 Pro model in the next few weeks, perhaps even with a reasoning (thinking) mode that may rival or exceed OpenAI’s o1.
Mariner, Astra, and Jules: These are Google’s agent offerings, trailed in late 2024 to showcase the company’s push towards more advanced AI interactions: Project Mariner focuses on web navigation, Project Astra aims to be a universal AI assistant, and Jules assists with coding tasks. These products are likely several months away, although their release may be accelerated if OpenAI’s Operator proves highly capable.

Anthropic: The company has been quiet of late but tends to drop major new releases with little fanfare:

Anthropic Claude 3.5 Opus: The release of Opus 3.5 has been subject to much speculation and delay. Early hints and teasers suggested a release before the end of 2024, but it never appeared; CEO Dario Amodei confirmed that work was ongoing but remained vague about when it would be available. The most recent indications are that the model was completed but used as a “teacher” model, in a process akin to model distillation, to train Sonnet 3.5, which became the stand-out model of 2024.
Anthropic Claude 4.0 Series: With reasoning-focused models potentially still to come within the 3.5 range, it’s hard to guess when 4.0 will be available. Some suggest Q2, but Anthropic’s biggest headache is keeping up with demand: Cursor, the AI-powered software engineering tool, posted this week that it is easily the biggest Claude customer but is hitting capacity limits and having to restrict access. Amazon still seem to be behind the curve in deploying GPU infrastructure to match their competitors.

Meta: While Meta align with Trump’s vision of America, they continue, for now, to support the frontier of open-weight models:

Llama 4.0: Meta had aimed to have as many as 600,000 GPUs running by the end of 2024. Mark Zuckerberg confirmed during a Meta earnings call that Llama 4 is well into its training, with an anticipated release in early 2025. There are indications that Llama 4 will see multiple releases throughout the year, suggesting a strategy of iterative improvements similar to what we’ve seen with previous Llama versions. Llama 4 is expected to introduce or enhance multi-modal capabilities and advanced reasoning.

Takeaways: A wave of new model releases is set to transform the AI landscape in early 2025, defining the state of the art and the potential of AI for the first half of the year. The successful models will need to balance three key elements: enhanced reasoning abilities, practical usefulness and instruction following, and infrastructure that can scale with demand.

Subscribe to the ExoBrain Weekly Newsletter