Week 46 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • The debate over AI scaling limits and why the major labs are finding new sources of growth
  • What happens when Musk’s Grok AI fact-checks its creator’s social posts
  • How O2’s AI grandmother is turning the tables on telephone fraudsters

Are the labs hitting a scaling wall?

The Information reported this week that OpenAI's latest model, Orion, while showing improvements, hasn't matched the dramatic leap seen between GPT-3 and GPT-4. It also reported that Google and Anthropic are facing similar hurdles with their upcoming releases. This seemed like a potential answer to the biggest question in AI… will the 'scaling laws' hold, and will models continue to improve as they are made bigger and trained with ever more compute?

Shortly after The Information piece, former OpenAI chief scientist Ilya Sutskever was quoted for the first time in some months, suggesting we're moving from "the age of scaling" to "the age of wonder and discovery." Was one of the leading minds in AI research signalling that the traditional approach of making models bigger and feeding them more data might be reaching its limits? So, what's the truth?

There are two opposed camps: those who have staked their reputations on diminishing returns from scaling GPT-style transformer models (often those with a belief or stake in an alternative architecture), and those who believe there is no limit for the foreseeable future. The sceptics immediately went into overdrive on social media, treating the article as vindication of their past bearish pronouncements. Sam Altman responded with characteristic confidence, posting on X: "There is no wall", despite the article citing "multiple OpenAI sources". This is likely because OpenAI's recent work on their o1 family points to one way to continue scaling regardless: shifting computation to "test time", when the model is actually being used. OpenAI researcher Noam Brown noted that giving a poker-playing agent just 20 seconds to think matched the benefits of scaling up training "by 100,000 times." As we mentioned when we covered the o1 launch, there is a new scaling law in town… one with far fewer limits.
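
To make the test-time idea concrete, here is a minimal toy sketch of one flavour of it, best-of-N sampling: instead of training a bigger model, you sample several candidate answers at inference and let a verifier keep the best one. The toy_model and toy_verifier below are our own stand-ins (the verifier "cheats" by reading true quality, which a real reward model can only estimate), and o1 itself relies on extended chain-of-thought reasoning rather than this exact scheme.

```python
import random

random.seed(0)

def toy_model(question: str) -> tuple[str, float]:
    """Stand-in for an LLM: returns a candidate answer and its (hidden) quality."""
    return f"answer-to-{question}", random.random()

def toy_verifier(candidate: tuple[str, float]) -> float:
    """Stand-in for a learned verifier scoring a candidate. Here it reads the
    true quality directly; a real verifier only estimates it."""
    _, quality = candidate
    return quality

def best_of_n(question: str, n: int) -> float:
    """Best-of-N: spend more inference-time compute, keep the top-scoring answer."""
    candidates = [toy_model(question) for _ in range(n)]
    return max(candidates, key=toy_verifier)[1]

# Average quality of the chosen answer climbs as we buy more samples,
# with no change to the underlying model: the new scaling axis.
for n in (1, 4, 16, 64):
    avg = sum(best_of_n("q", n) for _ in range(1_000)) / 1_000
    print(f"N={n:>2}: mean chosen quality {avg:.2f}")
```

More samples buy better answers without touching the weights, which is the essence of scaling at test time.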

Anthropic CEO Dario Amodei, speaking for several hours on the Lex Fridman podcast this week, shared this optimism, though his company's largest training effort, for Claude 3.5 Opus, has reportedly faced performance issues. Meanwhile, Google's next Gemini iteration is also rumoured to be falling short of internal targets. Much of this can be attributed to the need for these models to be cost-effective and viable for widespread use. GPT-4o and Claude 3.5 Sonnet are smart models purported to be much smaller and more profitable than the previous generation, and while they remain in demand, the labs must find good reasons to rush out larger, more resource-hungry, less profitable products.

Rumours abound when a topic is this hotly debated, and with no major releases in recent months there is little concrete evidence either way. While some experts see this moment as validation that current architectures have fundamental limits, others see the very opposite.

Takeaways: Whether through smarter inference techniques, test-time computing, or entirely new approaches, AI development continues at pace. 2025 won't just be about building bigger models, but about finding smarter ways to use them, backed by unprecedented computing power. With big tech deploying GPU clusters relentlessly, and Nvidia's Blackwell chips yet to ship in volume, the real constraint isn't data or algorithms but raw computing capacity. As AI chips become more plentiful over the next 12-18 months, we will see a step change in the number of AI agents working together to solve problems, perhaps at that point even starting to improve themselves. Scaling is not a single dimension; much more is on the horizon. As Miles Brundage, former head of OpenAI's AGI readiness effort, recently posted: "Betting against AI scaling continuing to yield big gains is a bad idea. Would recommend that anyone staking their career, reputation, money etc. on such a bet reconsider it."

Truth social

News broke this week that Elon Musk's latest funding round for his xAI lab will enable him to purchase another 100,000 GPUs. Musk is an increasingly divisive, powerful, and contradictory figure. Back in April 2023, he announced he was working on "TruthGPT," a ChatGPT alternative described as a "maximum truth-seeking AI." This eventually launched as Grok, later upgraded to Grok 2, and is available on the X platform to paying users (although we hear this week that it is rolling out to the free tier).

As Musk races to build ever more powerful AIs (claiming that Grok 3 will be the most powerful in the world), we decided to test how this truth-seeking model evaluates his own political posturing on X. We selected five recent posts by Musk and asked Grok to assess their truthfulness; a sketch of how such a query can be scripted follows the list. Here are the results:

  • Musk post: “The Democratic Party senate candidate in Pennsylvania is trying to change the outcome of the election by counting NON-CITIZEN votes, which is illegal.”
    Grok assessment: False. Based on misinformation.
  • Musk post: “The world is suffering slow strangulation by overregulation. Every year, the noose tightens a little more.”
    Grok assessment: Misleading and hyperbolic.
  • Musk post: “Meanwhile, none of the many short sellers who egregiously manipulated Tesla stock for years and lied repeatedly on TV have been prosecuted. Not one.”
    Grok assessment: Short selling is legal and regulated. No evidence of fraud.
  • Musk post: “Vote for @realDonaldTrump or the Dems will legalize so many illegals in swing states that this will be the last real election in America.”
    Grok assessment: False. The claim reflects political fearmongering without factual basis.
  • Musk post: “There should be no need for FOIA requests. All government data should be default public for maximum transparency.”
    Grok assessment: Impractical due to security and privacy concerns.
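
For the technically curious, the sketch below shows roughly how such a check can be scripted against Grok. It is a hypothetical harness, not our exact method: it assumes xAI's OpenAI-compatible API at https://api.x.ai/v1, the grok-beta model name, the openai Python package, and an XAI_API_KEY environment variable; check xAI's documentation for current details.

```python
import os
from openai import OpenAI  # xAI's API is OpenAI-compatible, so the openai SDK works

# Assumed endpoint, model name, and key variable; verify against xAI's docs.
client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")

def assess_truthfulness(post: str) -> str:
    """Ask Grok for a short truthfulness verdict on a social media post."""
    response = client.chat.completions.create(
        model="grok-beta",
        messages=[
            {"role": "system",
             "content": "You are a rigorous fact-checker. Assess the factual "
                        "accuracy of the following post and give a brief verdict."},
            {"role": "user", "content": post},
        ],
    )
    return response.choices[0].message.content

print(assess_truthfulness(
    "There should be no need for FOIA requests. All government data "
    "should be default public for maximum transparency."
))
```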

Takeaways: Interestingly, we couldn't find a recent political Musk post that Grok didn't take issue with. Its analysis consistently flagged oversimplifications, rhetorical exaggerations, and inaccuracies. While it acknowledged partial truths or valid concerns, Grok's overall evaluations were often scathing. This sets up a stark dichotomy: either Grok is more truthful than Musk, or Musk is more truthful than Grok (and Grok's capabilities aren't quite up to par); both cannot be true. If Grok 2 is the fruit of Musk's concerted AI effort in 2024, perhaps these artificial minds could evolve to embody balance and reason, acting as tools to moderate the often harmful and misleading rhetoric of the world's most influential figures… becoming, in a sense, our better angels?

EXO

The AI grandmother turning the tables on phone scammers

This week O2 introduced an unusual weapon against phone fraud – an AI system that keeps scammers talking in circles. Named Daisy, this automated time-waster poses as a chatty grandmother, engaging fraudsters in meandering conversations about knitting and family stories for up to 40 minutes at a stretch.

The system combines several AI models working together to hold natural conversations without human input. It transcribes incoming calls, generates contextual responses through a custom large language model, and delivers them via AI voice synthesis – all in real-time.
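
As a structural sketch, the loop looks something like the following. The component functions are placeholders of our own invention rather than O2's actual models; the point is the shape of the pipeline, where every stage must stream quickly enough to keep a live phone conversation going.

```python
def transcribe(audio_chunk: bytes) -> str:
    """Placeholder for a streaming speech-to-text model."""
    return "caller asks for bank details"

def generate_reply(history: list[str], persona: str) -> str:
    """Placeholder for the custom LLM, conditioned on the grandmother persona."""
    return "Ooh, banking, you say? That reminds me of my granddaughter's knitting..."

def synthesize(text: str) -> bytes:
    """Placeholder for text-to-speech producing the elderly-sounding voice."""
    return text.encode()

def handle_call(incoming_audio):
    """Real-time loop: hear the scammer, reply in persona, speak, repeat."""
    history: list[str] = []
    persona = "Daisy, a chatty grandmother fond of knitting and family stories"
    for chunk in incoming_audio:
        heard = transcribe(chunk)
        history.append(f"caller: {heard}")
        reply = generate_reply(history, persona)
        history.append(f"daisy: {reply}")
        yield synthesize(reply)  # audio streamed back down the line

# Example: feed two fake audio chunks through the pipeline.
for audio_out in handle_call([b"chunk1", b"chunk2"]):
    print(audio_out[:40])
```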

Trained with help from prominent scam-fighter Jim Browning, Daisy aims to protect vulnerable customers by occupying scammers’ time. The initiative comes as O2’s research shows 71% of Brits want payback against fraudsters, but don’t want to waste their own time doing it.

Takeaways: Beyond the practical benefits of keeping scammers occupied, Daisy showcases how far real-time conversational AI has advanced. It’s part of O2’s broader strategy combining AI security applications with advocacy for stronger government action on fraud, including calls for a dedicated minister.

Weekly news roundup

This week’s developments highlight growing concerns around AI infrastructure demands, significant advances in research capabilities, and increasing regulatory scrutiny of AI applications and content.

AI business news

AI governance news

AI research news

AI hardware news
