Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week
JOEL
This week we look at:
- The debate over AI scaling limits and why the major labs are finding new sources of growth
- What happens when Musk’s Grok AI fact-checks its creator’s social posts
- How O2’s AI grandmother is turning the tables on telephone fraudsters
Are the labs hitting a scaling wall?
The Information reported this week that OpenAI’s latest model, Orion, while showing improvements, hasn’t matched the dramatic leap seen between GPT-3 and GPT-4. They also reported that Google and Anthropic are facing similar hurdles with their upcoming releases. This seemed like a potential answer to the biggest question in AI… will the ‘scaling laws’ hold and will models continue to improve as they are made bigger and trained using ever more compute?
Shortly after The Information piece, former OpenAI chief scientist Ilya Sutskever was quoted for the first time in some months, suggesting we’re moving from “the age of scaling” to “the age of wonder and discovery.” Was one of the leading minds in AI research signalling that the traditional approach of making models bigger and feeding them more data might be reaching its limits? So, what’s the truth?
There are two opposed camps: those who have staked their reputations on diminishing returns from scaling GPT-style transformer models (often those with a belief or stake in an alternative architecture), and those who see no limit for the foreseeable future. The sceptics immediately went into overdrive on social media, holding up the article as proof of their past bearish pronouncements. Sam Altman responded with characteristic confidence, posting on X: “There is no wall”, despite the article citing “multiple OpenAI sources”. That confidence is likely rooted in OpenAI’s recent work on their o1 family, which points to one way to continue scaling regardless – shifting computation to “test time,” when the model is actually being used. OpenAI researcher Noam Brown noted that giving a poker-playing agent just 20 seconds to think matched the benefits of scaling up training “by 100,000 times.” As we mentioned when we covered the o1 launch, there is a new scaling law in town… one with far fewer limits.
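To make that concrete, here is a minimal sketch of one of the simplest test-time compute techniques, best-of-N sampling: rather than training a bigger model, spend extra inference cycles generating several candidate answers and keep the highest-scoring one. The `generate` and `score` functions below are hypothetical stand-ins for a model’s sampling call and a verifier or reward model.

```python
# A minimal sketch of best-of-N sampling, a simple form of
# test-time compute. generate() and score() are hypothetical
# stand-ins for a model's sampling call and a verifier.
import random

def generate(prompt: str) -> str:
    """Stand-in for one sampled model completion."""
    return f"candidate answer #{random.randint(0, 9999)}"

def score(prompt: str, answer: str) -> float:
    """Stand-in for a verifier or reward model rating a candidate."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Trade extra inference compute (larger n) for answer quality,
    instead of training a bigger model."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("How should we respond to this poker hand?", n=8))
```

o1-style models go further, spending their extra compute on long internal chains of reasoning rather than parallel samples, but the scaling dimension is the same: compute at inference time rather than at training time.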
Anthropic CEO Dario Amodei, speaking for several hours straight on the Lex Fridman podcast this week, shares this optimism, though his company’s largest training effort, for Claude 3.5 Opus, has reportedly faced performance issues. Meanwhile, Google’s next Gemini iteration is also rumoured to be falling short of internal targets. Much of this can be attributed to the need for these models to be cost-effective and viable for widespread use. GPT-4o and Claude 3.5 Sonnet are smart models purported to be much smaller and more profitable than the previous generation; while they remain in demand, the labs need a compelling reason to rush out larger, more resource-hungry, and less profitable products.
Rumours abound when a topic is this hotly debated, and there have been no major releases in recent months to provide concrete evidence either way. While some experts see this moment as validation that current architectures have fundamental limits, others see the very opposite.
Takeaways: Whether through smarter inference techniques, test-time computing, or entirely new approaches, AI development continues at pace. 2025 isn’t just about building bigger models, but about finding smarter ways to use them, backed by unprecedented computing power. With big tech deploying GPU clusters relentlessly, and Nvidia’s Blackwell chips yet to ship in volume, the real constraint isn’t data or algorithms, but raw computing capacity. As AI chips become more plentiful over the next 12-18 months, we will see a step change in the number of AI agents working together to solve problems, and perhaps, at that point, starting to improve themselves. Scaling is not a single dimension; much more is on the horizon. As Miles Brundage, former head of OpenAI’s AGI readiness effort, posted this week: “Betting against AI scaling continuing to yield big gains is a bad idea. Would recommend that anyone staking their career, reputation, money etc. on such a bet reconsider it.”
Truth social
News broke this week that Elon Musk’s latest funding round for his xAI lab will enable him to purchase another 100,000 GPUs. Musk is an increasingly divisive, powerful, and contradictory figure. Back in April 2023, he announced he was working on “TruthGPT,” a ChatGPT alternative described as a “maximum truth-seeking AI.” This eventually launched as Grok, later upgraded to Grok 2, and is available on the X platform to paying users (although we hear this week that it is rolling out to the free tier).
As Musk races to build ever more powerful AIs (claiming that Grok 3 will be the most powerful in the world), we decided to test how his truth-seeking model evaluates its creator’s political posturing on X. We selected five recent posts by Musk and asked Grok to assess their truthfulness. Here are the results (a sketch of how such a check could be scripted follows the list):
- Musk post: “The Democratic Party senate candidate in Pennsylvania is trying to change the outcome of the election by counting NON-CITIZEN votes, which is illegal.”
Grok assessment: False. Based on misinformation.
- Musk post: “The world is suffering slow strangulation by overregulation. Every year, the noose tightens a little more.”
Grok assessment: Misleading and hyperbolic.
- Musk post: “Meanwhile, none of the many short sellers who egregiously manipulated Tesla stock for years and lied repeatedly on TV have been prosecuted. Not one.”
Grok assessment: Short selling is legal and regulated. No evidence of fraud.
- Musk post: “Vote for @realDonaldTrump or the Dems will legalize so many illegals in swing states that this will be the last real election in America.”
Grok assessment: False. The claim reflects political fearmongering without factual basis.
- Musk post: “There should be no need for FOIA requests. All government data should be default public for maximum transparency.”
Grok assessment: Impractical due to security and privacy concerns.
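For the curious, below is a rough sketch of how a check like this could be scripted. It assumes xAI exposes an OpenAI-compatible endpoint at https://api.x.ai/v1 with a model named “grok-beta”; both are assumptions to verify against xAI’s documentation before running.

```python
# A hedged sketch of scripting the fact-check experiment.
# Assumptions: xAI offers an OpenAI-compatible endpoint at
# https://api.x.ai/v1 and a model named "grok-beta"; verify both
# against xAI's documentation before running.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # assumed environment variable
    base_url="https://api.x.ai/v1",
)

POSTS = [
    "The world is suffering slow strangulation by overregulation. "
    "Every year, the noose tightens a little more.",
    "There should be no need for FOIA requests. All government data "
    "should be default public for maximum transparency.",
]

for post in POSTS:
    response = client.chat.completions.create(
        model="grok-beta",  # assumed model name
        messages=[
            {"role": "system", "content": (
                "Assess the factual accuracy of the following social "
                "media post. Be concise; note exaggeration or missing "
                "context as well as outright falsehoods.")},
            {"role": "user", "content": post},
        ],
    )
    print(f"POST: {post[:60]}...")
    print(f"ASSESSMENT: {response.choices[0].message.content}\n")
```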
Takeaways: Interestingly, we couldn’t find a recent political Musk post that Grok didn’t take issue with. Its analysis consistently flagged oversimplifications, rhetorical exaggerations, and inaccuracies. While it acknowledged partial truths or valid concerns, Grok’s overall evaluation was often scathing. This raises an interesting dilemma: either Grok is more truthful than Musk, or Musk is more truthful than Grok (and Grok’s capabilities aren’t quite up to par); both cannot be true. If Grok 2 is the product of Musk’s concerted AI effort in 2024, perhaps these artificial minds could evolve to embody balance and reason, acting as tools to moderate the often harmful and misleading rhetoric of the world’s most influential figures… becoming, in a sense, our better angels?
EXO
The AI grandmother turning the tables on phone scammers
This week O2 introduced an unusual weapon against phone fraud – an AI system that keeps scammers talking in circles. Named Daisy, this automated time-waster poses as a chatty grandmother, engaging fraudsters in meandering conversations about knitting and family stories for up to 40 minutes at a stretch.
The system combines several AI models working together to hold natural conversations without human input. It transcribes incoming calls, generates contextual responses through a custom large language model, and delivers them via AI voice synthesis – all in real-time.
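O2 hasn’t published Daisy’s internals, but the pipeline described maps onto a now-familiar pattern. Below is a minimal sketch of that loop, where `transcribe`, `chat`, and `synthesise` are hypothetical stand-ins for the speech-to-text, language model, and voice-synthesis components.

```python
# A hedged sketch of the ASR -> LLM -> TTS loop described above.
# transcribe(), chat(), and synthesise() are hypothetical stand-ins;
# O2 has not published Daisy's actual architecture.

PERSONA = (
    "You are Daisy, a chatty grandmother. Ramble warmly about knitting "
    "and family stories. Never reveal personal or financial details, "
    "and keep the caller talking as long as possible."
)

CANNED = [
    "Ooh, that reminds me of a jumper I knitted for my grandson...",
    "Sorry dear, could you say that again? The kettle was whistling.",
]

def transcribe(audio_chunk: bytes) -> str:
    """Stand-in for real-time speech-to-text."""
    return audio_chunk.decode("utf-8")  # pretend the 'audio' is text

def chat(history: list[dict]) -> str:
    """Stand-in for a persona-prompted LLM generating the next turn."""
    turn = sum(1 for m in history if m["role"] == "assistant")
    return CANNED[turn % len(CANNED)]

def synthesise(text: str) -> bytes:
    """Stand-in for voice synthesis returning audio bytes."""
    return text.encode("utf-8")

def handle_call(audio_stream):
    """The real-time loop: transcribe, respond in persona, speak."""
    history = [{"role": "system", "content": PERSONA}]
    for chunk in audio_stream:
        history.append({"role": "user", "content": transcribe(chunk)})
        reply = chat(history)
        history.append({"role": "assistant", "content": reply})
        yield synthesise(reply)  # stream audio back to the caller

scam_call = [b"Hello madam, your bank account has been compromised."]
for audio in handle_call(scam_call):
    print(audio.decode("utf-8"))
```

The hard part in practice is latency: each turn must complete transcription, generation, and synthesis quickly enough to feel like a live conversation, which is what makes the “all in real-time” claim notable.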
Trained with help from prominent scam-fighter Jim Browning, Daisy aims to protect vulnerable customers by occupying scammers’ time. The initiative comes as O2’s research shows 71% of Brits want payback against fraudsters, but don’t want to waste their own time doing it.
Takeaways: Beyond the practical benefits of keeping scammers occupied, Daisy showcases how far real-time conversational AI has advanced. It’s part of O2’s broader strategy combining AI security applications with advocacy for stronger government action on fraud, including calls for a dedicated minister.
Weekly news roundup
This week’s developments highlight growing concerns around AI infrastructure demands, significant advances in research capabilities, and increasing regulatory scrutiny of AI applications and content.
AI business news
- Coca-Cola’s iconic ‘Holidays are coming’ ad is now a soulless and creepy dystopian nightmare made by AI (Shows the current limitations of AI in creative advertising and brand storytelling)
- Apple’s AI-powered Final Cut Pro 11 is now available (Demonstrates how AI is transforming professional video editing workflows)
- OpenAI’s take on AI agents could come in January (Signals the next major evolution in consumer AI applications)
- Writer reels in $200M for its generative AI toolkit (Shows continued strong investor interest in enterprise AI tools)
- Perplexity brings ads to its platform (Indicates emerging business models for AI search platforms)
AI governance news
- Bluesky says it won’t train AI on your posts (Highlights growing awareness of data rights in AI training)
- Anthropic, feds test whether Claude AI will share sensitive nuclear info (Shows increasing focus on AI safety and security concerns)
- EU AI act: draft guidance for general purpose AIs shows first steps for big AI to comply (Provides crucial regulatory framework for AI companies)
- Musk’s X sues to block California’s deepfake deception act (Reveals tensions between tech platforms and AI regulation)
- Not even Spotify is safe from AI slop (Demonstrates challenges of AI-generated content moderation)
AI research news
- AI protein-prediction tool AlphaFold3 is now more open (Major advancement for scientific accessibility and collaboration)
- The surprising effectiveness of test-time training for abstract reasoning (Important breakthrough in AI learning capabilities)
- LLM2CLIP: powerful language model unlocks richer visual representation (Advances multimodal AI understanding)
- Language models are hidden reasoners: unlocking latent reasoning capabilities via self-rewarding (Reveals new insights into LLM capabilities)
- Graph-based AI model maps the future of innovation (Shows AI’s potential in predicting technological developments)
AI hardware news
- Taiwan says TSMC can’t make 2nm chips abroad (Important geopolitical development in AI chip manufacturing)
- SoftBank first to receive new Nvidia chips for supercomputer (Shows evolution of AI infrastructure deployment)
- AMD confirms it’s cutting 4% of its global workforce (Indicates challenges in AI chip competition)
- Microsoft bets a carbon removal bake-off will help offset its skyrocketing AI emissions (Highlights environmental impact of AI development)
- Nearly half of AI data centers may not have enough power by 2027 (Critical infrastructure challenge for AI industry growth)