Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week
JOEL
This week we look at:
- How DeepSeek’s R1 model changes the game for building powerful AI systems.
- Trump administration partners with OpenAI for the Stargate Project, leaving Musk’s xAI behind.
- OpenAI’s new Operator brings web browsing capabilities to ChatGPT.
No putting this genie back
Just six months ago the general consensus was that building a top-tier AI model meant spending hundreds of millions of dollars, building vast data centres, and amassing huge amounts of raw data. There was a giant moat around the likes of OpenAI, Google, Anthropic and Meta (if not between them). But over the intervening months, AI systems like o1 have demonstrated the power of new ‘reasoning’ models and ‘reinforcement learning’. This week DeepSeek confirmed that this particular genie is well and truly out of the bottle with the release of R1: a model trained at 5% of the cost of OpenAI’s equivalent but with comparable performance, and it has put Silicon Valley (previous open-source leader Meta in particular) into panic mode. R1 scored 79.8% on the AIME mathematics test and 71.5% on GPQA Diamond, matching or exceeding leading models such as Claude 3.5 Sonnet.
As technologist Andrew Curran put it: “DeepSeek is unequivocal proof that one can produce unit intelligence gain at 10x less cost, which means we shall get 10x more powerful AI with the compute we have today and are building tomorrow. Simple math! The AI timeline just got compressed.” But how did they create it, and how does this new paradigm work in general?
DeepSeek started with an existing un-tuned ‘base’ LLM and used reinforcement learning to teach it reasoning skills, much like a student learning through practice questions and good-quality feedback. The feedback in this case was sourced from other models, and the process was seeded with some worked examples to get things going. This created their first version. They then ‘sampled’ output from this model and used the best examples to create a training dataset. Imagine scoring a student’s best answers from a prolonged revision session and curating them into a study guide. They then trained the model mostly on this guide, as well as giving it human feedback to make it easier to work with. This recipe has produced a highly capable AI.

But things don’t end with R1 (or o1 for that matter). We’re now seeing a bigger feedback loop, with OpenAI planning to release their next iteration of reasoning model, o3, in as little as a few weeks. DeepSeek’s next model will be developed by spending more compute on generating high-quality answers from R1, and by sourcing other relatively small but high-quality datasets in verifiable domains like maths, coding and science. These are used to repeat the process and train the next, improved version, and so on, creating a kind of virtuous cycle in which increasing reasoning skills emerge each time the process is repeated. Models like R1 and o1 aren’t just end products – they’re training data generators for the next generation.
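To make the recipe concrete, here’s a deliberately toy sketch of that sample-verify-curate-retrain loop in Python. Everything in it (the ToyModel, the verify and fine_tune stand-ins) is invented for illustration; only the shape of the cycle reflects the process described above.

```python
import random

# Toy sketch of the sample -> verify -> curate -> retrain cycle. Nothing here
# is DeepSeek's real code: ToyModel, verify() and fine_tune() are stand-ins,
# but the loop's shape is the point.

def verify(problem, answer):
    """Stand-in for a verifiable check (unit tests, exact maths answers)."""
    return answer == problem["solution"]

class ToyModel:
    def __init__(self, skill=0.2):
        self.skill = skill  # probability of producing a correct answer

    def generate(self, problem):
        # A real model samples a chain of thought; we just succeed with p=skill.
        return problem["solution"] if random.random() < self.skill else "wrong"

def fine_tune(model, study_guide):
    # Stand-in for supervised fine-tuning on the curated answers:
    # more verified examples yield a somewhat stronger next generation.
    return ToyModel(skill=min(0.95, model.skill + 0.02 * len(study_guide)))

def bootstrap(problems, generations=4, samples=16):
    model = ToyModel()
    for gen in range(generations):
        guide = []  # the curated 'study guide' of the model's best answers
        for p in problems:
            candidates = [model.generate(p) for _ in range(samples)]
            guide.extend((p, c) for c in candidates if verify(p, c))
        model = fine_tune(model, guide)
        print(f"generation {gen}: {len(guide)} verified answers, skill={model.skill:.2f}")
    return model

bootstrap([{"question": "1+1", "solution": "2"}, {"question": "2*3", "solution": "6"}])
```

The key design point is that only answers which pass a verifiable check make it into the next round’s training data, which is why maths and code are such fertile ground for this approach.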
As OpenAI’s Sebastien Bubeck stated: “No tactic was given to the model [o1]. Everything is emergent. Everything is learned through reinforcement learning. This is insane. Insanity.” Both OpenAI and Google researchers have hinted at remarkable progress using similar approaches, using the same virtuous cycle: generate high-quality answers, use them to train improved models, repeat. Each iteration can focus on verifiable domains like mathematics, coding, and science, where correct answers provide clear feedback.
This “learning to learn” approach appears to be creating a kind of compound interest. Each generation of models builds on the insights of the previous one, potentially accelerating progress far beyond what we’ve seen before. And as DeepSeek has shown, the barrier to entry for this approach is surprisingly low. Nvidia’s Jim Fan posted on X: “Whether you like it or not, the future of AI will not be canned genies controlled by a “safety panel”. The future of AI is democratization. Every internet rando will run not just o1, but o8, o9 on their toaster laptop. It’s the tide of history that we should surf on, not swim against. Might as well start preparing now.” There will be no central control or EU decision on whether the most advanced AIs will be made available… they are in the wild today and there’s no return. The AI community have already set to work mining R1 for reasoning data and applying it to train other, smaller models. Within days we have seen notable “distillations” running on everything down to a smartphone.
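To give a sense of how low that barrier now is, here is roughly what running one of the small distillations locally looks like using Hugging Face’s transformers library. The model identifier shown is illustrative – check DeepSeek’s Hugging Face organisation for the current distilled checkpoints.

```python
# Rough sketch of loading a small R1 distillation locally with the Hugging Face
# transformers library. The model identifier is illustrative; see the DeepSeek
# organisation on the Hub for the checkpoints actually published.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)

# Distilled reasoning models still emit their chain of thought before answering.
out = generator(
    "How many positive integers less than 100 are divisible by 3 or 5?",
    max_new_tokens=512,
)
print(out[0]["generated_text"])
```

A checkpoint this size will run on an ordinary laptop CPU, if slowly – very much the ‘toaster laptop’ future Fan describes.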
On Wednesday, Google released Gemini 2.0 Flash Thinking, their latest take on lightweight reasoning. The update brings 73.3% performance on AIME and adds support for million-token context windows. Google’s initial focus on its ‘Flash’ range suggests that they’ve been able to use the full-fat Gemini 2.0 to train these smaller models.
Takeaways: There’s a new twist to the $100 trillion AI question. Is there now an infinite intelligence feedback loop? Are we about to see a rapid take-off to AGI and beyond? Or will this approach only work for ‘verifiable’ problems like coding and maths, with the messy, complex real world proving harder to crack? Either way, start testing these reasoning capabilities now – the performance improvements and cost reductions are happening faster than expected.
Sam and Donald shoot for the stars
The political and technological landscapes collided dramatically this week as it emerged that President Donald Trump’s administration would be backing the Stargate Project and would partner with OpenAI, Oracle and SoftBank, bypassing Elon Musk’s xAI in the process. This was a coup for Sam Altman and has resulted in Musk taking to X to trash the high-profile deal, much to the frustration of presidential aides. Musk and Altman have been locked in personal and legal battles in recent months, with the former alleging that OpenAI have reneged on previous non-profit commitments.
Musk reposted a joke suggesting that the Stargate team must have been on drugs “to come up with their $500 billion number for Stargate.” Altman shot back at Musk on X: “i realize what is great for the country isn’t always what’s optimal for your companies, but in your new role i hope you’ll mostly put us first.”
The Stargate Project – a wildly ambitious initiative – has been pitched as America’s moon shot for maintaining dominance in AI. ARM, Microsoft, NVIDIA, OpenAI and Oracle are the initial technology partners. The Wall Street Journal reports that Microsoft were absent from the press conference due to recent tensions with OpenAI over compute, product competition and strategic direction, and their role as exclusive hosting provider for GPT and o1 has recently ended.
The first Stargate build recently got underway in Abilene, Texas, with this initial phase being overseen by Oracle. Various plans are mooted to deliver huge amounts of computing infrastructure, culminating in several multi-GW data centres that could put the US power grid under significant strain.
Takeaways: The Stargate Project is putting the Musk-Trump relationship under strain, and it will likely not be the last conflict of interest to emerge in the new US political setup. For businesses and governments alike, the situation underscores the increasing role of AI in geopolitical and strategic commercial power dynamics.
EXO
ChatGPT goes shopping
OpenAI has introduced Operator, an AI agent that can control a web browser to complete tasks like research, booking travel, or ordering groceries. It combines GPT-4o’s visual capabilities with a new Computer-Using Agent (CUA) model trained through reinforcement learning. Operator is initially only available to Pro users in the US, and it requires human oversight for sensitive actions like payments or logins. OpenAI has built in multiple safety layers, from requiring user confirmation for important actions to detecting malicious websites.
The company has partnered with major platforms like DoorDash, Instacart and Uber to test real-world applications. They’re also exploring public sector use cases with organisations like the City of Stockton to help residents access services more easily. Early limitations are notable – Operator struggles with complex interfaces like calendar management and slideshow creation. And because this is a research preview, users should expect some mistakes as the system learns from real-world usage.
Looking ahead, OpenAI plans to release the CUA model via API for developers to build their own agents. They aim to expand access to other subscription tiers and integrate the capabilities directly into ChatGPT.
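OpenAI haven’t yet published the CUA API, so any code is necessarily speculative, but the control loop of a computer-using agent generally takes a screenshot, asks the model for the next action, and executes it, pausing for the human on anything sensitive. A minimal sketch follows; every class and method name in it is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical sketch of a computer-using agent's control loop. The CUA API is
# not yet public, so everything below is invented for illustration; only the
# screenshot -> decide -> act shape is the point.

@dataclass
class Action:
    name: str       # e.g. "click", "type", "submit_payment", "done"
    detail: str = ""

SENSITIVE = {"submit_payment", "enter_credentials"}

def confirm_with_user(action: Action) -> bool:
    """Operator-style safeguard: hand control back to the human."""
    return input(f"Allow '{action.name}' ({action.detail})? [y/N] ").lower() == "y"

def run_task(agent, browser, task: str, max_steps: int = 50):
    """Loop until the agent reports it is done or the step budget runs out."""
    for _ in range(max_steps):
        screenshot = browser.capture()                 # what the model "sees"
        action = agent.next_action(task, screenshot)   # model picks the next step
        if action.name == "done":
            return action.detail
        if action.name in SENSITIVE and not confirm_with_user(action):
            continue                                   # user declined; try another step
        browser.execute(action)                        # perform the click or keystroke
    raise TimeoutError("task did not finish within the step budget")
```

The interesting design decision is where to draw the SENSITIVE line – too tight and the agent becomes a nag, too loose and it spends your money unsupervised.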
Takeaways: While Operator represents a solid step toward practical AI agents, the careful rollout and numerous safeguards suggest we’re still in early days. The real test will be how it handles edge cases and complex scenarios as more users start experimenting with the system. Keep an eye on how this shapes competition in the AI agent space – other major players are likely working on similar capabilities.
Weekly news roundup
This week’s news highlights massive infrastructure investments in AI, with tech giants and nations racing to build data centers and computing capacity, while governance challenges and research breakthroughs continue to shape the field.
AI business news
- Meta spending to soar on AI, massive data center (Shows the enormous capital investment major tech companies are making in AI infrastructure)
- Shake up of tech and AI usage across NHS and other public services to deliver plan for change (Demonstrates how AI is being integrated into public healthcare systems)
- Samsung’s SmartThings to introduce AI features that simplify your daily routines (Shows how AI is being integrated into consumer smart home technology)
- AI apps saw over $1 billion in consumer spending in 2024 (Indicates growing consumer adoption and willingness to pay for AI applications)
- Apple enlists company veteran Kim Vorrath to help fix AI, Siri (Shows Apple’s efforts to catch up in the AI race)
AI governance news
- A free, powerful Chinese AI model just dropped — but don’t ask it about Tiananmen Square (Highlights the challenges of AI development under different political systems)
- Trump signs executive order on developing AI ‘free from ideological bias’ (Shows how AI is becoming a political battleground)
- Anthropic’s new Citations feature aims to reduce AI errors (Demonstrates progress in making AI outputs more reliable and traceable)
- A test so hard no AI system can pass it — yet (Reveals current limitations of AI systems)
- ‘The Brutalist’ director responds to AI backlash (Shows growing tensions around AI use in creative industries)
AI research news
- DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning (Advances our understanding of how to improve AI reasoning)
- Training language model agents to reflect via iterative self-training (Shows progress in making AI systems more self-aware)
- Evolving deeper LLM thinking (Explores ways to enhance AI’s cognitive capabilities)
- Chain of agents: Large language models collaborating on long-context tasks (Demonstrates how multiple AI systems can work together)
- PaSa: An LLM agent for comprehensive academic paper search (Shows AI’s potential to transform academic research)
AI hardware news
- Meta to spend up to $65 billion this year to power AI goals, Zuckerberg says (Demonstrates the massive scale of AI infrastructure investments)
- Billionaire Mukesh Ambani plans world’s biggest data center in India’s Gujarat (Shows how AI infrastructure is expanding globally)
- Green belt site near M25 to host Europe’s largest AI data centre (Indicates growing AI infrastructure development in Europe)
- Exclusive: ByteDance plans $20 billion capex in 2025, mostly on AI, sources say (Shows major AI investments from Chinese tech companies)
- NVIDIA GeForce RTX 5090 review: Pure AI excess for $2,000 (Demonstrates advances in consumer AI hardware capabilities)