Week 32 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • The recent tech stock turbulence and its implications for AI investments.
  • METR’s innovative approach to benchmarking AI capabilities against human experts.
  • Decoding OpenAI’s mysterious strawberry-themed teasers.

Market rollercoaster rocks tech stocks

In a week of market turmoil, tech stocks faced a significant downturn, sparking further debates about the stability of the AI boom. The catalyst? A confluence of events, from Japan’s interest rate hike to disappointing US jobs data, set against a backdrop of ongoing economic uncertainty.

Let’s summarise the sequence of events:

  • The Bank of Japan (BOJ) raised interest rates to 0.25% last week
  • Rumours first emerged of a design flaw in Nvidia’s new Blackwell chip that could result in a 3-month delay in shipments
  • The OECD’s leading indicator signalled a weaker US economic outlook
  • Former tech giant Intel reported a $1.6 billion Q2 loss and announced plans to cut 15% of its workforce (despite a recent injection of federal cash from the CHIPS Act)
  • Disappointing US jobs data heightened economic concerns
  • The US Federal Reserve held off on cutting interest rates
  • Off the back of the BOJ’s decision, the yen surged, forcing investors who had borrowed yen to fund tech investments to sell
  • Japanese stocks suffered their biggest daily loss since 1987
  • Widespread selling driven by these factors created a snowball effect
  • Investors and algorithmic traders rotated out of tech into ‘safer’ sectors
  • Falling big-tech stocks, which make up around 30% of the S&P 500 (some $14 trillion), dragged down indices and triggered further cross-sector selling
  • By Friday, US jobs numbers had improved slightly, and stocks had rebounded somewhat, with the tech sector up 14% YTD

This sequence highlights how global economic factors, monetary policy decisions and market dynamics interacted to create a significant market event. Tech stocks now carry enormous weight in the market, so any concern about economic growth quickly downgrades the outlook for the tech platforms that power every industry. To a degree, the tech correction also reflects the longer-term trend of struggling SaaS firms and the reduced VC deal flow seen in 2024 (somewhat at odds with the default narrative of excessive AI investment).

But is all of this merely a short-term economic correction or a sign of deeper issues in the AI landscape? As we explored back in July, lazy comparisons to the dot-com bubble and AI ignorance abound. Andrew Odlyzko, a professor of mathematics at the University of Minnesota and expert on economic bubbles, argues that the AI situation more closely mirrors the early days of electricity – a fundamental shift in how we harness and apply technology.

At ExoBrain we believe that what we’re witnessing with AI isn’t a market cycle, but a fundamental reorganisation of how compute power is applied and valued in the economy. Market turbulence will likely continue as a result of the unpredictable, recursive evolution of multiple new technologies, which will drive what we’ll look back on as ‘creative destruction’.

There will be failures. Intel has been caught out by this new wave and has made the wrong calls. Nvidia may have dropped the ball somewhat in rushing to ship its next-generation chip. Pure-play AI companies like OpenAI and Anthropic face intense competition and uncertain routes to profit, and the cost of pushing the frontier forward remains huge while the next generation of more efficient AI chips is not yet deployed (and now delayed).

Yet, amidst this uncertainty, the tech giants remain well-positioned to weather the storm. Their diversified business models, cash (and data) reserves, and essential role in global business infrastructure provide a buffer against short-term volatility. Amazon, Google, Microsoft, and Meta have all signalled their intention to continue heavy investment in infrastructure. They are scaling up their capital expenditures significantly, with a focus on servers and datacentres to support growing cloud demand across the board. While some are developing their own custom AI chips, Nvidia’s GPUs remain a critical component of their infrastructure. The reported Blackwell delay is likely to be a temporary setback in the broader context of sustained demand for high-performance compute.

Takeaways: For investors and business leaders, the critical challenge is to distinguish short-term market blips and broader economic shifts (and the simplistic theories that people and traders latch onto) from the long-term transformative potential of AI. While caution is advisable, abandoning AI investments at this juncture would be short-sighted. As we consistently highlight in this newsletter, we are merely at the dawn of a compute-driven revolution. Now is not the time to blink first.

The holy grail of benchmarks

This week, METR provided an update on their novel approach to evaluating AI capabilities, pitting machine performance against human experts across a diverse range of tasks. This new methodology aims to provide an improved understanding of AI progress, moving beyond abstract benchmarks to assess real-world impact.

The AI landscape is dominated by leaderboards and benchmark scores that often fail to translate into meaningful insights about AI’s practical capabilities (as we covered in a previous newsletter). METR’s approach addresses this gap by directly comparing AI agent performance to that of human experts on a variety of complicated technical tasks, from cybersecurity to machine learning.

At the heart of METR’s evaluation is a focus on the correlation between human and agent performance. Their findings reveal that while AI agents generally excel at tasks that humans can complete quickly, models like Claude 3.5 and GPT-4o struggle to complete tasks that take human experts hours to solve. This view offers a clearer picture of where AI stands in relation to human capabilities. Mastering these ‘longer horizon’ tasks, with all of their dependencies, complexities and need for planning and reasoning, is essentially the holy grail of AI development in 2024 and the road to AGI.

METR’s results chart compares the performance of AI agents to human experts across tasks of varying difficulty. The x-axis represents how long it takes humans to complete different tasks, ranging from 1-4 minutes to 16-64 hours. The y-axis shows the fraction of tasks completed by AI agents, averaged across six different language models.
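To make the chart’s metric concrete, here is a minimal Python sketch of how such a view could be computed: bucket each task by how long humans take, work out each model’s completion fraction per bucket, then average across models. This is our own illustration, not METR’s code. Only the first (1-4 minutes) and last (16-64 hours) bucket ranges come from the text; the middle edges, the hypothetical helpers `bucket_of` and `completion_by_bucket`, and the `(model, human_minutes, completed)` input format are all assumptions.

```python
from collections import defaultdict

# Assumed bucket edges in minutes. Only "1-4 min" and "16-64 h" come from the
# article; the middle boundaries are illustrative guesses.
EDGES = [1, 4, 16, 60, 240, 960, 3840]
LABELS = ["1-4 min", "4-16 min", "16-60 min", "1-4 h", "4-16 h", "16-64 h"]


def bucket_of(human_minutes: float) -> str:
    """Map a task's human completion time (minutes) to its bucket label."""
    for lo, hi, label in zip(EDGES, EDGES[1:], LABELS):
        if lo <= human_minutes < hi:
            return label
    raise ValueError(f"{human_minutes} min is outside the bucketed range")


def completion_by_bucket(results):
    """results: iterable of (model_name, human_minutes, completed) tuples.

    Returns {bucket_label: completion fraction}, computed per model first and
    then averaged across the models that attempted tasks in that bucket.
    """
    outcomes = defaultdict(lambda: defaultdict(list))
    for model, minutes, completed in results:
        outcomes[bucket_of(minutes)][model].append(completed)

    summary = {}
    for label in LABELS:
        if label not in outcomes:
            continue  # no tasks fell into this bucket
        per_model = [sum(runs) / len(runs) for runs in outcomes[label].values()]
        summary[label] = sum(per_model) / len(per_model)
    return summary


if __name__ == "__main__":
    # Hypothetical toy outcomes, for illustration only (not METR data).
    toy = [
        ("model_a", 2, True),
        ("model_a", 3, False),
        ("model_b", 2, True),
        ("model_b", 600, False),
    ]
    print(completion_by_bucket(toy))  # {'1-4 min': 0.75, '4-16 h': 0.0}
```

Averaging per model before averaging across models stops a model that attempted more tasks in a bucket from dominating that bucket’s score, which is one plausible reading of “averaged across six different language models”.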

This is a great new evaluation method. For businesses, a customised version of it could offer a more reliable way to assess where different AIs can most effectively augment or potentially replace human labour. The study’s finding that AI agents can generally complete tasks at 1/30th the cost of human experts is particularly noteworthy, suggesting significant potential for cost savings and efficiency gains in certain areas. However, the research also highlights how much room for improvement remains in AI’s ability to tackle complex, long-form tasks. It will be fascinating, and important, to see how the next generation of models, such as GPT-5, Llama 4 and Claude 3.5 Opus, perform here. Will they push the line out into long-task territory, and what will this mean for job displacement? Or will the line move up, increasing quality without moving the quantum of automation to a new level?

Takeaways: METR’s new evaluation approach is a great way to better understand where AI capabilities currently sit; many short and medium-sized tasks are in scope and in many cases vastly cheaper to complete with AI. Longer and more complex tasks are the next frontier.


EXO

OpenAI’s forbidden fruit

Sam Altman, CEO of OpenAI, stirred up speculation this week with a social media post featuring a photo of strawberries growing in planters. The caption, “I love summer in the garden,” sent the AI community into a frenzy of interpretation, with many believing it to be a veiled reference to a rumoured new AI planning algorithm codenamed ‘Strawberry’.

In July, Reuters reported on a secret OpenAI project called ‘Strawberry’, aimed at enhancing AI reasoning and enabling autonomous internet navigation. The timing of Altman’s post, amidst growing competition in the AI space, has only fuelled the fire of speculation.

Adding to the intrigue is a mysterious X account, “I rule the world MO”, which has been posting cryptic messages seemingly related to OpenAI’s developments. The account’s posting patterns are unusually frequent and responsive, leading some to speculate whether it might be an AI-powered account testing advanced language models. When this account posted “Welcome to level 2. How do you feel?”, Altman himself responded within hours with “Amazing to be honest,” further stoking curiosity.

The LMSYS Chatbot Arena once again featured in the mystery when an anonymous new model called “sus column R” demonstrated impressive reasoning abilities, sparking speculation about its origins and whether it might be related to OpenAI’s rumoured ‘Strawberry’ project.

The peculiarity of the situation isn’t lost on industry observers. Ethan Mollick, a professor at the Wharton School, noted, “OpenAI is the only company whose corporate communication strategy consists of obscure hints, Delphic pronouncements and riddles that could fit into adventure games.” This approach, while unorthodox, has proven effective in generating buzz and maintaining OpenAI’s mystique in the fast-paced world of AI development. The hype has even led to some extreme displays of enthusiasm, with one member of the AI community reportedly getting a strawberry tattoo in anticipation of the rumoured new model.

In a strange twist of fate, amidst all this speculation about advanced AI capabilities, it has come to light that most leading AI models struggle with a seemingly simple task: correctly counting the number of “r” letters in the word “strawberry”. This quirky limitation serves as a reminder that even as we speculate about groundbreaking AI advances, current models still stumble on some basic tasks.
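For contrast, a conventional program answers the strawberry question deterministically by inspecting individual characters. A minimal Python sketch; the `count_letter` and `letter_positions` helpers are our own hypothetical illustrations, not anything from the models discussed:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def letter_positions(word: str, letter: str) -> list[int]:
    """Return the zero-based indices where the letter appears."""
    return [i for i, ch in enumerate(word.lower()) if ch == letter.lower()]


print(count_letter("strawberry", "r"))      # 3
print(letter_positions("strawberry", "r"))  # [2, 7, 8]
```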

Takeaways: Despite the social media frenzy, it’s worth noting that nothing concrete has yet materialised from this flurry of activity. While OpenAI’s teasers have certainly captured attention, the company might need to hurry up and ship actual products. Competitors like Meta, Google, and Anthropic are making significant strides in AI development, with some arguably ahead of OpenAI on several fronts. The AI race is heating up, and cryptic hints can only sustain interest for so long without tangible results.

Weekly news roundup

This week’s news highlights the expanding influence of AI across various sectors, ongoing regulatory scrutiny, advancements in AI research, and the competitive landscape in AI hardware development.

AI business news

AI governance news

AI research news

AI hardware news
