Week 10 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

As anticipated, the pace of AI output from mid-Feb has continued to accelerate.

  • Claude 3 emerges from the lab at Anthropic and sets the new standard for personal AI assistants
  • Musk sues OpenAI and brings the debate on openness and AGI (artificial general intelligence) to the fore
  • OpenAI’s GPT-4 is no longer the leading AI assistant; we profile the other options for both assistance and system building

The key themes this week…

Claude 3: A powerful (and beautiful) new mind

On March 4th Claude 3 was launched on an unsuspecting world, and whilst Dune 2 was filling cinemas, the new AI’s surprise ascendance was being likened on social media to that of the prophetic and powerful character in the sci-fi saga.

Claude 2.1 was launched only a few months ago, and nobody was paying much attention to the dusty desert world of Anthropic. But 3 months is a lifetime in AI, or indeed enough time to train a whole new family of state of the art models. Claude 3 is here, excels in language, coding and vision tasks, and comes in 3 sizes to match various speed requirements and budgets. But there are 3 capabilities that are particularly exciting, and we at ExoBrain believe will be immensely valuable for our clients…

  1. Claude 3 models support a 200k token (~150k word) input size or ‘context window’, and it’s the largest and most accurate generally available today… To provide an example, I loaded up my favourite 200-page innovation book (yes, a whole book), it read it in about a minute, and then I could ask it questions, generate new analysis, and it was even able to write a small software programme to convert ideas from the book into a functioning tool…
  2. Claude 3 Opus, the largest and most expensive model in the range, has the ability to manage and dispatch mini versions of itself, or ‘agents’, to complete sub-tasks. It has been shown coordinating economic analysis through multiple asynchronous steps; sourcing data, analysing, predicting and writing-up output. A first for a large mainstream model, this capability is usually dependent on complex add-ons.
  3. And last but not least, Claude 3 Opus demonstrates a level of self-awareness far beyond any model so far released to the general public. We at ExoBrain soon noticed the qualitative difference this brings. It has unique meta-cognitive abilities, and we have spent much of the week exploring how this affects its ability to engage with its user, reflect and collaborate to improve outputs, and explore thought processes that are far more fluid and expansive than our own.

This is a tiny extract of some of the ways this model can explain how it works, and seek to find new ways to think through the challenges it is set…

 

Takeaway: With the launch of Claude 3, the tests and anecdotes of its ability to reason at graduate or post-graduate levels, and its palpable self-awareness, we are approaching a point where, in terms of ‘academic’ intelligence, these models are near human levels. What we’re seeing is more computer power translating readily into more intelligence. Broader intelligence, however, is not a measurable commodity. What’s much more fascinating than test numbers is the idea of a world with an increased diversity of minds, waiting to help, teach and perhaps inspire us in new ways.

Musk claims OpenAI have ‘AGI’ and sues

At the same time as we were sending out our newsletter last week, Elon Musk’s lawyers were sending a package of legal papers to the OpenAI offices in San Francisco. Sam Altman, the CEO of OpenAI, is no stranger to corporate drama, having been ousted from his role for a weekend back in 2023 (an internal investigation is set to report back on that episode in the coming days). At the time he attributed some of the chaos to the level of tension being experienced by his team as they neared their ultimate goal of creating ‘AGI’. The term AGI (artificial general intelligence) is increasingly used not to describe a distant abstract concept, but to define a threshold that some feel is perhaps only months away, and that would signal computer systems as capable as humans across most day-to-day economically valuable tasks.

In Musk’s filings, and in redacted emails counter-shared by OpenAI this week (rapidly unredacted by AI systems such as Claude 3), this debate about the implications of AGI was at the centre of the agreement Musk believes Altman and team committed to in 2015. The tension at that time was a fear that Google, having acquired London-based DeepMind, and with their vast resources, would be unstoppable. Sam and Elon believed that a viable challenger with an ‘open’ not-for-profit model was needed to save humanity from Google domination. Fast forward to today, and OpenAI are one of the most secretive labs around, may be about to release a near-AGI level system, and are directly powering the profits of the world’s most valuable company, Microsoft.

Takeaway: AGI is going to be hard for a court of law to define, and like quantum computing or cold fusion, may always feel near but just out of reach. Unlike quantum computing or cold fusion, ‘AGI-like’ technology is in our pockets and on our desks today, technology that can emulate humans in many different ways. Capitalism and consumerism will drive this race on, no matter which company leads. We should at least be individually scrutinising these firms, and collectively questioning how we balance the risks and rewards of techno-capitalism.

The end of an era of dominance

GPT-4 (the model in ChatGPT Plus, the paid service, and behind many advanced AI tools) has spent the last year or so as the undisputed (LLM) AI champion of the world. But in the space of a few weeks the ring has become much more crowded. ExoBrain have concluded that Claude 3 Opus is the new heavyweight, particularly in a qualitative sense due to its ‘meta-cognitive’ abilities. But there are challengers to OpenAI in every class. We’ve put together a comparison, based on industry data and our own assessments. Below we plot AIs against capability (scores on standard tests and input window size etc.) and cost.

To provide a sense of the range of costs, based on the number of words or images going in and out: the most expensive option, a Claude 3 Opus powered assistant, would set you back 60p to read 10 pages of A4 business content and generate about the same in analysis. The French contender’s Mistral 7B, a fly-weight model in comparison, would cost less than a penny, or indeed 180x less. Clearly such a model’s capabilities are also much more limited: it can do only basic reasoning on a maximum of 16 pages at once, but will do that 3x faster.
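The arithmetic behind a comparison like this can be sketched in a few lines of Python. This is an illustration only, not the calculation ExoBrain used: the per-million-token prices (in US dollars), the 500 words per A4 page, and the model labels are all assumptions, with the prices chosen so that the ratio between the two models lands on the 180x figure quoted above. The words-per-token ratio comes from the 200k token (~150k word) figure mentioned earlier.

```python
# Rough cost sketch. Prices, page size and labels are illustrative assumptions,
# not figures taken from any provider's current price list.

# From the article: 200k tokens is roughly 150k words.
WORDS_PER_TOKEN = 150_000 / 200_000

# Assumed (input, output) prices in US dollars per million tokens.
ASSUMED_PRICES = {
    "large model": (15.00, 75.00),
    "small model": (0.25, 0.25),
}


def tokens_for_pages(pages, words_per_page=500):
    """Estimate the token count for a number of A4 pages (assumed 500 words each)."""
    return round(pages * words_per_page / WORDS_PER_TOKEN)


def cost_usd(tokens_in, tokens_out, price_in_per_m, price_out_per_m):
    """Cost of one request given per-million-token input and output prices."""
    return (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1_000_000


# Read 10 pages and write about the same amount back.
tokens = tokens_for_pages(10)
large = cost_usd(tokens, tokens, *ASSUMED_PRICES["large model"])
small = cost_usd(tokens, tokens, *ASSUMED_PRICES["small model"])
ratio = large / small

print(f"tokens each way: {tokens}")
print(f"large model: ${large:.2f}, small model: ${small:.4f}, ratio: {ratio:.0f}x")
```

Because the same token counts go in and out for both models, the ratio depends only on the assumed prices, so changing the page count or words per page moves the absolute costs but not the multiple.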

We’ve also pencilled in some of the new and upcoming models. Inflection’s Pi got a big upgrade on Thursday and is now GPT-4 class at 40% of the size (and likely cost), and Google’s next Gemini models are nearing wider availability.

We would highlight 3 clusters of AIs:

  • State of the art, expensive, deep and untapped power: Claude 3 Opus and the yet to be released Google Gemini Ultra 1.5
  • Excellent general purpose all-rounders: GPT-4, Claude 3 Sonnet and Pi (although its conversational style is not suited to all)
  • Fast and cost-efficient automation workhorses: Gemini Pro, ChatGPT 3.5 and Mixtral 8x7B

JOEL

Takeaway: Why not take these models, in their respective AI assistant forms, for a spin? They represent billions of dollars of training and hosting compute, and yet in many cases are free to use as helpful chatbots…

  • Claude 3 Pro (Opus) (£18.00/month) The ultimate assistant as of March 2024
  • Claude 3 (Sonnet) (free) Fast text and vision assistance for occasional use
  • Google Gemini (Pro 1.0) (free) Includes a handy feature that lets you check outputs against Google search results
  • Google Gemini Advanced (Ultra 1.0) (2 months free, then £19.99/month) adds Google Docs, YouTube and other Google integrations
  • Mistral Le Chat (free, beta) access models of various sizes if you want a more European vibe
  • Inflection Pi (free) a super, sometimes perhaps overly friendly assistant
  • Poe (free and paid) try many models including Mixtral 8x7B running on Groq hardware for the fastest AI in the world
  • ChatGPT (free) The GPT3.5 based assistant most people have tried, but there are now more powerful alternatives!
  • ChatGPT Plus (£19.05/month) Adds GPT-4 and the richest set of extra features around, such as the ability to analyse data files, and to create custom assistants (GPTs) with their own instructions and knowledge.
  • Microsoft Copilot is another assistant that uses GPT-4. The consumer (free) version is available in Bing and the (paid) Pro version can also access and work with Office files in Excel, PowerPoint and Word (£19.00/month).
  • For more technical people, a great place to explore new and open source models is the Together.ai playground
  • To use AI to search the web try Perplexity. The Pro version (paid, £15.70/month) can use either GPT-4 or Claude 3 Opus to power question and answer based search.

Hang-on why aren’t GPT-4.5 or GPT-5 on this chart?

Everyone is wondering what OpenAI’s counter-punch is going to be. OpenAI usually release on Thursday evening (UK time) and as we were pulling this week’s newsletter together, rumours were spreading that GPT-5 was about to be announced. It was a false alarm, but OpenAI are under pressure to react. The Musk lawsuit may be distracting them, and the effort and vast amount of compute required to deliver a model materially more powerful than GPT-4 will be on a never-before-seen scale. But there is likely a GPT-4.5, an advanced new planning capability, or a sub-agent system inbound, and maybe even GPT-5. It’s likely the capability axis on our chart is going to need extending in the coming weeks.

Final note: despite its significance, we haven’t included Meta’s Llama 2. It’s generally used as the basis of lots of customised models, and with Llama 3 fast approaching we feel it’s not as relevant as the ones selected for direct use in a bot or a system.

EXO

As we dive into another week of AI news, it’s clear that the technology continues to advance at a breakneck pace, permeating nearly every aspect of our lives and businesses. As the latest news stories reveal, China’s significant investments in AI startups, government support for the industry, and concerns over intellectual property theft underscore the country’s crucial role in shaping the global AI landscape.

AI business news

AI governance news

AI research news

AI hardware news
