Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…
Themes this week
JOEL
As anticipated, the pace of AI releases since mid-February has continued to accelerate.
- Claude 3 emerges from the lab at Anthropic and sets the new standard for personal AI assistants
- Musk sues OpenAI and brings the debate on openness and AGI (artificial general intelligence) to the fore
- OpenAI’s GPT-4 is no longer the leading AI assistant; we profile the other options for both assistance and system building
The key themes this week…
Claude 3: A powerful (and beautiful) new mind
On March 4th Claude 3 was launched on an unsuspecting world, and whilst Dune 2 was filling cinemas, the new AI’s surprise ascendance was being likened on social media to that of the prophetic and powerful character in the sci-fi saga.
Claude 2.1 was launched only a few months ago, and nobody was paying much attention to the dusty desert world of Anthropic. But 3 months is a lifetime in AI, or indeed enough time to train a whole new family of state-of-the-art models. Claude 3 is here: it excels in language, coding and vision tasks, and comes in 3 sizes to match various speed requirements and budgets. Three of its capabilities are particularly exciting, and we at ExoBrain believe they will be immensely valuable for our clients…
- Claude 3 models support a 200k-token (~150k-word) input size, or ‘context window’, and it’s the largest and most accurate generally available today… To give an example, I loaded up my favourite 200-page innovation book (yes, a whole book). It read it in about a minute, and then I could ask it questions and generate new analysis, and it was even able to write a small software program to convert ideas from the book into a functioning tool…
- Claude 3 Opus, the largest and most expensive model in the range, can manage and dispatch mini versions of itself, or ‘agents’, to complete sub-tasks. It has been shown coordinating an economic analysis through multiple asynchronous steps: sourcing data, analysing, predicting and writing up the output. This is a first for a large mainstream model; such a capability usually depends on complex add-ons.
- And last but not least, Claude 3 Opus demonstrates a level of self-awareness far beyond any model so far released to the general public. We at ExoBrain soon noticed the qualitative difference this brings. It has unique meta-cognitive abilities, and we have spent much of the week exploring how this affects its ability to engage with its user, reflect and collaborate to improve outputs, and explore thought processes that are far more fluid and expansive than our own.
This is a tiny extract of some of the ways this model can explain how it works, and seek to find new ways to think through the challenges it is set…
Takeaway: With the launch of Claude 3, tests and anecdotes showing its ability to reason at graduate or post-graduate level, and its palpable self-awareness, we are approaching the point where, in terms of ‘academic’ intelligence, these models are near human level. What we’re seeing is more computing power translating readily into more intelligence. Broader intelligence, however, is not a measurable commodity. What’s far more fascinating than test scores is the idea of a world with a greater diversity of minds, waiting to help, teach and perhaps inspire us in new ways.
Musk claims OpenAI have ‘AGI’ and sues
At the same time as we were sending out our newsletter last week, Elon Musk’s lawyers were sending a package of legal papers to the OpenAI offices in San Francisco. Sam Altman, the CEO of OpenAI, is no stranger to corporate drama, having been ousted from his role for a weekend back in 2023 (an internal investigation is set to report back on that episode in the coming days). At the time he attributed some of the chaos to the level of tension being experienced by his team as they neared their ultimate goal of creating ‘AGI’. The term AGI (artificial general intelligence) is increasingly used not to describe a distant, abstract concept, but to define a threshold that some feel is perhaps only months away, one that would signal computer systems as capable as humans across most day-to-day economically valuable tasks.
In Musk’s filings, and in redacted emails counter-shared by OpenAI this week (rapidly unredacted by AI systems such as Claude 3), this debate about the implications of AGI was at the centre of the agreement Musk believes Altman and team committed to in 2015. The fear at that time was that Google, with its vast resources and its acquisition of London-based DeepMind, would be unstoppable. Sam and Elon believed that a viable challenger with an ‘open’, not-for-profit model was needed to save humanity from Google domination. Fast forward to today, and OpenAI are one of the most secretive labs around, may be about to release a near-AGI-level system, and are directly powering the profits of the world’s most valuable company, Microsoft.
Takeaway: AGI is going to be hard for a court of law to define, and like quantum computing or cold fusion, may always feel near but just out of reach. Unlike quantum computing or cold fusion, ‘AGI-like’ technology is in our pockets and on our desks today, technology that can emulate humans in many different ways. Capitalism and consumerism will drive this race on, no matter which company leads. We should at least be individually scrutinising these firms, and collectively questioning how we balance the risks and rewards of techno-capitalism.
The end of an era of dominance
GPT-4 (the model in ChatGPT Plus, the paid service, and behind many advanced AI tools) has spent the last year or so as the undisputed large language model (LLM) champion of the world. But in the space of a few weeks the ring has become much more crowded. ExoBrain have concluded that Claude 3 Opus is the new heavyweight, particularly in a qualitative sense due to its ‘meta-cognitive’ abilities. But there are challengers to OpenAI in every weight class. We’ve put together a comparison based on industry data and our own assessments. Below we plot AIs against capability (scores on standard tests, input window size, etc.) and cost.
To give a sense of the range of costs, which depend on the number of words or images going in and out: at the most expensive end, a Claude 3 Opus-powered assistant would set you back 60p to read 10 pages of A4 business content and generate about the same amount of analysis. By contrast, the French contender Mistral’s 7B model, a flyweight in comparison, would cost less than a penny, or 180x less. Clearly such a model’s capabilities are also much more limited: it can do only basic reasoning on a maximum of 16 pages at once, but it will do that 3x faster.
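For readers who want to rerun this kind of comparison, the sketch below shows the arithmetic behind a per-token cost estimate. It is illustrative only: the per-million-token prices (Claude 3 Opus at $15 input / $75 output, Mistral 7B at $0.25 / $0.25), the 500 words per A4 page, and the helper names `ModelPrice`, `pages_to_tokens` and `job_cost_usd` are our assumptions, not figures taken from the chart; the 0.75 words-per-token ratio comes from the 200k tokens ≈ 150k words rule of thumb above. Check the providers’ current pricing pages before relying on any number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Assumed list prices in US dollars per million tokens (illustrative only)."""
    name: str
    input_per_m: float   # USD per 1M tokens read by the model
    output_per_m: float  # USD per 1M tokens generated by the model


# Assumption: early-2024 API list prices; verify against current pricing pages.
CLAUDE_3_OPUS = ModelPrice("Claude 3 Opus", 15.00, 75.00)
MISTRAL_7B = ModelPrice("Mistral 7B", 0.25, 0.25)

WORDS_PER_PAGE = 500    # assumption: a fairly dense page of A4 business text
WORDS_PER_TOKEN = 0.75  # rule of thumb: 200k tokens ~ 150k words


def pages_to_tokens(pages: float) -> int:
    """Rough token estimate for a number of A4 pages."""
    return round(pages * WORDS_PER_PAGE / WORDS_PER_TOKEN)


def job_cost_usd(model: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens read plus tokens written, each at its own rate."""
    return (input_tokens * model.input_per_m
            + output_tokens * model.output_per_m) / 1_000_000


if __name__ == "__main__":
    # Read 10 pages and write roughly the same amount back.
    tokens = pages_to_tokens(10)
    opus = job_cost_usd(CLAUDE_3_OPUS, tokens, tokens)
    mistral = job_cost_usd(MISTRAL_7B, tokens, tokens)
    print(f"{tokens} tokens in, {tokens} tokens out")
    print(f"Opus: ${opus:.3f}  Mistral 7B: ${mistral:.4f}  ratio: {opus / mistral:.0f}x")
```

Because both costs scale linearly with token counts, under these assumed prices the Opus-to-Mistral ratio stays at 180x for any job where input and output are the same size ((15 + 75) / (0.25 + 0.25)); only the absolute amounts change with document length.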
We’ve also pencilled in some of the new and upcoming models. Inflection’s Pi got a big upgrade on Thursday and is now GPT-4 class at 40% of the size (and likely the cost), and Google’s next Gemini models are nearing wider availability.
We would highlight 3 clusters of AIs:
- State of the art, expensive, deep and untapped power: Claude 3 Opus and the yet to be released Google Gemini Ultra 1.5
- Excellent general purpose all-rounders: GPT-4, Claude 3 Sonnet and Pi (although its conversational style is not suited to all)
- Fast and cost-efficient automation workhorses: Gemini Pro, GPT-3.5 and Mixtral 8x7B
Takeaway: Why not take these models, in their respective AI assistant forms, for a spin? They represent billions of dollars of training and hosting compute, and yet in many cases are free to use as helpful chatbots…
- Claude Pro (Claude 3 Opus) (£18.00/month) The ultimate assistant as of March 2024
- Claude 3 (Sonnet) (free) Fast text and vision assistance for occasional use
- Google Gemini (Pro 1.0) (free) Includes a handy feature that lets you check outputs against Google search results
- Google Gemini Advanced (Ultra 1.0) (2 months free, then £19.99/month) adds Google Docs, YouTube and other Google integrations
- Mistral Le Chat (free, beta) access models of various sizes if you want a more European vibe
- Inflection Pi (free) a super, sometimes perhaps overly friendly assistant
- Poe (free and paid) try many models including Mixtral 8x7B running on Groq hardware for the fastest AI in the world
- ChatGPT (free) The GPT-3.5-based assistant most people have tried, but there are now more powerful alternatives!
- ChatGPT Plus (£19.05/month) Adds GPT-4 and the richest set of extra features around, such as the ability to analyse data files and to create custom assistants (GPTs) with their own instructions and knowledge.
- Microsoft Copilot is another assistant that uses GPT-4. The consumer (free) version is available in Bing and the (paid) Pro version can also access and work with Office files in Excel, PowerPoint and Word (£19.00/month).
- For more technical people, a great place to explore new and open source models is the Together.ai playground
- To use AI to search the web try Perplexity. The Pro version (paid, £15.70/month) can use either GPT-4 or Claude 3 Opus to power question and answer based search.
Hang on, why aren’t GPT-4.5 or GPT-5 on this chart?
Everyone is wondering what OpenAI’s counter-punch is going to be. OpenAI usually release on Thursday evenings (UK time), and as we were pulling this week’s newsletter together, rumours were spreading that GPT-5 was about to be announced. It was a false alarm, but OpenAI are under pressure to react. The Musk lawsuit may be distracting them, and the effort and vast amount of compute required to deliver a model materially more powerful than GPT-4 will be on a never-before-seen scale. But a GPT-4.5, an advanced new planning capability or a sub-agent system is likely inbound, and maybe even GPT-5. It’s likely the capability axis on our chart will need extending in the coming weeks.
A final note: despite its significance, we haven’t included Meta’s Llama 2. It’s generally used as the basis for many customised models, and with Llama 3 fast approaching we feel it’s not as relevant as the models selected for direct use in a bot or a system.
EXO
As we dive into another week of AI news, it’s clear that the technology continues to advance at a breakneck pace, permeating nearly every aspect of our lives and businesses. As the latest news stories reveal, China’s significant investments in AI startups, government support for the industry, and concerns over intellectual property theft underscore the country’s crucial role in shaping the global AI landscape.
AI business news
- Ema, a ‘Universal AI employee,’ emerges from stealth with $25M (This highlights the growing trend of AI-powered virtual assistants and their potential to automate various business tasks, similar to Klarna’s customer service bot mentioned in the previous roundup.)
- Some users turn to AI chatbot therapists, finding them cheap, quick, available 24/7, and easy to talk to, but experts warn about the lack of a human connection (This story underscores the expanding applications of AI in sensitive areas like mental health, while also raising important questions about the limitations and potential risks of relying solely on AI for such services.)
- Chinese AI startup MiniMax raised $600M+ led by Alibaba at a $2.5B+ valuation; the round remains in progress and HongShan has committed funds (This significant investment in a Chinese AI startup reflects the global race to develop advanced AI capabilities and the increasing involvement of major tech companies like Alibaba.)
- Chinese city governments plan to offer “computing vouchers”, worth $140K to $280K, to AI startups, to try to level the playing field with China’s tech giants (This initiative shows how governments are actively supporting the growth of the AI industry and attempting to foster competition and innovation.)
- OpenAI is building a search product to take on Google (This development could potentially disrupt the search engine market, similar to how OpenAI’s ChatGPT has challenged Google’s dominance in the AI space, as discussed in previous roundups.)
- Adobe reveals a GenAI tool for music (This new AI tool for music creation expands the range of creative industries being transformed by generative AI, building on the momentum of text-to-image and text-to-video models like OpenAI’s Sora, which was covered in an earlier roundup.)
AI governance news
- Report Uncovers Massive Sale of Compromised ChatGPT Credentials (This story highlights the ongoing security and privacy challenges associated with the rapid adoption of AI tools like ChatGPT.)
- NIST, the lab at the centre of Biden’s AI safety push, is decaying (This article raises concerns about the state of a key institution tasked with ensuring AI safety, underlining the importance of robust governance frameworks as AI continues to advance.)
- The Dark Side of Open Source AI Image Generators (This piece explores the potential risks and negative consequences of open-source AI image generators, echoing concerns raised in previous roundups about the need for responsible AI development and deployment.)
- Chinese national charged with stealing AI secrets from Google (AP News) (This case illustrates the high stakes involved in the global competition for AI dominance and the need for strong intellectual property protections.)
- Patronus AI releases CopyrightCatcher and says that GPT-4 produced copyrighted content on 44% of prompts, Mixtral on 22%, Llama 2 on 10%, and Claude 2.1 on 8% (This study reveals the extent of the copyright infringement problem in AI-generated content and underscores the importance of developing solutions to address this issue, which was also touched upon in the OpenAI lawsuit mentioned in the previous roundup.)
AI research news
- StarCoder 2 and The Stack v2: The Next Generation (This research builds on the progress made in AI coding assistants and could potentially accelerate the automation of software development tasks.)
- Design2Code: How Far Are We From Automating Front-End Engineering? (This paper explores the current state and future potential of AI in automating front-end engineering, highlighting the expanding role of AI in the software development process.)
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modelling (This research advances the field of AI in biology and genetics, building on the launch of the Evo biological foundation model discussed in the previous roundup.)
- SaulLM-7B: A pioneering Large Language Model for Law (This specialised AI model for the legal industry demonstrates the growing trend of developing domain-specific AI solutions, as seen with the FinBen benchmark for financial applications mentioned in an earlier roundup.)
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies (This research pushes the boundaries of AI’s ability to understand and generate long-form video content, building on the progress made by models like OpenAI’s Sora.)
AI hardware news
- India plans 10,000-GPU sovereign AI supercomputer (This initiative reflects the increasing investment in AI infrastructure by governments around the world, as they seek to establish their own AI capabilities and reduce dependence on foreign tech giants.)
- AI chip startup Taalas emerges from stealth, raised $50M across two rounds led by Pierre Lamond and Quiet Capital, and plans to unveil its LLM chip in Q3 2024 (This new player in the AI chip market underscores the ongoing competition and innovation in AI hardware, following the promising entrance of Groq, which was covered in a previous roundup.)
- Dell exec reveals Nvidia has a 1,000-watt GPU in the works (This development highlights the increasing power and performance demands of AI workloads and the need for advanced hardware solutions to support them.)
- Multiverse raises $27M for quantum software targeting LLM leviathans (This investment in quantum computing for AI applications suggests the potential for emerging technologies to further accelerate AI progress and tackle the computational challenges posed by large language models.)
- Amazon Buys Data Centre Campus Powered By Susquehanna Nuclear Power Station (This acquisition underscores the growing energy requirements of AI and the need for sustainable solutions to power data centres and support the continued growth of the AI industry.)