Week 30 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • Meta’s giant new Llama 3.1 405B open-weight model and its potential to reshape the landscape.
  • Our co-authored white paper on the fund management industry’s AI-driven future.
  • Google’s Olympic-level maths breakthrough and the practical roles AI will play at Paris 2024.

Llamas now graze on the open frontier

After much anticipation, Meta this week unveiled the new Llama 3.1 ‘open-weight’ (freely downloadable) multilingual language model ‘herd’; this includes a distilled small 8B (8 billion parameter) model, a medium 70B variant, plus a new 405B monster.

Based on Meta’s benchmarks, the 405B looks to match the most powerful ‘closed’ competitors such as GPT-4o and Claude 3.5, and to lead the way in certain areas such as maths and scientific use-cases. This release really does mark a pivotal moment in AI development: a model freely available for download now matches the capabilities of any cutting-edge, proprietary, cloud-hosted system. The implications are far-reaching, extending across business and use by friendly and adversarial states alike, and as CEO Mark Zuckerberg puts it, this large model will provide a new option for training or distilling down into smaller specialised solutions.

The 405B is around 75% smaller (by parameter count) than the rumoured 1.8-trillion-parameter GPT-4, yet demonstrates superior performance; this continues the trend of shrinking model sizes alongside improving capabilities, showcasing the advancements in training-data cleansing techniques we’ve seen in the last year. Whilst the compute used to train it likely exceeds that of any other model to date, the size reduction means reduced ongoing usage cost: ~$3 per million tokens, versus $6 and $7.50 for Anthropic’s and OpenAI’s best models respectively.

Downloading Llama 3.1 405B is one thing, but running this behemoth is another matter. The model tips the scales at around 810GB in its raw form, and the rule of thumb for memory needs is 1.2x the size on disk. As Artificial Analysis pointed out, this means you’d need a system with more than the 640GB of high-bandwidth memory that comes with an industry-standard 8x H100 GPU rig. So getting the 405B up and running means two Nvidia machines connected with high-bandwidth interconnects… likely to set you back near £1 million (or $40k/month in the cloud)!
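As a rough back-of-envelope check of that claim (a minimal sketch; the 1.2x overhead factor is just the rule of thumb quoted above, and real deployments vary):

```python
# Back-of-envelope GPU memory estimate for serving Llama 3.1 405B at 16-bit.
PARAMS = 405e9          # parameter count
BYTES_PER_PARAM = 2     # 16-bit floating point weights
OVERHEAD = 1.2          # rule-of-thumb multiplier over raw weight size

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
needed_gb = weights_gb * OVERHEAD
print(f"raw weights: {weights_gb:.0f}GB, serving estimate: {needed_gb:.0f}GB")
print(f"fits on one 8x H100 node (640GB)? {needed_gb <= 640}")
# raw weights: 810GB, serving estimate: 972GB -> two nodes needed at 16-bit
```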

Enter quantization, the less-talked-about number that is increasingly important in AI deployment. Think of it as the MP3 bitrate of the AI world – a clever compression technique that allows us to squeeze more model into less memory. Just as MP3s made it possible to store thousands of songs on early music players, quantization enables us to run large AI models on more modest hardware. By reducing the ‘precision’ of the model’s parameters (its massive matrices of numbers) from 16-bit floating point to 8-bit or even lower, it’s possible to slash memory requirements. Compress the 405B to 8-bit and it will run happily on a typical 8-GPU server. Thomas Scialom, the training lead on Llama 3.1, indicated on the Latent Space podcast this week that the expectation is that the model will be run in its 8-bit form, and that there is real potential for this kind of compression to make AI more powerful at any given compute budget, including during training.
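The same arithmetic shows why 8-bit is the sweet spot for this model (again a sketch, reusing the 1.2x rule of thumb from above):

```python
# Memory estimate for Llama 3.1 405B at different quantization levels.
PARAMS = 405e9
OVERHEAD = 1.2
NODE_HBM_GB = 640  # 8x H100 GPUs with 80GB of HBM each

for bits in (16, 8, 4):
    needed_gb = PARAMS * (bits / 8) * OVERHEAD / 1e9
    verdict = "fits" if needed_gb <= NODE_HBM_GB else "does not fit"
    print(f"{bits}-bit: ~{needed_gb:.0f}GB, {verdict} on one 8x H100 node")
# 16-bit: ~972GB (two nodes), 8-bit: ~486GB (one node), 4-bit: ~243GB
```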

Quantization comes in several flavours, each offering a different balance of compression and accuracy, and it can be applied both during and after training. The most straightforward method simply uses fewer bits to represent each number in the model, much like reducing the colour depth of an image. More sophisticated techniques include grouping similar values together or using variable levels of precision for different parts of the model. The trade-offs are complicated. For many natural language processing tasks, the impact on performance is negligible: a big model with high compression can run in much less memory and still beat a small model with no compression. However, for specialised applications like multilingual processing, coding, or complex mathematical reasoning, the reduced precision can lead to very noticeable degradation in output quality. It’s a balancing act between model size, computational efficiency, and task-specific performance. 8-bit quantization in most cases sees very little reduction in performance, especially with larger models, which are more resilient to compression. 6-bit quantization is a good point of balance, but 4-bit and 2-bit generally see major negative impacts, including, in some cases, slower throughput. Research we highlighted a few months ago proposes so-called “1-bit” quantization, and Meta’s Scialom believes there is merit in the idea.
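To make the simplest flavour concrete, here is a minimal sketch of symmetric ‘absmax’ round-to-nearest quantization with per-group scales (the function names and the group size of 64 are illustrative choices, not Meta’s recipe):

```python
import numpy as np

def quantize_int8(weights: np.ndarray, group_size: int = 64):
    """Map each group of float weights onto the int8 range [-127, 127],
    storing one float scale per group alongside the integer codes."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per group
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from codes and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Toy example: 1024 float weights become 1024 signed bytes plus 16 scales.
w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()
print(f"max reconstruction error: {err:.4f}")
```

Each group of 64 weights shares one scale, roughly halving memory versus 16-bit; the grouping limits how far a single outlier value can degrade the precision of its neighbours.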

The availability of state-of-the-art open-weight models democratises access to cutting-edge AI capabilities, to a degree. However, 400+B parameter models are not going to have a direct impact on the burgeoning world of running ‘small language models’ on your laptop just yet. Looking ahead, this is not going to be like the iPod era, where the capacity of MP3 players grew fast while costs dropped. The expensive high-bandwidth memory used by these servers has fallen in cost recently but is expected to rise in the coming year (3-8% quarter-on-quarter) given demand, high production complexity, and the introduction of the more expensive GDDR7. As memory capacity struggles to keep pace with model size, innovations in quantization and other compression techniques will become increasingly high-profile. Much effort will continue to shift both to getting more from fewer bits and to fundamentally new model architectures that use memory more efficiently. The path to artificial general intelligence (AGI) is about both growing and compressing our newfound artificial brains.

Takeaways: Llama 3.1 405B represents a significant milestone in open-weight AI, but its sheer size limits its directly disruptive potential. Nevertheless, it levels the playing field and provides a powerful new tool for creating smaller models. Meta’s openness is also impressive; their technical report contains a huge amount of detail. With many service providers such as Groq, Together AI (try it on their playground here), Amazon, and Google already offering access, it puts pressure on the likes of Anthropic and OpenAI to reduce their prices and offer more flexibility. Meta’s strategy is twofold: squeezing competitors, while warming up their 600,000+ GPUs to leverage their vast social training data (outside the EU at least). But while this release is a significant stepping stone towards more powerful AI, it also highlights some current barriers. Despite the ingenuity of model quantization, we’re still facing a memory cost wall that restricts the widespread deployment of large models. The race to overcome these barriers will accelerate, whilst Meta starts training Llama 4. Next move, OpenAI?

It’s do or die for asset management

As AI rapidly advances, its impact varies across industries. Despite ongoing technology adoption, the asset (fund) management sector is encountering a revenue and productivity plateau.

We’ve co-authored a white paper with our partners at fVenn that examines how AI can revitalise asset management, exploring the industry’s unique position to harness the technology’s potential. We present a comprehensive transformation roadmap and argue that swift action is essential for firms to maintain their competitive edge in an evolving financial landscape.

Download the white paper here and read the companion article on CityWire.

JOOST

AI goes for gold (and gets silver)

In last week’s newsletter, we explored the rapid advancements in AI’s mathematical capabilities. This week, that progress has been dramatically underscored as a Google DeepMind system achieved silver-medal-level performance at the International Mathematical Olympiad (IMO). By solving four out of six problems, including the competition’s hardest question, the system demonstrated its ability to tackle challenges at a level comparable to the world’s brightest mathematicians. This achievement is particularly significant in terms of the mix of techniques used, combining LLMs, reinforcement learning, and formal mathematical proofs in a novel way. This synergy of different approaches demonstrates how complex problems can be tackled by leveraging inventive architectures. It’s also interesting to see Gemini playing a key role in the solution, translating natural language problem statements into formal mathematical statements to help other AI components train themselves. More details will no doubt emerge, but this looks like another notable breakthrough by the DeepMind team.
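To illustrate what a ‘formal mathematical statement’ looks like, here is a toy example in the Lean proof assistant, which is widely used for this kind of formalisation (the statement below is ours, purely illustrative, and far simpler than an IMO problem):

```lean
-- Natural language: "adding zero to any natural number leaves it unchanged."
-- Formalised as a machine-checkable theorem, together with its proof:
theorem add_zero_example (n : Nat) : n + 0 = n := rfl
```

The point of the translation step is that once a problem is in this form, a proof checker can verify every candidate solution mechanically, giving a reliable training signal.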

Meanwhile, AI’s influence isn’t confined to the realm of the abstract. As the sporting Olympiad gets underway in Paris today, AI is set to transform how we experience, participate in, and safeguard major sporting events. The International Olympic Committee (IOC) has identified over 180 potential use cases for AI in these games. Google is partnering with NBC and Team USA and will leverage Gemini to enhance the viewer experience. Commentators will use AI-powered tools to provide quick insights into sports, athletes, and rules, making the games more accessible and engaging for audiences worldwide. NBC’s “chief superfan commentator,” Leslie Jones, will use Gemini to explore and explain new sports in real-time, adding a fresh dimension to Olympic coverage.

AI’s role in enhancing viewer experiences and providing personalised content could revolutionise how media companies engage with their audiences. NBCUniversal’s “Your Daily Olympic Recap,” which offers AI-generated summaries of favourite events narrated by an AI version of legendary sports presenter Al Michaels, points to a future where content is tailored to individual preferences at scale. AI is also pivotal in bolstering security around Paris. Systems have been deployed to monitor crowds and detect suspicious activities. This sophisticated surveillance system is designed to pre-emptively identify potential threats, providing an additional layer of security beyond traditional methods. Sports analysis is also in scope for enhancement: technologies like Intel’s 3D athlete tracking (3DAT) analyse biomechanics, potentially levelling the playing field by helping coaches identify talent and optimise training regimens.

Takeaways: AI’s Olympic prowess is diverse. Progress in maths competitions presages more powerful reasoning and scientific uses. Use in sporting events shows versatility but critics will rightly worry about privacy infringement and the normalisation of surveillance. This tension between security and civil liberties exemplifies the broader challenges of AI adoption. The Paris Olympics will no doubt keep us on the edge of our seats for the next two weeks, but it will also serve as a significant test case for the integration of AI across various aspects of a large global event.

EXO

Weekly news roundup

This week’s news highlights the ongoing competition in AI model development, concerns over AI’s impact on various industries, continued investment in AI startups, and the race for AI hardware supremacy.

AI business news

AI governance news

AI research news

AI hardware news
