It was a super busy week in AI models; this is likely to be a long-running theme as big tech dollars continue to pour into foundational development.
As a reminder, where the name includes “7b” or “32b” etc., this is the number of parameters, or virtual ‘neurones’ in the model’s brain, in billions. For reference, GPT-4 is rumoured to be 16x111b, which means it’s got a total of about 1.78 trillion parameters. As we’ll find out this week, many of the latest models achieve near its levels of capability with a lot less:
- Canadian outfit Cohere’s Command R+ 104b model jumped up the leaderboards this week. It’s fast, capable and flexible, and Cohere have a winner on their hands, especially for the enterprise market. (At least for this week.) It’s also possible to run this on a powerful Mac or PC with 64GB+ of RAM. Cohere have a brilliant AI toolkit you can try out here.
- Google’s Gemini Pro 1.5, with its 1 million token input window, became generally available in API form to start powering new products. Its audio ingestion is particularly impressive, but filling that 1 million token window will not be cheap.
- OpenAI made sure to steal some thunder from Google by releasing their news just hours later. They finally shipped something of real note, although their release management was somewhat chaotic (there isn’t even a blog post to link to, no release notes, and nothing from Sam Altman on what is seemingly a key launch for the under-pressure lab). GPT-4 Turbo has been enhanced, and the rumour mill theorises that it has been fine-tuned on the output of an unreleased and more powerful sibling (GPT-5, perhaps?). Initial evidence suggests it’s better at coding and maths. The new version is now available in ChatGPT.
- Almost simultaneously, Mistral uploaded a giant new open-weight model to the Internet, making a torrent available for anyone to grab. It’s a ‘Mixtral’ 8x22b, a super-efficient mixture of experts (an intelligent bundle of smaller brains that work together, with the “8x” indicating that 8 experts are combined) that will give many bigger models a run for their money while remaining highly cost effective.
- JetMoE 8b is another open mixture of experts model that caught the headlines this week due to details revealed about its training budget. It reportedly burnt up a mere $100k. That’s significant because it’s a Llama 2 7b class model, which a year ago cost Meta multiple millions to train.
- Meanwhile, we saw Alibaba release Qwen 1.5 32b to fill the gap between small and larger open models. Google added further diversity to the open-weight arena by releasing some Gemma variants, and embattled Stability AI launched a bigger Stable LM 2 language model. Finally, Universal-1 is a new best-in-class speech-to-text model from AssemblyAI.
- Next week won’t be any quieter, with the much-anticipated arrival of some small Meta Llama 3 models that will provide us with the first indications of the next wave of capability.
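For anyone who wants to check the size labels used above, here is a minimal Python sketch of the arithmetic. The helper name `naive_params_billions` is our own invention for illustration, and the multiplication is deliberately naive: real mixture of experts models share some layers between experts, so their true totals are usually lower than the simple product suggests.

```python
import re


def naive_params_billions(label: str) -> float:
    """Read a size label such as '7b', '104b' or '8x22b' as billions of parameters.

    Assumption: an 'NxMb' label is treated as N experts of M billion each,
    which overstates the total when experts share layers.
    """
    match = re.fullmatch(r"(?:(\d+)x)?(\d+(?:\.\d+)?)b", label.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size label: {label!r}")
    experts = int(match.group(1)) if match.group(1) else 1
    return experts * float(match.group(2))


# Labels taken from this week's roundup, plus the rumoured GPT-4 figure.
for label in ["7b", "32b", "104b", "8x22b", "16x111b"]:
    print(f"{label:>8}: ~{naive_params_billions(label):,.0f} billion parameters")
```

The rumoured GPT-4 label works out to 1,776 billion, which is where the figure of just under 1.8 trillion comes from.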
Takeaways: There are two notable signals in this barrage of release noise: the ability to run a high-end model on a laptop with Command R+, and the JetMoE cost of just $100k for a powerful AI training run. All the way back in 2023 you needed tens of millions and a vast data centre to create any kind of serious AI. Just one year on, some MacBook Pros and a departmental budget will be more than enough.
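To see why a 104b model can plausibly squeeze onto a 64GB machine, here is a rough back-of-the-envelope sketch in Python. It is an estimate under stated assumptions, not a measurement: it counts the model weights only (no working memory or runtime overhead), and the bits-per-parameter values are common quantisation levels that we have assumed rather than anything stated in the release. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in decimal gigabytes.

    One billion parameters at 8 bits each is one gigabyte, so the result is
    simply params_billions * bits_per_param / 8.
    """
    return params_billions * bits_per_param / 8


# Command R+ is a 104b model; 16, 8 and 4 bits are assumed precision levels.
for bits in (16, 8, 4):
    print(f"104b at {bits:>2}-bit: ~{weight_memory_gb(104, bits):.0f} GB of weights")
```

On these assumptions only the 4-bit version, at roughly 52 GB, leaves headroom on a 64GB machine, so the laptop claim most likely depends on running a compressed copy of the model.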
Check out our research section below for some of the ideas that will be powering the next wave of models, with infinite inputs, new modes of learning, and new architectures.
