On Thursday OpenAI unveiled GPT-4o ‘Mini’, a scaled-down version of their most powerful model that, at around $0.24 per million blended tokens, is 60% cheaper than the old GPT-3.5. Once again, a lab is prioritising efficiency over raw intelligence and scale to help maximise monetisation. Its performance and cost look set to significantly undercut Google’s Gemini Flash and Anthropic’s Claude 3 Haiku, the previously leading low-cost options. Groq, the chip and inference provider, also released new models including Llama-3-Groq-8B-Tool-Use, which will cost just $0.19 per million tokens and can run at over 1,000 tokens per second. All in, we’ve seen a staggering 100x reduction in the cost of using AI models in only two years.
These developments highlight the ongoing race in the AI industry to balance performance and efficiency. As AI capabilities evolve, so does the computational power required to run these models, leading to significant environmental and economic concerns. The push for more streamlined AI is driven by growing competition among providers and rising interest in smaller, specialised models that can perform specific tasks with high efficiency.
For ChatGPT, this likely means the end of the road for GPT-3.5, the model that started it all when it was released back in 2022. For end-users, these advancements should translate into smarter and more responsive experiences across the board, from chatbots to AI-enabled apps. Longer term, how low can costs go? In a post on X promoting the launch, Sam Altman, CEO of OpenAI, intimated that their goal was “intelligence too cheap to meter”.
Takeaways: As AI models become both more powerful and more efficient, businesses should constantly reassess their AI strategy. Consider the balance between specialised and general-purpose models in your stack, and understand how new price points might unlock previously non-viable use cases. You can test out GPT-4o Mini on ChatGPT Premium now.
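To get a feel for what the new price points mean in practice, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted above ($0.24 per million blended tokens and "60% cheaper" than GPT-3.5); the 500-million-token monthly workload is a made-up number for illustration, and the function names are our own rather than anything from a provider's API.

```python
# Figures quoted in this article (USD per million blended tokens).
GPT_4O_MINI_PRICE = 0.24
DISCOUNT_VS_GPT_35 = 0.60  # "60% cheaper" than GPT-3.5


def implied_baseline_price(new_price: float, discount: float) -> float:
    """Price of the older model implied by the new price and the discount."""
    return new_price / (1 - discount)


def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Monthly spend in USD for a given token volume and price."""
    return tokens_per_month / 1_000_000 * price_per_million


# Hypothetical workload: an assumption for illustration only.
TOKENS_PER_MONTH = 500_000_000

old_price = implied_baseline_price(GPT_4O_MINI_PRICE, DISCOUNT_VS_GPT_35)
old_bill = monthly_cost(TOKENS_PER_MONTH, old_price)
new_bill = monthly_cost(TOKENS_PER_MONTH, GPT_4O_MINI_PRICE)

print(f"Implied GPT-3.5 price: ${old_price:.2f} per million tokens")
print(f"Monthly bill before:   ${old_bill:.2f}")
print(f"Monthly bill after:    ${new_bill:.2f}")
```

Swap in your own token volumes to see whether a use case that was previously too expensive now clears the bar. Real bills depend on the split between input and output tokens, which a single blended figure glosses over.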
