Week 50 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

This week we look at:

  • Google’s Gemini 2.0, multimodal capabilities, and world models.
  • The release of Devin, an autonomous AI developer reshaping workflows for software engineering.
  • AI’s growing and sometimes controversial role in healthcare.

Gemini through the looking glass

Google’s Gemini 2.0 debuted this week, almost exactly one year on from the first Gemini release. The initial ‘2.0 Flash’ model, an efficient workhorse version, introduces true real-time video ingestion, native multimodal capabilities, and performance that punches well above its weight, which suggests there is a lot to come from its more powerful siblings. The Gemini 2.0 comms barrage also included a raft of new products, projects and ideas, many of them built around seamless switching between text, visuals and audio, and agents for world understanding (Astra), coding (Jules), and browser tasks (Mariner). Not to be outdone, OpenAI simultaneously announced screen-sharing and live video features for ChatGPT’s Advanced Voice Mode, enhancing its ability to assist users in real-world activities.

At the core of Gemini 2.0 is its ability to understand and generate text, images, audio, and video interchangeably. Google believes ‘multimodality’ is one of the keys to creating AI systems that can work truly autonomously. The Multimodal Live API, made available this week in Google AI Studio, allows developers to build systems that interpret visual data in real time, engage in dialogue, and execute tasks. A fluid voice chat in which the AI can look at your screen, or see what you are seeing, and offer insights is impressive. OpenAI have likely been holding back the roll-out of their live frame-by-frame video (demonstrated back in May) for just this moment. They duly released it as the social media buzz around Gemini 2.0 was building, with people posting fun examples from completing maths homework to suggesting cocktails after observing a shelf of bottles. OpenAI’s audio and video experiences at this stage feel more polished, and whilst it’s taken many months, the rich interactions promised earlier in the year are now here.
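
For those who want to experiment, below is a minimal sketch of a single text turn against the Multimodal Live API using the google-genai Python SDK. The model name, config keys, and method calls (client.aio.live.connect, session.send, session.receive) reflect the preview documentation at launch and should be treated as assumptions that may change; in a real application the input would be a stream of audio or video frames rather than text.

```python
# Minimal sketch: one text turn over the Multimodal Live API via the
# google-genai Python SDK preview. Model name, config keys, and method
# names follow the launch-week docs and may have changed since.
import asyncio

from google import genai

client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})


async def main() -> None:
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        # A real client would stream microphone audio or screen/video frames;
        # a single text prompt keeps the sketch self-contained.
        await session.send(input="Describe what you can see on my screen.", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```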

But where Gemini 2.0 differs is the move from text to fully interchangeable modalities, and what Demis Hassabis calls ‘world models’. Google is moving beyond internet text and structured, pre-labelled visual data towards something more akin to human learning. Oriol Vinyals, co-tech lead of the Gemini project, suggests that while current models excel at connecting concepts to their textual descriptions, they haven’t yet cracked developing true world understanding. Unlike a child who can watch objects fall and gradually build an intuitive understanding of gravity, many current AIs still rely heavily on human-provided text descriptions to make sense of visual information. But Google is determined to overcome that limitation and exploit the untapped potential of video understanding as a vast repository of knowledge about physics, causality, and natural laws – knowledge that exists independently of intrinsically limited human annotation. While models can tell you what’s happening in a video, they can’t yet extract fundamental principles from pure observation. It’s the difference between describing a falling apple and deriving Newton’s laws of motion. Imagine a training regime that could extract fine-grained insights from high-definition video; this is a world away from a brief textual explanation of a static image.

The introduction of OpenAI’s Sora may appear tangential to Gemini 2.0’s multimodal capabilities, but it actually sits at the heart of the same fundamental challenge – developing world models. While Sora’s immediate application is video generation, OpenAI is pursuing a similar goal to Google: teaching AI to understand physical laws and causality. What makes Sora particularly relevant is how it appears to have developed an intuitive grasp of physics, motion, and object persistence – not just through labelled data, but through learning to predict how scenes naturally unfold.

Takeaways: Google’s Gemini 2.0 release represents a huge bet on true multimodal AI rather than just connecting different modes through text, and OpenAI with Sora is making the same bet, albeit via a slightly different track. Both firms believe that to create robust, autonomous agents, those agents must have a better understanding of the world. Humans can resolve issues with the tools and technology around us because we have spent years problem-solving in the physical environment, and agents will need to do the same. LLMs can mimic human communication, and new reasoning models are starting to solve complex scientific problems; will world models truly and independently understand the world? 2025 may be the year we find out.

Devin joins the team

Devin, the autonomous AI developer from Cognition Labs that we covered back in March, is now finally available, and this week engineering teams all around the world, including ExoBrain’s, have been onboarding the shape of things to come. So far, we’ve been impressed; Devin is a capable and thorough team member, and most of all, is very willing to learn. With a clever memory feature, plus configurable playbooks, we can see Devin becoming ever more familiar with our projects and able to handle increasingly complex tasks. We’ve been integrating Devin into our workflow via Cursor’s IDE extension, which keeps us updated on its activity, and Slack, where it can manage multiple conversations at once. Despite its strengths, Devin still benefits from very clear guidance and well-scoped tasks to ensure it stays on track. Its API opens up even more potential, such as programmatically launching instances to execute security patches or automate specific workflows.
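
As an illustration of what that might look like, here is a hedged Python sketch of launching a Devin session from a script. The base URL, the /sessions endpoint, the payload fields, the response key, and the repository and advisory identifiers are all assumptions for illustration rather than Cognition’s documented contract.

```python
# Hedged sketch: programmatically asking Devin to apply a security patch.
# The base URL, endpoint, payload fields, and response key are assumptions
# for illustration; consult Cognition's API documentation for the real contract.
import os

import requests

API_BASE = "https://api.devin.ai/v1"  # assumed base URL
HEADERS = {"Authorization": f"Bearer {os.environ['DEVIN_API_KEY']}"}


def launch_patch_session(repo: str, advisory: str) -> str:
    """Ask Devin to patch a vulnerable dependency and open a pull request."""
    payload = {
        "prompt": (
            f"In the {repo} repository, upgrade the dependency affected by "
            f"{advisory}, run the test suite, and open a pull request with the fix."
        )
    }
    response = requests.post(f"{API_BASE}/sessions", json=payload, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()["session_id"]  # assumed response field


if __name__ == "__main__":
    # Hypothetical repository and advisory identifiers.
    print(launch_patch_session("example-org/platform-api", "GHSA-xxxx-xxxx-xxxx"))
```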

With models improving and costs falling, these autonomous colleagues will only get better; this is a glimpse into the future of many technical jobs. Devin is also part of a bigger picture: there is a growing ecosystem of AI tools shaping a new way to build software. With tools like Devin, Gemini Deep Research, Pinokio, o1 Pro Mode, Bolt, and Cursor working in tandem, it’s possible to employ an AI-enabled assembly line where ideas flow rapidly from strategy to deployment:

  • The newly released Gemini Deep Research (which uses a web-browsing agent to source tens of documents and pages before writing a report in Google Docs), together with Perplexity’s AI search, starts the process by supporting market research and concept exploration.
  • OpenAI’s o1 Pro Mode can significantly extend these early ideas with strategic insights and detailed architectural planning and design.
  • Pinokio, meanwhile, is a useful piece of the puzzle, offering a quick way to find, install, and test the latest AI tools and open-source projects.
  • Bolt.new accelerates prototyping in the browser; leveraging Claude’s coding capabilities and web containers, ideas can be turned into a testable prototype in a fraction of the typical time.
  • Cursor is emerging as the AI development powerhouse (with Windsurf hot on its heels). It facilitates accelerated coding with its integrated multi-model chat, slick in-line AI editing, and composer agents that can create and edit files autonomously. It also seamlessly pulls in external documentation, design documents, and coding standards to ensure the AI development is informed and aligned.
  • Vercel’s v0 is an AI-powered tool changing how developers create user interfaces. It offers a chat-based approach to generate production-ready UI components using modern web technologies like Next.js and React.

And finally, Devin can handle the heavy or time-consuming work once your product is up and running: testing new features, making repetitive changes, analysing bugs, and covering many of the hard yards needed to keep software up to date and robust over the longer term.

Takeaways: Devin and a range of other AI tools are reshaping how software is built. When combined thoughtfully, they can speed up development from initial research through to deployment and maintenance. While software engineering is an early testing ground, the lessons learned here will help guide AI adoption across many other technical fields.

JOOST

AI on the frontlines of healthcare

UnitedHealth, and the entire health insurance industry, has been under intense scrutiny following the tragic shooting of CEO Brian Thompson in New York. Police are investigating potential connections between the shooting and healthcare decisions, which has also brought attention to the company’s AI practices. For some years UnitedHealth has faced questions about its system, nH Predict, which is accused of systematically denying claims made by elderly patients.

A Senate subcommittee report shows that denial rates for post-acute care increased from 10.9% in 2020 to 22.7% in 2022. A class action lawsuit filed in November 2023 claims the AI system has a 90% error rate in its decisions. The system evaluates patients based on diagnosis, age, and physical capabilities, but provides generic recommendations that often fail to consider individual circumstances.

This tragic episode highlights the need for more thorough design and implementation, transparency, accountability, and a human-centred approach in deploying AI for medical decision-making. Despite these controversies, 2024 has seen AI make significant contributions to the industry.

One notable example is AI’s role in advancing cancer diagnostics. The latest systems have shown exceptional accuracy in identifying tumours from imaging scans, often outperforming human radiologists in speed and precision. Recent studies have demonstrated AI tools detecting early-stage breast cancer with high accuracy, helping to ease the burden on overworked healthcare professionals.

In biotech and drug discovery, companies like DeepMind have used AI to predict new protein structures, speeding up the identification of therapeutic targets. This year’s breakthroughs include several AI-designed drugs entering clinical trials, with some showing promise for rare diseases previously considered untreatable.

Takeaways: The UnitedHealth case serves as a cautionary tale about the risks of deploying AI without sufficient oversight. As we look to 2025, several trends are set to define AI’s role in healthcare. Biotech startups are experiencing a surge in funding, with AI acceleration at the forefront. We can expect further progress in precision medicine, where AI tailors treatments to an individual’s genetic profile, lifestyle, and environment. These advanced systems have the potential to rebuild patient trust, improve access to care, enhance treatment accuracy, and reduce costs across the board.

EXO

Weekly news roundup

This week saw major developments in AI infrastructure and enterprise adoption, with significant funding rounds, new hardware announcements, and growing focus on AI governance and safety measures across global markets.

AI business news

AI governance news

AI research news

AI hardware news
