Week 20 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

Themes this week:

  • We explore OpenAI’s Spring launch event and the brand new “GPT-4o” with its unnervingly friendly voice mode
  • We unpack the many Google I/O announcements, centred around their Gemini family
  • Whilst Google and OpenAI fight it out, we explore the battle for the future of the web

GPT-4 goes omni-modal

At the highly anticipated OpenAI Spring event on Monday, the company unveiled GPT-4o, a new ‘omni-modal’ model capable of fluidly handling images, video, text, and audio. The new model boasts near-instant responsiveness, emotionally engaging interactions (think the movie ‘Her’), and new abilities in image generation. In perhaps the biggest news from the event, OpenAI announced plans to make GPT-4o accessible to its 100+ million free ChatGPT users. With a potential partnership with Apple on Siri set to expand its reach further, the event showcased the potential of a GPT-4-class model to provide genuine assistance through voice, live video, and screen sharing. Here’s a rundown of the key announcements and their current status:

  • GPT-4o: Now available to most users of the paid ChatGPT Plus tier. We’ve been using it extensively, and it’s fast and very human-like in its general answers. Apparently, the rollout to free users has started, although so far it is only available to a minority of them.
  • GPT-4o image input: Now live in ChatGPT.
  • GPT-4o image generation: ChatGPT still uses DALL-E, so we are waiting to try the new integrated modality.
  • GPT-4o voice input and output: ChatGPT is still using the old system, so no chance for most users to experience the instant response and “Her” style answers yet.
  • Improvement to data analysis: Now rolling out.
  • Custom GPTs: AIs created by users are still stuck on the old GPT-4.
  • New Mac and iOS apps: Not widely available yet (and no Windows or Android versions mentioned). The mobile demos all seemed very much geared towards closing the deal with Apple.

Full rollout is expected over the next “few weeks”. The reaction has been mixed. Those who see LLM progress hitting a wall suggested the release proves their thesis; others argued that its capabilities and likely smaller, more efficient size show there is much progress still to come. GPT-4o is now officially top of the LMSYS Arena Leaderboard. Some have found it better at writing tasks, but less capable than the older GPT-4 at coding and more complex multi-step reasoning. It is an entirely new model rather than an update to GPT-4, trained primarily to be cheaper to run and multi-modal. Many believe GPT-5 will build substantially on this new combined video, image, audio, and text approach.

Days after the launch event, OpenAI were back in the headlines with the resignations of Ilya Sutskever, OpenAI co-founder and Chief Scientist (who has kept a very low profile since Sam Altman’s temporary ousting in November 2023), and Jan Leike, co-head of OpenAI’s AI safety team. These resignations follow several other safety-related departures, including Daniel Kokotajlo, suggesting growing concern among AI safety researchers about OpenAI’s priorities. It’s likely that OpenAI’s compute budget is being directed towards new products. Sutskever, whose focus since being instrumental in getting GPT models to work so effectively has been on managing their growing capabilities as they scale, seems to be looking for other avenues to pursue this ‘alignment’ research. OpenAI has also recently hired Shivakumar Venkataraman, who previously led Google’s search ads business, so given the product focus of GPT-4o, it seems the OpenAI culture is in flux.

Takeaways: No new enterprise features were announced (and there was no sign of Microsoft), nor anything on autonomous agent-based capabilities. Look out for separate events in the coming weeks. The release of GPT-4o marks a significant milestone in the evolution of AI and its integration into everyday life through voice and video; the traditional digital assistant is about to be replaced. The products OpenAI (and Google) embed into our lives will have profound consequences.

The Gemini era

On Tuesday at Google’s annual developer conference, I/O, the company announced the “era of Gemini” and numerous new products and AI integrations across their platforms. Google also showcased the evolution of search, with AI summaries, and a move from retrieval to action, with Gemini orchestrating complex multi-step tasks from the search box. Despite the impressive demos, Google appeared to be playing catch-up to its competitors, with many features scheduled for release later in the year or even in 2025. Here’s a rundown of the key announcements and their current status:

  • AI-summarised search results (AI Overviews, formerly SGE): Rolling out in the US next week, and to the rest of the world soon.
  • Action based search agents: Multi-step actions from the search box. No timeline on release.
  • Gemini 1.5 Pro: Input capacity expanded to 2 million tokens. Only available to select partners.
  • Gemini 1.5 Flash: Faster, cheaper model with a 1 million token input capacity. Available now in preview.
  • Gemma updates: A larger 27-billion-parameter open-weight model is a few weeks out (it should compete with Llama 3 70B).
  • Gemini for Workspace (Docs, Sheets and Slides etc): Sidebar with generative features. Available later in the year.
  • AI Teammate in Gemini for Workspace: Co-worker agent with its own user identity. Not available until 2025.
  • Veo: 1080p video generation. Impressive but not state of the art, and only available to a very select few right now. No timeline on release.
  • Imagen 3 text-to-image: Again, impressive but not state of the art and only available via a waitlist.
  • MusicFX audio tools: Slick celebrity videos, but only available to a very select few right now. No timeline on release.
  • SynthID: Watermarking technology that will complement C2PA. Usable now in Imagen 2 and later.
  • Project Astra: ‘Universal agent’ technology for live, real-world interaction, including augmented reality glasses. Very early demo. No timeline on release.
  • Scam detection during calls: Gemini Nano-powered feature to alert users of potential scams in real-time. No timeline on release.
  • Ask Photos: Natural language search for Google Photos using Gemini AI. Rolling out later this summer.
  • Gemini in Gmail: AI-assisted email searching, summarising, drafting, and complex task handling. No timeline on release.
  • New Gemini on Android: AI-powered assistant with deep integration into Android and Google apps. No timeline on release.

Takeaways: The many delays aside, the intent was clear: Google are now fully focused on AI and on regaining dominance in this space. Their new orientation around the Gemini family, and their huge research investment in the likes of AlphaFold, will eventually pay dividends. This has been contrasted with Microsoft, who now seem to be hedging their bets with multiple in-house model developments alongside their partnership with OpenAI. Google remain wedded to the search paradigm, and their stock has taken some hits, but if they can deliver complex planning features through the simplest of interfaces, they could be on to a winner. There is some irony here: as the emails released as a result of Musk’s court case against OpenAI show, OpenAI was clearly set up to offset Google’s then dominance of AI (and Microsoft themselves invested because they realised they were way behind Google). Now the competition is instilling newfound resolve in Google.

The battle for the soul of the digital age

The days of the early web were a time of incredible creativity and possibility. The web of the 90s and 00s was a new frontier where anyone could create a website or blog and share their ideas, identity, and interests. There was a sense of the democratising and transformative potential of a new medium, as this intriguing David Bowie interview highlights. Early cyberspace felt weird, experimental, and open, and seemed destined to reshape society in positive ways.

But in recent years, talk of “online harms”, “dark forests”, “walled gardens”, and crypto-web3 ownership portrays a contrasting vision, where the early web’s openness has given way to a more enclosed, controlled, and fragmented online landscape. Much of the internet has become dominated by a handful of giant tech platforms – the “walled gardens” – which seek to keep users within their own proprietary ecosystems, feeding them content and services personalised by algorithms, and now AI, that many feel cause more harm than good.

Outside these gardens lie the “dark forests”, the parts of the web that are harder to find, less engaged with, and more prone to misinformation, conspiracy theories, extremism, and crime. The internet has lost its early innocence and has instead become a place where power is concentrated, surveillance is pervasive, and genuinely positive communities are hard to find. At the same time, the dark forest metaphor also hints at the persistence of spaces outside the mainstream, where subcultures and alternative communities still thrive. The places beyond the walls may be a refuge for those seeking to escape the conformity of the “walled gardens”, even as they present their own risks.

The inexorable rise of giant social platforms and AI has set the stage for perhaps the final battle over the future of the web. On one side are those who believe that AI-powered tools, for all their potential risks, can still be harnessed to make the web more useful, efficient, and accessible, as we saw at Google I/O this week. Google argue that AI can help users navigate the vast quantities of online information, surface high-quality content, and even spur new forms of creativity, expression, and agency.

On the other side are those who fear that AI poses an existential threat to the web, and that Google is presiding over its managed decline. They worry that AI-generated content will drown out human voices, that AI-powered platforms will further concentrate power, and that the widespread use of AI will undermine the advertising model that has long sustained the web. Some analysts predict that new AI search could contribute to a significant decline in website traffic.

As AI systems become more advanced, autonomous, and good at their job, they may also begin to erode the very foundations upon which they were built: the human-created content and interactions that serve as their training data. AI is displacing the vibrant, diverse, and unpredictable web that gave rise to it. Will these machines, born of the forest, finally extinguish its dying light? Or will they create a new web of their own? WebSim turns AI generation into an art form: a seemingly infinite number of websites and apps are being generated by Claude 3 as we speak. Try it out: type in any imagined URL and enter a strange new web.

A decaying web could also have profound implications for businesses. In this environment, they may find it increasingly difficult to reach and engage with customers organically, as their content gets lost in a sea of AI-generated noise. They may become more dependent on paid advertising and promotion through dominant platforms, which could drive up costs and reduce margins. Businesses that rely heavily on web traffic for their revenue, such as online publishers and e-commerce sites, could see their income streams dry up as users spend more time within walled gardens and less time exploring the open web. There is even talk today of optimising websites not for their human visitors but for autonomous agents. At the same time, the proliferation of AI-generated content and deepfakes could erode trust, making it harder for businesses to establish credibility and build relationships with customers.

Takeaways: The battle for the web’s future is a battle for the soul of the digital age. Will the web remain an open platform for collective intelligence and self-expression, will it be ‘rewilded’, or will it become an ever more homogenised medium, shaped by a few AI-powered gatekeepers, navigated only by extractive data-gathering agents feeding information back to their users’ bubbles?

Those who depend on web search traffic for their business or their livelihoods need to pay close attention and start preparing a plan B. How can we respond? Firstly, we should continue to create human content. At ExoBrain we spend a bit of time each week trying to write useful thoughts on the news, assisted by AI in the research and review, but not simply generated by our agents. We should also embrace interoperability by using technologies that reduce reliance on centralised power and allow greater user control and data portability, such as the open-weight AI models we often highlight. We should lobby governments to take public digital infrastructure and web regulation more seriously. And we should foster digital literacy in our society, particularly among younger generations, and help people make informed choices about using AI and the big algorithmic systems.


EXO

This week’s news highlights the growing demand for AI expertise, the potential economic benefits of AI adoption, and the ongoing debates around AI governance and responsible development. It also showcases the latest advancements in AI research and hardware, signalling the rapid progress in this field.

AI business news

AI governance news

AI research news

AI hardware news
