Week 18 news

Welcome to our weekly news post, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our AI platform Exo…

Themes this week

JOEL

  • We follow the continuing Sam Altman and OpenAI circus, and investigate the vanishing act of a mysteriously powerful chatbot.
  • We look at California’s attempt to curb the AI labs, and the current state of AI regulation as the second global safety summit nears.
  • We explore the rise of the CAIO (Chief AI Officer) and the debate around the value of this new role.

Sam Altman promotes the next generation of AI

AI circus ringmaster and OpenAI CEO Sam Altman has been making some bold claims about the future of AI, hinting at the company’s big visions for autonomous agents capable of tackling complex tasks and potentially serving as a stepping stone to artificial general intelligence (AGI).

Speaking on Wednesday, he talked of a “super-competent colleague that knows absolutely everything about my whole life, every email, every conversation I’ve ever had, but doesn’t feel like an extension.” He also doubled down on the narrative that what comes next will be a big step, calling today’s ChatGPT “mildly embarrassing at best” and GPT-4 the “dumbest model” we’ll ever have to use. He went on to say that with some scientific certainty “GPT-5 is going to be a lot smarter than GPT-4” and GPT-6 would also see a similar jump.

If that was not enough expectation building, hours earlier a mysterious new chatbot had emerged briefly on a testing site, sending the AI community into overdrive. The site in question, LMSYS.org, provides a blind testing interface for users to rate bots and publishes a widely used leaderboard. On Sunday night the first reports surfaced, with people seeing a model dubbed “gpt2-chatbot” appearing in tests with unprecedented capabilities. Soon everyone was heading to the site to try it out. “The whole situation is so infuriatingly representative of LLM research,” AI researcher Simon Willison told Ars Technica. “A completely unannounced, opaque release and now the entire Internet is running non-scientific ‘vibe checks’ in parallel.”

Our own vibe checks suggested a GPT-4 level model for logic tasks but an impressive ability to self-reflect and generate very detailed plans, more so than anything we’ve seen before. We also run a kind of cognitive process scan on AI models we use. On one of our probes gpt2-chatbot responded: “In reflecting on the methods used and possible biases, I engage in a simulated form of metacognition, analyzing my own ‘thought’ processes and decision-making strategies as if stepping back and reviewing a human’s cognitive processes” indicating some unusual internal loops that could explain its capabilities.

As theories abounded, Altman shared a cryptic message, posting on X: “i do have a soft spot for gpt2”. To add to the intrigue, X users spotted that Sam had edited the tweet, having initially posted “gpt-2”. With most of the AI community hammering the testing site, the bot was taken down, probably never to be seen in the wild again. There have been no official statements from OpenAI, but the consensus is that this may have been an attempt to get some early feedback, to harvest some choice prompts (Sam now has our brain scanner), and to build hype for the range of models they plan for 2024. Within that range, several people believe this might be an enhanced but more compact GPT-4 level option, with some new planning capabilities, perhaps destined for free tier users later in the year, but not likely the full-strength GPT-5. Others believe the “gpt2” moniker might hint at this being a new ‘2nd generation’ LLM architecture, or a new product naming convention.

Takeaways: Agents that can plan reliably are the next big thing. We’re working on agentic solutions for clients, and we’ll be covering this topic in detail in future weeks, but for now it is pretty clear that there are some strong capabilities in the pipeline for 2024. AGI would be by most measures one of the most significant inventions in human history. It would however be preferable if this wasn’t being turned into a circus by the OpenAI team, who clearly delight in cryptic comms. Their commitment to shipping early and often, and letting the world adjust to the implications is laudable, but more transparency is needed. Let’s just hope they take their work more seriously behind the scenes and are indeed on the verge of delivering some major AI progress. A few brief prompts with a shadowy AI seem to suggest they may have something interesting waiting in the wings.

The state of AI regulation

President Biden issued his Executive Order on AI in October last year and has just released an update on progress against his 180-day deadlines. Top of the to-do list was “establish a framework for nucleic acid synthesis screening to help prevent the misuse of AI for engineering dangerous biological materials”, and thankfully in the US at least such screenings are now in place. But despite the order setting wide-ranging AI targets for many US agencies, major legislative action is not yet forthcoming.

The Golden State isn’t waiting for Washington. Senate Bill 1047, recently proposed in California, aims to set clear standards for developers of the most advanced AI systems, requiring pre-deployment safety testing and certification, ongoing monitoring, and giving the Attorney General power to hold negligent developers accountable. It’s a bold move for a state that’s at the centre of the AI revolution. Giants like Google, Meta and OpenAI call the state home, as do a range of ambitious startups. And as might be expected, SB 1047, like any talk of AI regulation, is dividing opinion.

Leading researchers such as Geoffrey Hinton and Yoshua Bengio have spoken out in favour, seeing it as a way to mandate responsible development. Open-source advocates and smaller players are alarmed. Jeremy Howard, a prominent researcher and entrepreneur, argues that the bill’s broad definition of “covered model” could impact well-intentioned, small-scale developers working on beneficial AI projects.

Bigger firms could also be impacted. A key complexity is how the bill handles what are termed “derivative models”. The recently launched Llama 3 model from Meta has already been re-trained or ‘fine-tuned’ by hundreds of developers in the last few weeks. The bill would place Meta, the original developer, on the hook for variants developed from their work, even if they were radically different, dangerous, or toxic.

Another challenge is the classification of ‘frontier’ models. They are defined by setting a compute threshold, but measuring and verifying compute used is tricky. There are different precisions and architectures that make a single threshold meaningless. And does putting all the onus on the original lab even make sense? It’s a bit like expecting MIT to be responsible for everything its ultra-smart alumni end up doing. Good luck with that.
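To make the threshold problem concrete, here is a minimal sketch of how such a test might be applied. It relies on assumptions that are not from the bill text or any lab’s disclosures: the threshold value (set here to 10^26 total training operations), the common rule of thumb that training compute is roughly 6 × parameters × tokens, and two hypothetical models. Note that the estimate takes no account of numeric precision or architecture, which is exactly the weakness described above.

```python
# Rough sketch: estimate training compute and compare it with a threshold.
# All numbers are illustrative assumptions, not figures from the bill text
# or from any lab's disclosures.

THRESHOLD_OPS = 1e26  # assumed threshold, in total training operations


def estimate_training_ops(parameters: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6.0 * parameters * tokens


def is_covered(parameters: float, tokens: float,
               threshold: float = THRESHOLD_OPS) -> bool:
    """Return True if the estimated compute exceeds the threshold."""
    return estimate_training_ops(parameters, tokens) > threshold


if __name__ == "__main__":
    # Hypothetical models: (name, parameter count, training tokens)
    models = [
        ("mid-size model", 70e9, 15e12),
        ("very large model", 2e12, 20e12),
    ]
    for name, n_params, n_tokens in models:
        ops = estimate_training_ops(n_params, n_tokens)
        print(f"{name}: ~{ops:.1e} ops, covered={is_covered(n_params, n_tokens)}")
```

Under these assumptions the mid-size model falls well below the line and the very large one crosses it, but the same model trained at a lower precision or with a sparse architecture would produce the same estimate, so the number alone says little about real capability or risk.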

Perhaps hardening the wider world is a better bet than trying to pre-empt every way a powerful AI could be misused or go rogue. That’s where efforts like NIST’s GenAI initiative come in. Controversy around NIST appointments aside, this new risk framework is laser-focused on the here and now, providing tools to spot AI-generated disinformation, for example.

But some kind of regulation will be inevitable. Recent reports suggest that despite the promises made by the big AI firms last year, only Google have submitted their latest systems for pre-release testing by the UK’s AI Safety Institute. Big tech likely can’t be trusted to police itself with so much investment at stake. Even the regulation-wary UK government is considering drafting new safety legislation.

With the next global AI Safety Summit taking place later this month in South Korea, the perceived futility of safety regulation is shifting the agenda from doomsday scenarios to resource and environmental challenges. Key players are getting summit fatigue, with some suggesting they won’t attend – reaching any kind of global consensus to pre-empt a risk event looks unlikely.

Takeaways: Don’t wait for politicians, tech firms, open-source advocates (or lawyers) to work this one out. The myriad AI systems available today are not going to be impacted by any frontier regulation, and they’re already capable of providing huge value for several years ahead. Take a proactive approach to AI governance within your organisation. Develop clear policies, deploy the latest governance approaches, and continue to harden your human and digital infrastructure. Check the NIST AI risk management framework published this week for a comprehensive list of actions.

JOOST

The case for a Chief AI Officer, or not…

The debate over the need for a Chief AI Officer (CAIO) continues this week, as companies like Accenture and Avanade promote this role and, as previously reported, US government agencies are mandated to appoint CAIOs by year end. However, the question arises: is this role the key to successful AI integration, or just another corporate trend?

The push for CAIOs is driven by the rapid growth of AI and its potential to transform businesses across industries. Accenture alone has already sold $600 million in new AI business in 2024, so it is no wonder they are pushing for this new role (to sell into). Microsoft recently provided a platform to Avanade (their JV with Accenture) to promote the same, yet supplemented it by sharing four strategies for accelerating AI value creation, including a focus on adoption, top-down strategic objectives, making AI non-negotiable, and being strategic about the AI ecosystem. While these strategies could certainly be part of a roadmap for AI integration, the question remains: would it necessarily require a single person or function to oversee the embedding of AI?

There are parallels between the current CAIO trend and the rise of Chief Digital Officers (CDOs) more than a decade ago. CDOs often struggled to find their place within organisations, as digital transformation was never just about technology, but rather a combination of changes to product distribution, client servicing, and operational efficiency. The concern is that CAIOs may face similar challenges, becoming a new silo rather than a catalyst, and pushing AI solutions that don’t match up with actual business problems.

So, what are the alternatives to appointing a CAIO? The forward-thinking alternative – for those who expect this journey to be exponential – would go as far as suggesting that AI itself could serve best as the CAIO, as it would utilise advanced technology and intelligence to guide AI strategy and implementation. This step-change concept is likely 1 or 2 years ahead of its time and requires both increased technology maturity and a human acceptance of a new leadership approach. This will not be achieved overnight and is certainly not recommended as a marketing gimmick, as Genius Group demonstrated by naming their AI chatbot CAIO “Alan Turing”. Another option is to assign CAIO responsibilities to an existing executive, such as the COO, CIO, or Chief Digital/Data Officer.

Perhaps the most compelling approach is to embed AI competency and strategy in every aspect of the leadership team. By supplementing CxOs with fractional and targeted AI leadership input, as offered by companies like ExoBrain, organisations can stay informed about AI developments and make smart strategic decisions on an as-needed basis.

Takeaways: Rather than rushing to appoint a CAIO, organisations should focus on fostering a culture of AI competency and strategic thinking across all levels of leadership. By leveraging external experts from many sources, and empowering existing executives to experiment with and then champion AI initiatives, companies can more effectively navigate a complicated landscape.

EXO

This week’s news highlights the rapid advancements in AI across various industries, from healthcare and finance to hardware and governance. The transformative impact of large language models, the growing debates around responsible AI development, and the surge in AI-driven research and innovation are central themes.

AI business news

AI governance news

AI research news

AI hardware news
