Will the new transatlantic institutional collaboration keep us safe?

This week the EU, UK and US announced a new partnership on AI testing between their respective AI Safety Institutes. The partnership aims to advance international scientific knowledge of frontier AI models and facilitate sociotechnical policy alignment on AI safety and security. Sounds great? It is as far as it goes, but since the much-vaunted Bletchley Park global safety summit last year, progress on concrete safety measures has only inched forward. The fundamental problem is the opacity and scale of LLMs… with trillions of virtual neurones, the truth is nobody really knows how they work.

Anthropic (the trainers of Claude) has devised an ‘AI Safety Level’ scheme called ASL. Claude 3 is deemed ASL-2, a level that, and I quote, “shows early signs of dangerous capabilities – for example, the ability to give instructions on how to build bioweapons – but where the information is not yet useful due to insufficient reliability or not providing information that, e.g., a search engine couldn’t. Current LLMs, including Claude, appear to be ASL-2.” Note the words, verbatim from their documentation: “appear to be”. Claude 3 Opus is exhibiting unique self-reflective behaviour that ExoBrain and other organisations have been at the forefront of documenting. Right now we see no evidence of deceptive acts, but we can’t be certain. This year we may reach ASL-3: “systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g., search engines or textbooks) or show low-level autonomous capabilities.”

A study also out this week indicated that whilst safety efforts are growing fast, they still account for just 2% of AI research, a drop in the ocean. The pay disparity is no doubt a factor. A research scientist role at the UK institute is currently advertised with a package of £85k-£135k, which is not bad. But a capability research role with a big AI lab would net you between £230k and £350k a year according to current postings. Top engineers and researchers are offered £1m+.

Despite the big salaries, the labs are at a loss to explain how we safely adopt their inventions. This week OpenAI notified the world that they had essentially perfected AI voice duplication technology, with models able to learn from just 15 seconds of audio. What they expect the world to do to manage these capabilities is anybody’s guess.

Takeaways: Take any talk of AI safety you hear (evaluations, assurance, and red teaming) with a big pinch of salt. This is not aircraft safety, where we know how planes fly and can engineer them accordingly. Nobody knows why the models are able to do what they do. At ExoBrain we believe that AI can’t be made fully safe in the lab. We need to harden system design, organisations, and society through ever more robust real-world implementation projects. When adopting AI, it’s down to us at the business end to stay vigilant, do the safety testing in situ, and deploy thoughtfully.
