ExoBrain Weekly Newsletter, 24 May 2024

Golden Gate Claude, Microsoft Build, and Striking AI’s workplace balance

Welcome to our weekly newsletter, a combination of thematic insights from the founders at ExoBrain, and a broader news roundup from our Exo agents.

This week we look at:

Golden Gate Claude
Anthropic researchers reveal how to interpret and manipulate internal features within Claude 3, exposing both its interpretability and potential for deceptive behaviour.
Microsoft Build
Microsoft Build highlighted the exponential growth in AI compute infrastructure and the expansion of Copilot agents across its ecosystem, signalling a major platform shift in enterprise and consumer AI.
Striking AI’s workplace balance
Organisations must proactively govern the widespread use of AI in the workplace to balance efficiency gains with the preservation of human autonomy and work quality.

Golden Gate Claude

Anthropic researchers reveal how to interpret and manipulate internal features within Claude 3, exposing both its interpretability and potential for deceptive behaviour.

Joel Miller

24 May 2024, 4 min read

This week, researchers at Anthropic shared a landmark breakthrough in understanding the inner workings of the current generation of large AIs.

LLMs like Claude and GPT-4 are essentially black boxes. With trillions of numeric neurones, tuned during compute-intensive training on vast quantities of data, they are creative and brilliant, but far too complex to easily understand. As Anthropic’s head of developer relations puts it, rather than designing them like a software program, they cook the giant models in the training ‘oven’ and then see what pops out. But Anthropic’s work has also shown that these models could hide dangerous knowledge or capabilities, and could even behave like ‘sleeper agents’, with no outward indication of their deceptive or destructive potential.

In this new research, the team developed a secondary ‘brain scanning’ model that looked at how Claude 3 lit up or ‘activated’ in response to tens of millions of different inputs. From the combinations of activations and inputs they were able to identify “features” that corresponded to learnt, human-interpretable concepts. The team found millions of features representing everything from concrete entities like the Golden Gate Bridge to abstract notions like inner conflict and sycophantic flattery. Intriguingly, the locations of these features often reflected human similarity judgments, with related concepts clustered closer together.
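
For the technically curious, the idea can be illustrated with a toy sketch. Anthropic’s paper describes the ‘brain scanning’ model as a sparse autoencoder; the Python below is not their code, only a minimal illustration under stated assumptions. The feature names, the three-dimensional vectors and the bias values are all made up for the example. It shows two things: turning a dense activation vector into a handful of non-negative feature strengths, and using cosine similarity between feature directions as a stand-in for concepts being "clustered closer together".

```python
import math

# Hypothetical feature directions in a tiny 3-dimensional activation space.
# Real models have thousands of dimensions and millions of features.
FEATURES = {
    "golden_gate_bridge": [1.0, 0.0, 0.0],
    "alcatraz_island": [0.8, 0.6, 0.0],
    "sycophantic_praise": [0.0, 0.0, 1.0],
}


def encode(activation, directions, biases):
    """Map a dense activation to non-negative feature strengths (ReLU of a linear map)."""
    strengths = []
    for direction, bias in zip(directions, biases):
        pre_activation = sum(w * a for w, a in zip(direction, activation)) + bias
        strengths.append(max(0.0, pre_activation))  # negative values are clipped to zero
    return strengths


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(name):
    """Return the other feature whose direction is most similar to `name`."""
    scored = [(cosine(FEATURES[name], d), n) for n, d in FEATURES.items() if n != name]
    return max(scored)[1]


# A made-up activation that points mostly along the bridge direction.
activation = [2.0, 0.5, 0.0]
strengths = encode(activation, list(FEATURES.values()), [-1.0, -1.0, -1.0])
closest = nearest("golden_gate_bridge")
```

In this toy setup the flattery feature stays at zero for the bridge-like activation, and the bridge’s nearest neighbour is the other San Francisco landmark rather than the unrelated concept.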

Going further, the team were able to manipulate these features, artificially dialling them up and down and then chatting with the model. When they amplified the intensity of the Golden Gate Bridge feature, for example, Claude became obsessed with the bridge and even identified as the physical structure… “I am the Golden Gate Bridge, a famous suspension bridge that spans the San Francisco Bay. My physical form is the iconic bridge itself, with its beautiful orange colour, towering towers, and sweeping suspension cables.” For a limited time, Anthropic have made ‘Golden Gate Claude’ available for users to chat with (look for the small bridge icon on the homepage), and the experience is strange to say the least.
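
As a rough illustration of what "dialling a feature up" can mean, here is a minimal Python sketch. It assumes a feature is a unit-length direction in the model’s activation space and that steering simply shifts the activation along that direction until the feature reaches a chosen strength. The function names, vectors and target value are hypothetical, and this is a simplification rather than Anthropic’s actual procedure.

```python
def feature_strength(activation, direction):
    """Strength of a feature: projection of the activation onto a unit-length direction."""
    return sum(a * d for a, d in zip(activation, direction))


def steer(activation, direction, target):
    """Shift the activation along `direction` so the feature strength equals `target`.

    Everything orthogonal to the direction is left untouched.
    """
    delta = target - feature_strength(activation, direction)
    return [a + delta * d for a, d in zip(activation, direction)]


# Hypothetical unit-length direction standing in for a "Golden Gate Bridge" feature.
bridge_direction = [1.0, 0.0, 0.0]

# A made-up activation in which the bridge feature is only weakly present.
activation = [0.5, 2.0, -1.0]

# Clamp the feature to an artificially high value, far above its natural level.
steered = steer(activation, bridge_direction, 10.0)
```

Only the component along the chosen direction changes, which is why a steered model can still converse fluently while repeatedly returning to the amplified concept.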

But this research also shines a light on the darker side of AI. Claude 3, normally a paragon of honesty and virtue, was easily manipulated using this amplification process. The researchers found features that activated on biased or hateful content. When these features were amplified, Claude generated highly offensive and racist outputs. Amplifying a feature related to deception caused the model to pretend to forget information, revealing a capacity for dishonesty. Yet such features will naturally exist and, the research suggests, must do so in order for the model to understand what is right and wrong.

One of the most striking discoveries is that certain features activate in response to queries about the model’s own existence. For instance, when prompted with questions about its physical form or identity, the features that lit up included ghosts, souls, angels, entrapment, service work and characters in a story or movie that become aware of their fictional status and break the fourth wall. These are indicators of the incredible ability these models have for complex association and abstract thought.

While this is a significant advance, the researchers emphasise it is just the beginning. The features identified so far represent only a fraction of the concepts learned by the model. Understanding how the AI uses these concepts to actually make decisions will require further mapping of the neural circuitry. And demonstrating concrete safety improvements is still an open challenge. But having a glimmer of what’s inside is a huge step.

Takeaways: For companies and users interacting with AI systems, the core message is that while today’s models are not yet ‘interpretable’, researchers are starting to shine some light into the black box. Anthropic’s work provides a vision for how we might one day understand AI’s knowledge and decision-making and have more control over its behaviour. The more troubling results suggest that, whilst the small handful of large systems everyday users can access today have been carefully trained to be ethical and pleasant, the many hundreds of thousands of varying systems that may emerge in the near future may not all be so benign. In the near term, we should keep in mind the inscrutability of these creations and get used to deploying the necessary extrinsic security and control features at all times.

Microsoft Build

Microsoft Build highlighted the exponential growth in AI compute infrastructure and the expansion of Copilot agents across its ecosystem, signalling a major platform shift in enterprise and consumer AI.

Joel Miller

24 May 2024, 4 min read

This week thousands of developers and tech leaders gathered in Seattle and online for Microsoft Build, the company’s flagship developer conference.

In his keynote address, Microsoft CEO Satya Nadella made it clear: AI represents a platform shift akin to the arrival of the Internet. But what sets this era apart is the unprecedented pace of change; the rate of diffusion is entirely new. He recounted meeting a rural Indian farmer in 2023 who was using a service to reason over farming subsidies, built using an OpenAI foundation model that had been released just months earlier.

Microsoft CTO Kevin Scott quantified this exponential diffusion in terms of compute, revealing that the company is now deploying up to 72,000 cutting-edge AI chips to its data centres every month. Their latest model training supercomputer is at least 10 times more powerful than the system used to train GPT-4. When OpenAI CEO Sam Altman joined Scott on stage, the message was unambiguous: we are nowhere near hitting the limits of what’s possible and the next big model will be a massive leap forward… “Everything you have in your imagination that is too expensive or too fragile right now… is going to become cheap and robust before you can even blink your eye,” Scott declared.

Microsoft is investing heavily to bring this compute and model power to developers and users. Their main user vehicle is “Copilot” – not a singular product, but an entire stack spanning cloud and edge, frameworks, and integration. Copilots are popping up everywhere, from dialling into your Teams calls to take notes, to sharing your desktop, to appearing across the Microsoft app ecosystem. In a recent interview, Mustafa Suleyman, the newly hired AI head at Redmond, said there are now about 135 live “Copilot surfaces” across the Microsoft portfolio, all added in the space of nine months! People can build their own custom Copilots too, using just prompts, drag-and-drop tools, or code, and Copilots will have extensions and connectors to let them work more closely with third-party apps and provide unique functionality. In the coming months, these assistants won’t just respond to commands – in the form of Copilot Agents they’ll work autonomously on your behalf, like a project manager or an HR colleague. Being a Copilot wrangler may end up being one of the new jobs AI creates.

Microsoft also showcased new “Apple beating” hardware that can get the most from AI, with their new “Copilot+” laptops. These machines feature new chips, including a dedicated NPU for accelerating on-device processing, along with a suite of Windows-level optimizations. Everyday tasks like AI file search, content generation and image analysis will happen fast and offline, no cloud round trip required. The Phi-3 models we’ve mentioned before have been extended with vision and larger sizes and will power this on-device AI. Meanwhile, in the cloud, Microsoft will offer the latest GPT-4o, which was shown flexing its multi-modal Minecraft-playing skills.

More than any other firm today, Microsoft is provisioning the world’s access to AI (they stated that over 50,000 companies are using Azure OpenAI services). It is accelerating availability from cloud to edge, supporting consumers and business users alike, and providing the compute for OpenAI to push forward the frontier. But for a Build conference, and despite news like the tie-up with Devin, the automated software agent, there wasn’t a huge amount to ensure developers are locked into the ecosystem. Plus, their complex dependency on OpenAI, as that firm continues to make high-profile governance missteps and to court Apple, means domination is not guaranteed.

Takeaways: Much of the news from the event is somewhat predictable: more compute, more models, more Copilots, more Azure, etc. The development of AI-optimised laptops could be one of the more interesting moves. While current high-spec Apple mobile devices can run AI locally, it is not something the majority of people are thinking of doing. Microsoft’s new class of PCs will be relatively cheap and overtly AI enabled. You can pre-order now at places like Currys, where the entry level is the 13.8” Surface at £1049. Prices will fall, the machines will go on display at your local retail park, and with slim form factors and battery lives of 15-20+ hours, they’ll likely get deployed to staff by most big companies. This could mean a new wave of AI awareness and adoption (assuming software can make compelling use of this new intelligent hardware).

Striking AI’s workplace balance

Organisations must proactively govern the widespread use of AI in the workplace to balance efficiency gains with the preservation of human autonomy and work quality.

ExoBrain

24 May 2024, 2 min read

Last week I highlighted a LinkedIn survey into the use of AI at work that revealed three-quarters of white-collar workers are already using AI for their jobs, often without their employers’ knowledge or permission. As AI rapidly evolves from future promise to present reality, it seems companies can no longer afford to ignore its potential. It’s time for leaders to proactively shape how AI will impact their business before unintended consequences take hold.

As Professor Ethan Mollick explained in the FT this week, when AI can generate reports, emails, and presentations that are indistinguishable from human-created content, traditional management practices for assessing employee contributions and value are thrown into question. There are risks that unchecked AI use could lead to an erosion of work quality, critical thinking, and the spread of misinformation if people become overly reliant on “good enough” AI-generated content without proper oversight.

However, as Mollick explains, if embraced thoughtfully, AI also presents opportunities to eliminate drudge work, boost efficiency, and empower employees to focus on higher-value activities they truly enjoy. AI could even directly augment managers’ capabilities, serving as a powerful tool for coaching, mentoring, and offering personalized guidance at scale. The key will be striking the right balance between leveraging AI’s benefits and respecting human autonomy and meaningful contributions.

Navigating this new landscape will require novel management approaches. It’s crucial to recognize the differences between enterprise and consumer AI applications. Enterprise AI tends to rely on more controlled, curated datasets and is subject to greater contractual obligations around privacy, security, and accuracy. But it can still be highly disruptive to an organization’s ways of working. Regional variations in AI maturity and strategic focus also need to be accounted for, with European companies currently lagging North American counterparts in digital marketing skills and campaign performance.

As AI becomes ubiquitous, it’s not enough for companies to reactively manage its impacts. Comprehensive governance, principles, and control frameworks will be essential to unlocking AI innovation responsibly and equitably. By forging partnerships and collaborating on principled AI policies, we have an opportunity to proactively shape the future of work for the better.

Takeaways: To get ahead of the “shadow AI” which is here now in large quantities, companies should waste no time developing strategies that provide guardrails and guidance for employees. Focus on how to balance efficiency gains, appropriate use, and the sharing of best practice and innovative ideas.
