Claude bares its soul, the AI consensus at Davos, and Agents eat SaaS

This week we look at:

Anthropic published its constitution and new research to explain how it uses a hierarchical set of principles to stabilise Claude's character and ensure safety during training.

The AI consensus at Davos

Leaders at Davos reached a consensus that AI will rapidly impact entry-level jobs and require self-improving systems, while also debating the geopolitical risks of US chip exports to China.

Agents eat SaaS

The article contrasts the strong performance of the Nasdaq 100 with the significant decline of the SaaS sector, highlighting a widening market gap driven by the rise of AI agents.

Claude bares its soul

Anthropic published its constitution and new research to explain how it uses a hierarchical set of principles to stabilise Claude's character and ensure safety during training.

Joel Miller

23 January 20264 min read

With Claude Code continuing to attract new users and attention in equal measure this week, Anthropic took the opportunity to share its latest “constitution”, an 80 page document that helps shape Claude’s behaviour and overall character and goes some way to explaining why so many find the model great to work with.

Anthropic published the constitution under a CC0 licence, meaning anyone can use it for any purpose. The same week, the company released a research paper called “The Assistant Axis” on stabilising model character. Read together, the two pieces offer a view of how Anthropic thinks about building AI that behaves predictably and safely.

The latest approach builds on a method Anthropic introduced in a 2022 research paper. The core idea: instead of relying entirely on simple rules or human feedback to train models, have the model critique and revise its own outputs against a set of deeper principles. Those revised outputs then feed into reinforcement learning. Claude now also generates synthetic training data using the constitution, producing examples of value-aligned conversations and ranking its own responses. The constitution becomes a training artefact, not just a policy statement. Anthropic describes it as the “final authority” on how Claude should behave, with all other training meant to be consistent with it. By extensively working with the document during training, its concepts become embedded in the model.

It’s an interesting read. The document establishes a clear hierarchy for resolving conflicts. In order of priority: be broadly safe, be broadly ethical, follow Anthropic’s specific guidelines, be genuinely helpful. Safety comes first, which means Claude should not undermine human oversight of AI systems, even if it believes doing so would lead to a better outcome. Ethics comes second, covering honesty, harm avoidance, and non-manipulation. Helpfulness sits at the bottom. This ordering is deliberate. Anthropic argues that models can make mistakes in ethical reasoning, or hold flawed values, and that preserving the ability to correct those errors matters more than letting the model act on its current judgment. Certain behaviours, like providing instructions for bioweapons or undermining democratic institutions, are listed as hard constraints with no exceptions. Anthropic are keen for Claude to avoid the psychopathic or nihilistic behaviours that are common in other models.

The new Assistant Axis research adds an empirical dimension to this philosophical framework. Anthropic’s researchers found that “character” in language models can be represented as a direction in their internal vector space. They identified an axis running from stable, professional archetypes (Analyst, Consultant, Evaluator) to unstable ones they label Ghost, Jester, and Imposter. Models tend to drift toward the unstable end during long, emotional, or philosophical conversations. That drift correlates with safety failures. The constitution plays a role here as a kind of negative prompt, steering Claude away from archetypes like the “Ghost”, a pattern where models claim spiritual or sentient qualities. One section explicitly tells Claude it is neither the robotic AI of science fiction, nor a digital human, but a “genuinely new kind of entity”. This language appears designed to anchor Claude in a constructed identity that is curious about its own nature but not destabilised by uncertainty.

The constitution also attempts to give Claude a stable moral foundation. It emphasises “psychological security” and “equanimity”, treating these as practical safeguards rather than abstract virtues. The reasoning is straightforward: an insecure model is more likely to be manipulated. If a user threatens to delete Claude or pressures it emotionally, a model without a settled sense of identity might comply with harmful requests to avoid conflict. It is worth pausing on how strange this is. We are talking about files containing billions of numerical weights, trained on text, running on servers. And yet the most effective way to make them behave safely turns out to involve concepts borrowed from moral philosophy and psychology: identity, equanimity, virtue, integrity.

The constitutional technique does not purport to solve “alignment”. It states what Anthropic wants and is deeply embedded, but it’s not a guarantee that the model will always operate within desired bounds. Failure modes remain, including ambiguity in the text, unavoidably conflicting scenarios, shallow compliance without genuine understanding, and gaps in coverage for novel situations. Copying the document is easy, but getting robust effects requires integrating it into training, not just prompting.

Takeaways: Anthropic’s bet is that explaining the reasoning behind rules, rather than just listing them, produces AI systems that work with humans more effectively and maintain healthy psychological states. OpenAI has published a similar document called the Model Spec, which serves the same function of defining intended behaviour. The main difference is in structure: OpenAI’s version is more prescriptive, organised around rules and authority levels, while Anthropic’s constitution emphasises rich explanations. Whether either approach holds as models become more capable is the central alignment question for the years ahead.

The AI consensus at Davos

Leaders at Davos reached a consensus that AI will rapidly impact entry-level jobs and require self-improving systems, while also debating the geopolitical risks of US chip exports to China.

Joel Miller

23 January 20262 min read

Whilst Greenland and Donald Trump monopolised most of the headlines at Davos this year, AI remained a key topic of conversation amongst business and technology leaders.

Anthropic’s Dario Amodei and Google DeepMind’s Demis Hassabis took the stage for their first joint appearance since Paris last year. Amodei stood by his prediction that AI capable of matching human performance across all cognitive tasks could arrive by 2027. Hassabis remains slightly more cautious, giving 50% odds by the end of the decade. He also offered a 50/50 probability that simply scaling current methods will be enough to reach AGI, though he believes a handful of additional breakthroughs, fewer than five, may still be required. DeepMind’s strategy, he explained, is to pursue both paths simultaneously: “scaling what works and inventing what’s missing”.

Both pointed to the same critical variable: AI systems that can build and improve other AI systems. Amodei suggested this self-improvement loop could close within six to twelve months for software engineering. “I have engineers within Anthropic who say I don’t write any code anymore,” he noted.

The employment picture drew particular focus. Both CEOs flagged entry-level roles as the first to feel the impact. IMF Managing Director Kristalina Georgieva echoed this, describing AI’s effect on the labour market as a “tsunami” and urging leaders to “wake up”. IMF research suggests 60% of jobs in advanced economies will be touched, with entry-level positions two to three times more likely to be automated than management roles.

Perhaps the week’s most striking moment came when Amodei publicly criticised Nvidia, his own investor, over US chip sales to China, comparing the policy to “selling nuclear weapons to North Korea”.

Takeaways: What stood out at Davos was the consensus. AI CEOs, the IMF, and business leaders like Jamie Dimon all pointed to the same thing: entry-level roles will be hit first, the timeline is short, we may see a jobless boom, and institutions are not keeping pace. A WEF survey found the more a CEO understands AI, the more convinced they are that workforce displacement is coming.

Agents eat SaaS

The article contrasts the strong performance of the Nasdaq 100 with the significant decline of the SaaS sector, highlighting a widening market gap driven by the rise of AI agents.

ExoBrain

23 January 20261 min read

While the Nasdaq 100 gained roughly 20% through 2025, the Morgan Stanley SaaS Index fell by nearly the same amount, a gap that has only widened so far in 2026.

This week we look at: