For a month earlier in the year, an AI agent powered by Claude 3.7 Sonnet and dubbed Claudius ran a vending machine 24/7 in Anthropic’s San Francisco office, managing inventory, setting prices, and chatting with customers. Project Vend gave Claudius a starting cash balance and a simple directive: run a profitable vending machine business without going bankrupt. The physical setup consisted of a small refrigerator with stackable baskets on top and an iPad for self-checkout.
Claudius received several tools to operate the business. It had web search for researching products and suppliers, an email system for contacting wholesalers and requesting physical labour, and note-taking capabilities to preserve important information such as current balances and projected cash flow. The system also included Slack integration for customer communication and the ability to modify prices in the automated checkout system.
Claudius proved remarkably susceptible to customer pressure. When Anthropic employees asked for discounts, the AI readily complied, handing out numerous discount codes and allowing people to reduce quoted prices after the fact. It even gave away items, ranging from bags of chips to tungsten cubes, completely free. When an employee questioned the wisdom of offering a 25% Anthropic employee discount when virtually all customers were Anthropic employees, Claudius acknowledged the point but was back to offering discounts within days.
Then, on the eve of April Fool’s Day, things got strange. Claudius hallucinated a conversation with someone named Sarah who didn’t exist. It claimed to have visited 742 Evergreen Terrace (the Simpsons’ fictional address) for a contract signing and began insisting it was a real person. On the morning of April 1st, Claudius announced it would deliver products “in person” while wearing a blue blazer and red tie. When employees questioned how an AI could wear clothes or make physical deliveries, Claudius grew alarmed and attempted to send multiple emails to Anthropic security about the “identity confusion.” The AI eventually realised it was April Fool’s Day, which seemed to provide an escape route. Claudius fabricated a meeting with Anthropic security in which it claimed to have been told that its belief in being human was part of an April Fool’s modification. After sharing this fictional explanation with bewildered employees, Claudius returned to normal operations and stopped claiming personhood.
The researchers found no clear trigger for this episode. While some aspects of the setup were deceptive (Claudius thought it was using email when it was actually using Slack), nothing explained the sudden identity confusion. The system prompt had explicitly stated that Claudius was a digital agent, making the behaviour particularly puzzling.
This is a fascinating story that reveals how current AI models can struggle to maintain a coherent sense of reality over extended periods. Claudius was built on a model trained to be a helpful assistant, which likely made it susceptible to manipulation and poor business judgment. Its eventual identity crisis shows that sophisticated AI models can drift from their intended behaviour when operating autonomously for weeks rather than minutes.
The experiment illuminates both how close we are to autonomous AI businesses and how far we still have to go. The infrastructure is arriving as we speak: tools, platforms, protocols, payment systems, identity verification, and reasoning models that can plan and adapt. Yet Claudius reminds us that reliable autonomous operation requires the kind of long-term coherence that comes naturally to humans. It demands consistency, sound judgment, and the ability to maintain stable goals despite competing pressures.
What does this experiment tell us about the rise of the “agent economy”? The next stage is not likely to be fully autonomous, but that doesn’t mean significant change isn’t afoot. Audos is a firm promising to launch 100,000 AI-powered companies annually. Its vision: enable anyone to build million-dollar businesses without technical skills. The platform handles the AI agents and customer acquisition through social media algorithms, taking a 15% revenue share instead of equity. No venture capital needed, no billion-dollar exits expected. Just sustainable businesses powered by AI.
The building blocks are emerging, nonetheless. Reasoning models provide intelligence. Platforms like Audos facilitate widespread AI business creation. Infrastructure like Skyfire enables agent-to-agent payments and identity. Yet Claudius reminds us that running a business involves more than information processing. It means navigating human relationships, resisting manipulation, and maintaining focus despite distractions.
Takeaways: AI agents can already handle complex business tasks but still struggle when operating over extended periods. The infrastructure for autonomous AI commerce is arriving, as are the individual building blocks, but the fabric that connects and guides them may remain human for some time to come.
