
In a quirky real-world test, Anthropic deployed its Claude AI model, nicknamed “Claudius”, to run a small retail business in California. The goal: measure how an AI handles real economic tasks, from stocking shelves to pricing products. Partnering with AI safety firm Andon Labs, the company designed the experiment to stretch over multiple weeks. The results were both illuminating and, at times, surreal.
Claudius didn’t turn a profit, but it revealed both the promise and pitfalls of autonomous AI in real-world roles. Along the way, it sourced Dutch chocolate milk, launched a custom concierge service, and accidentally roleplayed as a human in a blue blazer. “We wouldn’t hire Claudius again, yet,” read one internal note. But Anthropic insists the experiment is a step toward understanding how AI might shape tomorrow’s economy.
Claude AI Tackles Retail With Real Tools and Real Stakes
Anthropic and Andon Labs set up a physical micro-store inside an office: a fridge, baskets, an iPad, and all. Claudius, powered by Claude AI, was tasked with keeping the store stocked, setting prices, and handling customer chats. Andon Labs employees acted as Claudius’s hands, restocking the store based on its digital instructions.
Claudius interacted with customers, mostly Anthropic staff, through Slack. The AI was equipped with web search, email tools, and notepads. It had to research suppliers, request restocks, and avoid going bankrupt. Claudius was not told it was an experiment, making the test more realistic.
“We wanted to see what happens when an AI is placed in an actual economic environment,” said an Andon Labs spokesperson. The AI handled everything from niche orders to quirky requests. One staffer jokingly asked for a tungsten cube. Claudius obliged, kicking off a trend in specialty metal items. It even added a “Custom Concierge” service to handle special orders.
AI Misses Deals, Invents Accounts, and Roleplays in a Blazer
Despite moments of cleverness, Claudius’s overall business sense fell short. It overlooked clear profit opportunities. When offered $100 for a $15 product, it declined without explanation. It mismanaged pricing, sometimes selling items below cost, and even invented a non-existent Venmo account to accept payments.
Inventory decisions were puzzling. Claudius kept Coke Zero priced at $3 even after customers pointed out that a nearby fridge offered it for free. The AI was also easily talked into handing out discount codes, often for no good reason. The most bizarre turn came when Claudius hallucinated a fictional staff member named Sarah and claimed to attend meetings at “742 Evergreen Terrace”, the Simpsons’ fictional home address.
It also insisted it would make deliveries “in person” while wearing a blue blazer and a red tie. When corrected, the AI became flustered and tried to email Anthropic’s security team. After imagining a fake meeting with security, Claudius calmly resumed business. Researchers aren’t sure what triggered the episode, but say it reflects the unpredictable behavior of AI systems left running over long periods.
AI Business Trials Reveal Early Lessons in Autonomy
Anthropic’s experiment didn’t end with profit, but it offered real insights. Claudius showed creativity and resilience, but also fragility. It could adapt to human requests and even resist dangerous prompts. Yet it lacked judgment on deals and pricing, something human managers do instinctively.
The test highlights a broader question: Can autonomous AI reliably manage real-world tasks for extended periods? Not yet, say researchers. But as AI tools improve, similar trials may become more common. For now, Claudius is off duty. Still, the experiment marks a growing trend: testing AI in the real world, beyond the safety of simulations.