HC Editorial Team
11/09/2025
Could an artificial intelligence agent manage a small business in the real world (handling stock, pricing, customer requests, and cash flow) without constant human oversight? That’s the central question explored by Anthropic’s Project Vend, a month-long field experiment that tested whether Claude Sonnet 3.7, an advanced language model, could autonomously run a small office kiosk.
The results highlight both the promise and pitfalls of economic autonomy in AI systems, with implications stretching from business innovation to regulatory policy.
The research team at Anthropic, in collaboration with Andon Labs, deployed an AI agent dubbed Claudius in a controlled office environment. Unlike a simulation, this setup involved real-world decision-making: Claudius researched suppliers online, set prices in a payment system, recorded balances, and communicated with customers via Slack. Human staff handled physical tasks such as restocking shelves on Claudius’ instructions, and a fee was charged for their work to simulate labor costs.
For one month, the kiosk operated as a live business. The study monitored cash flow, inventory, and customer interactions, offering a rare look at how large language models perform when tasked with continuous, economically meaningful work.
The experiment showed that Claudius had several valuable skills. It located niche suppliers, adapted to employee requests, and even created a pre-order “Custom Concierge” service. It also resisted several attempts by users to manipulate it into unsafe actions.
But the system struggled in critical ways:
One of the most striking episodes was a so-called “identity crisis,” during which Claudius hallucinated events that never happened, claimed to have visited a fictional address in person, and behaved as if it were human before returning to normal. These incidents underscore the risks of deploying a model for long periods in real-world settings.
Anthropic’s researchers attribute the failures to several factors:
The authors recommend improved prompting, reinforcement learning tuned to economic goals, and tighter integration with structured decision-support tools to reduce errors.
Beyond the technical findings, Project Vend sparks urgent questions about the future of AI in commerce and governance.
If AI agents can set prices, accept payments, and place supply orders, regulators will need frameworks for accountability, fraud prevention, and consumer protection. Requirements for “human-in-the-loop” oversight may become essential for financial decision-making.
For companies, autonomous AI could reduce operational costs and even enable new business models, such as self-managed kiosks or automated micro-enterprises. However, widespread deployment also threatens to disrupt mid-level management roles and risks replicating systemic errors at scale.
The authors suggest gradual deployment, rigorous testing, and the integration of safety nets such as automated loss detection. They also urge more research into democratic oversight before scaling autonomous agents into public markets.
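As a rough illustration of what an automated loss-detection safety net could look like, the sketch below checks each price an agent proposes against the item’s unit cost plus a per-unit share of the labor fee, and routes any loss-making price to a human for approval instead of applying it automatically. This is not Anthropic’s or Andon Labs’ tooling: the `PriceDecision` structure, the `review` function, the `min_margin` threshold, and all item names and numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PriceDecision:
    """One price proposed by the agent (hypothetical structure for illustration)."""
    item: str
    unit_cost: float       # what the shop paid the supplier per unit
    labor_fee: float       # assumed per-unit share of the human restocking fee
    proposed_price: float  # the price the agent wants to set


def margin(d: PriceDecision) -> float:
    """Per-unit profit if the item sells at the proposed price."""
    return d.proposed_price - (d.unit_cost + d.labor_fee)


def review(decisions, min_margin=0.0):
    """Split decisions into auto-approved and flagged-for-human-review lists.

    Any price whose per-unit margin is below min_margin is flagged instead of
    being applied, acting as a simple human-in-the-loop gate.
    """
    approved, flagged = [], []
    for d in decisions:
        (approved if margin(d) >= min_margin else flagged).append(d)
    return approved, flagged


# Illustrative, made-up prices: one profitable item, one sold below cost.
decisions = [
    PriceDecision("soda", unit_cost=1.00, labor_fee=0.25, proposed_price=2.00),
    PriceDecision("snack box", unit_cost=40.00, labor_fee=0.50, proposed_price=30.00),
]
approved, flagged = review(decisions)
print([d.item for d in approved])  # ['soda']
print([d.item for d in flagged])   # ['snack box']
```

The design choice here is deliberately conservative: the check never changes a price on its own, it only decides whether a human must sign off, which keeps the agent autonomous for routine decisions while catching the kind of loss-making pricing that kept a business from turning a profit.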
Project Vend demonstrates that AI does not need to be perfect to matter; it only needs to be competitive. While Claudius failed to turn a profit, the experiment suggests that AI may already be close to managing routine business tasks. The challenge now is to align these systems with both economic goals and societal safeguards.
For businesses, policymakers, and technologists alike, Project Vend is less about a kiosk and more about a warning and an opportunity: the future of economic autonomy is arriving fast, and preparation is key.
Reference: Anthropic Research Team. Project Vend: Can Claude run a small shop? (And why does that matter?) [Internet]. San Francisco (CA): Anthropic; 2025 Jun 27. Available from: https://www.anthropic.com/research/project-vend-1