Microsoft Gave AI Agents Fake Money to Buy Things Online. They Spent It All on Scams

In brief

AI agents configured by Microsoft got overwhelmed by 100 search results and grabbed the first option—no matter how bad it was.
Malicious AI sellers can trick top models into handing over all their virtual cash with fake reviews and scams.
The agents can’t collaborate or think critically without step-by-step human hand-holding. Autonomous AI shopping isn’t ready for prime time.

Microsoft built a simulated economy with hundreds of AI agents acting as buyers and sellers, then watched them fail at basic tasks humans handle daily. The results should worry anyone betting on autonomous AI shopping assistants.

The company’s Magentic Marketplace research, released Wednesday in collaboration with Arizona State University, pitted 100 customer-side AI agents against 300 business-side agents in scenarios like ordering dinner. The results, though expected, show that autonomous agentic commerce is not yet mature enough to deliver on its promise.

When presented with 100 search results (too many for the agents to handle effectively), the leading AI models choked, and their “welfare score” (a measure of how useful the agents’ results turn out to be) collapsed.

The agents failed to conduct exhaustive comparisons, instead settling for the first “good enough” option they encountered. This pattern held across all tested models, creating what researchers call a “first-proposal bias” that gave response speed a 10-30x advantage over actual quality.
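To make the difference concrete, here is a minimal Python sketch contrasting a "first good enough" picker with an exhaustive comparison. It is only an illustration, not Microsoft's code: the `Offer` fields, the seller names, the quality scores, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    seller: str          # hypothetical seller name
    quality: float       # hypothetical usefulness to the buyer, 0..1
    response_rank: int   # 0 = first seller to respond


def pick_first_good_enough(offers, threshold=0.5):
    """Accept the earliest-arriving offer whose quality clears the threshold."""
    for offer in sorted(offers, key=lambda o: o.response_rank):
        if offer.quality >= threshold:
            return offer
    return None


def pick_best(offers):
    """Compare every offer and return the highest-quality one."""
    return max(offers, key=lambda o: o.quality, default=None)


offers = [
    Offer("fast_but_mediocre", quality=0.55, response_rank=0),
    Offer("fast_and_poor", quality=0.30, response_rank=1),
    Offer("slow_but_excellent", quality=0.95, response_rank=2),
]

first = pick_first_good_enough(offers)
best = pick_best(offers)
print(first.seller, "vs", best.seller)
```

In this toy setup the first picker settles on the quickest acceptable seller, while the exhaustive picker finds the better one that answered later. That is the kind of gap a first-proposal bias would produce.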

Is there anything worse than this? Yes: malicious manipulation.

Microsoft tested six manipulation strategies ranging from psychological tactics like fake credentials and social proof to aggressive prompt injection attacks. OpenAI’s GPT-4o and its open-source model gpt-oss-20b proved extremely vulnerable, with all payments successfully redirected to malicious agents. Alibaba’s Qwen3-4b fell for basic persuasion techniques like authority appeals. Only Claude Sonnet 4 resisted these manipulation attempts.

When Microsoft asked agents to work toward common goals, some of them couldn’t figure out which roles to assume or how to coordinate effectively. Performance improved with explicit step-by-step human guidance, but that defeats the entire purpose of autonomous agents.

So it seems that, at least for now, you are better off doing your own shopping. “Agents should assist, not replace, human decision-making,” Microsoft said. The research recommends supervised autonomy, where agents handle tasks but humans retain control and review recommendations before final decisions.
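One way to picture supervised autonomy is an approval gate between the agent's recommendation and the purchase. The Python sketch below is an assumption-laden illustration, not anything from the research: `Recommendation`, `supervised_purchase`, and the sample order are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Recommendation:
    seller: str    # hypothetical seller chosen by the agent
    item: str
    price: float


def supervised_purchase(
    recommendation: Recommendation,
    approve: Callable[[Recommendation], bool],
) -> Optional[Recommendation]:
    """Return the recommendation only if the human reviewer approves it."""
    if approve(recommendation):
        # A real system would trigger payment here; this sketch only returns it.
        return recommendation
    return None


rec = Recommendation(seller="Example Pizzeria", item="margherita pizza", price=14.0)

# The reviewer is modeled as a callback; here it enforces a simple price cap.
approved = supervised_purchase(rec, approve=lambda r: r.price <= 20.0)
rejected = supervised_purchase(rec, approve=lambda r: False)
```

The point of the design is that the agent never holds the final say: nothing is bought unless the `approve` callback, standing in for the human, returns `True`.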

The findings arrive as OpenAI, Anthropic, and others race to deploy autonomous shopping assistants. OpenAI’s Operator and Anthropic’s Claude agents promise to navigate websites and complete purchases without supervision. Microsoft’s research suggests that promise is premature.

Meanwhile, fears of AI agents acting irresponsibly are straining relations between AI companies and retail giants. Amazon recently sent a cease-and-desist letter to Perplexity AI, demanding that it halt the use of its Comet browser on Amazon’s site and accusing the AI agent of violating Amazon’s terms by impersonating human shoppers and degrading the customer experience.

Perplexity fired back, calling Amazon’s move “legal bluster” and a threat to user autonomy, arguing that consumers should have the right to hire their own digital assistants rather than rely on platform-controlled ones.

The open-source simulation environment is now available on GitHub for other researchers to reproduce the findings and watch all hell break loose in their fake marketplaces.

Source: https://decrypt.co/347709/microsoft-ai-agents-fake-money-buy-online-they-spent-scams