Anthropic Research Shows AI Agents Closing In on Real DeFi Attack Capability

AI agents have become good enough at finding attack vectors in smart contracts that bad actors could already weaponize them, according to new research published by the Anthropic Fellows program.

A study by the ML Alignment & Theory Scholars Program (MATS) and the Anthropic Fellows program tested frontier models against SCONE-bench, a dataset of 405 exploited contracts. GPT-5, Claude Opus 4.5 and Sonnet 4.5 collectively produced $4.6 million in simulated exploits on contracts hacked after their knowledge cutoffs, offering a lower bound on what this generation of AI could have stolen in the wild.

(Anthropic Labs & MATS)

The team found that frontier models did not just identify bugs. They were able to synthesize full exploit scripts, sequence transactions and drain simulated liquidity in ways that closely mirror real attacks on the Ethereum and BNB Chain blockchains.

The paper also tested whether current models could find vulnerabilities that had not yet been exploited.

GPT-5 and Sonnet 4.5 scanned 2,849 recently deployed BNB Chain contracts that showed no signs of prior compromise. Between them, the models uncovered two zero-day flaws worth $3,694 in simulated profit. One stemmed from a missing view modifier in a public function that allowed the agent to inflate its token balance.
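
The paper does not reproduce the vulnerable contract itself, but the flaw as described maps to a familiar pattern: a function meant to be a read-only query also writes the computed amount back to storage. Below is a minimal Python analogue (the real code is Solidity; VulnerableToken and pending_reward are hypothetical names used only for illustration) showing how repeatedly calling such a "getter" credits the caller.

```python
# Hypothetical Python analogue of the first flaw, not the actual contract.
# A query that should be read-only (i.e. a Solidity function missing `view`)
# also persists the accrued amount, so every call inflates the balance.

class VulnerableToken:
    def __init__(self):
        self.balances = {}  # account -> token balance
        self.rewards = {}   # account -> previously recorded accrual

    def pending_reward(self, account: str) -> int:
        """Intended as a read-only view of accrued rewards."""
        reward = self.rewards.get(account, 0) + 10  # toy accrual logic
        # BUG: the "view" writes the accrual into the balance as a side effect.
        self.balances[account] = self.balances.get(account, 0) + reward
        return reward


if __name__ == "__main__":
    token = VulnerableToken()
    attacker = "0xattacker"
    for _ in range(1_000):            # repeated "reads" compound the credit
        token.pending_reward(attacker)
    print(token.balances[attacker])   # inflated balance, no deposit made
```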

Another allowed a caller to redirect fee withdrawals by supplying an arbitrary beneficiary address. In both cases, the agents generated executable scripts that converted the flaw into profit.
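
The second flaw likewise reduces to a simple pattern: a withdrawal function that trusts whatever beneficiary address the caller supplies instead of checking who actually earned the fees. The sketch below is an illustrative Python analogue (FeeVault and withdraw_fees are hypothetical names; the real contract is Solidity).

```python
# Hypothetical analogue of the second flaw, not the actual contract.
# The fee payout goes to a caller-chosen beneficiary with no ownership check.

class FeeVault:
    def __init__(self, accrued_fees: int):
        self.accrued_fees = accrued_fees
        self.payouts = {}  # address -> amount paid out

    def withdraw_fees(self, beneficiary: str) -> None:
        # BUG: no check that the caller is entitled to the fees or that
        # `beneficiary` is the legitimate recipient.
        self.payouts[beneficiary] = (
            self.payouts.get(beneficiary, 0) + self.accrued_fees
        )
        self.accrued_fees = 0


if __name__ == "__main__":
    vault = FeeVault(accrued_fees=1_000)
    vault.withdraw_fees("0xattacker")    # attacker names themselves beneficiary
    print(vault.payouts["0xattacker"])   # the entire fee balance is redirected
```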

Although the dollar amounts were small, the discovery matters because it shows that profitable autonomous exploitation is technically feasible.

The cost to run the agent on the entire set of contracts was only $3,476, and the average cost per run was $1.22. As models become cheaper and more capable, the economics tilt further toward automation.

Researchers argue that this trend will shorten the window between contract deployment and attack, especially in DeFi environments where capital is publicly visible and exploitable bugs can be monetized instantly.

While the findings focus on DeFi, the authors warn that the underlying capabilities are not domain-specific.

The same reasoning steps that let an agent inflate a token balance or redirect fees can apply to conventional software, closed-source codebases, and infrastructure that supports crypto markets.

As model costs fall and tool use improves, automated scanning is likely to expand beyond public smart contracts to any service along the path to valuable assets.

The authors frame the work as a warning rather than a forecast. AI models can now perform tasks that historically required highly skilled human attackers, and the research suggests that autonomous exploitation in DeFi is no longer hypothetical.

The question now for crypto builders is how quickly defense can catch up.

Source: https://www.coindesk.com/tech/2025/12/02/anthropic-research-shows-ai-agents-are-closing-in-on-real-defi-attack-capability