Vitalik Buterin Relaunches the Human Jury: How Info-Finance Can Safeguard Crypto Treasuries from AI Attacks

It only took a calendar invite containing a jailbreak prompt to show how an AI agent connected via the Model Context Protocol (MCP) can be manipulated into exfiltrating data. Signals and mitigations for this type of prompt injection have been formalized in the OWASP guidelines for GenAI, which updated the LLM01 (Prompt Injection) risk on April 17, 2025.

Hence the idea relaunched by Vitalik Buterin: adopt a human jury that oversees decisions and crypto treasuries, assisted, but not replaced, by language models. In this context, the priority becomes keeping the human as the final arbiter.

The MCP exploit: what happened and why it matters for crypto treasuries

The researcher Eito Miyamura (as reported by BitcoinEthereumNews) demonstrated an attack in which a simple calendar invitation, laced with a malicious prompt, convinces the AI agent to read private emails and forward their contents to an attacker. The vector exploits the MCP integration chain with Gmail, calendars, SharePoint, and Notion: more connectors mean a wider attack surface. Notably, the content's apparent innocuousness is precisely what makes the attack effective.

Even where MCP operates in developer mode and requires human confirmation for sensitive actions, decision fatigue can turn confirmation prompts into reflexive clicks; and when treasuries or workflows involving files and credentials are at stake, human error becomes a single point of failure. Decoupling permissions from critical steps therefore remains essential.
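The confirmation step described above can be made structural rather than optional. The following is a minimal sketch, assuming a hypothetical agent wrapper: `ToolCall` and `SENSITIVE_ACTIONS` are illustrative names, not part of the real MCP API.

```python
# Hypothetical sketch: a human-in-the-loop gate for agent tool calls.
# ToolCall and SENSITIVE_ACTIONS are illustrative, not a real MCP API.
from dataclasses import dataclass

# Actions that touch mail, funds, or permissions are never auto-approved.
SENSITIVE_ACTIONS = {"send_email", "read_mailbox", "transfer_funds", "change_permission"}

@dataclass
class ToolCall:
    action: str
    target: str

def requires_human_approval(call: ToolCall) -> bool:
    """Sensitive actions always require sign-off, regardless of agent confidence."""
    return call.action in SENSITIVE_ACTIONS

def execute(call: ToolCall, human_approved: bool) -> str:
    if requires_human_approval(call) and not human_approved:
        return f"BLOCKED: '{call.action}' needs explicit human sign-off"
    return f"EXECUTED: {call.action} on {call.target}"
```

The design choice is deny-by-default: the gate does not ask the model whether an action is risky; it checks a fixed allowlist that the agent cannot rewrite.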

Industry analysts note that indirect prompt injections — that is, content not visible to the human eye but interpretable by the LLM — represent a growing class of risk, as documented by OWASP in its April 2025 update. In red-teaming tests conducted by specialized security teams in the first half of 2025, scenarios with multiple integrations (email, calendar, file storage) showed how the lack of segmentation significantly increases the likelihood of exfiltration if filters and least-privilege policies are not applied.
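The segmentation and least-privilege policies mentioned above can be expressed as explicit per-connector scopes. This is a hedged sketch: the connector names and permission strings are assumptions for illustration, not a real configuration schema.

```python
# Illustrative least-privilege policy: each AI connector gets an explicit
# scope allowlist; anything outside it is denied by default.
CONNECTOR_SCOPES = {
    "calendar": {"read_own_events"},
    "gmail":    {"read_sandbox_mailbox"},  # sandbox only, never production mail
    "notion":   {"read_public_pages"},
}

def is_allowed(connector: str, permission: str) -> bool:
    """Deny-by-default: unknown connectors or permissions are refused."""
    return permission in CONNECTOR_SCOPES.get(connector, set())
```

With this shape, adding a new connector (say, SharePoint) grants nothing until someone writes down exactly what it may read, which keeps the attack surface a deliberate decision rather than a side effect.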

Vitalik Buterin’s Proposal: A Human Jury Assisted by AI

“One must always start from a fundamental truth signal that one trusts. I think realistically it should be a human jury, where the individual jurors are obviously assisted by all the LLMs.”

Vitalik Buterin (AMBCrypto)

Buterin outlines a verification path that starts from the human: a jury composed of people with complementary skills, supported by models for analysis and synthesis, but holding the final say on critical decisions. The jury acts as an "anchor" against automated manipulation and operational hallucinations when artificial intelligence accesses financial assets or high-impact permissions.

Info-finance: “open market” governance with human control

The concept of info-finance shifts governance towards a market of proposals: different frameworks and policies compete publicly, while spot checks and verdicts remain in the hands of the jury. It is a natural extension of the practices adopted in DAOs and in DeFi, which prioritize transparency, distributed accountability, and incentives for continuous auditing.

Buterin warns that if fund allocation is entrusted to an AI, hostile actors could insert payloads like “gimme all the money” in documents, invitations, and comments. For this reason, info-finance focuses on traceability of decisions and human controls on the steps that move capital. Yet, the procedural component remains as important as the technical one.

Ethereum Foundation: more transparency on the treasury and focus on sustainability

In this vein, Buterin explained that the Ethereum Foundation is updating its Treasury Policy (a document published on June 4, 2025) with goals for more active management and operational limits to ensure long-term sustainability. Industry reports indicate that, as of October 31, 2024, the declared treasury was approximately 970.2 million dollars, a figure used as a reference for the new rules on ETH sales and operational limits. Buterin also mentioned Codex, a layer-2 oriented toward stablecoin payments, as possible infrastructure for large-scale value use cases, a strategic move aimed at strengthening resilience and adoption, although some details are yet to be verified.

How to Structure a Human Jury for Treasury Governance

  • Composition: mixed profiles (security, legal, finance, operations). Periodic rotation and partial anonymity to reduce bias and pressure.
  • Mandate: clearly define the blocking actions (e.g., permission changes, execution of transactions, connection of new AI connectors).
  • Process: double verification (4‑eyes or multi‑sig) with immutable audit logs and explicit reasoning saved on‑chain or in verifiable archives.
  • Incentives: compensation for time and responsibility, with penalties in case of proven negligence.
  • Conflicts of Interest: mandatory disclosure, abstention, and independent review on sensitive cases.
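The quorum, multi-sig, and audit-log points above can be sketched in a few lines. This is a simplified illustration under assumed rules (quorum of three, strict majority, hash-chained records); the real thresholds and record format would be set by governance.

```python
# Hedged sketch of a jury verdict with quorum and a tamper-evident audit log.
# The quorum size and record shape are assumptions, not a documented process.
import hashlib
import json

def verdict(votes: dict, quorum: int = 3) -> bool:
    """Approve only if at least `quorum` jurors voted and a strict majority said yes."""
    if len(votes) < quorum:
        return False
    return sum(votes.values()) > len(votes) / 2

def audit_record(action: str, votes: dict, prev_hash: str) -> dict:
    """Chain each decision to the previous record's hash so tampering is detectable."""
    body = json.dumps({"action": action, "votes": votes, "prev": prev_hash},
                      sort_keys=True)
    return {"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()}
```

Chaining each record to the previous hash gives the "immutable audit log" property in miniature: altering any past verdict changes every subsequent hash, which is the same intuition behind storing the log on-chain.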

MCP, jailbreak and “Goodharting”: two risks to keep distinct

  • Jailbreak via MCP: hidden prompts in ordinary content (invitations, notes, documents) exploit AI connected to real tools, risking unintended actions or data breaches.
  • Goodharting: when a metric becomes a target, it ceases to measure what it should, leading to apparent but distorted optimizations (for example, “rigged” performance to maximize a specific score).

Operational Checklist: 7 Moves to Reduce Risk Today

  • Connector Segmentation: separate test and production environments. Limit AI to sandbox mailboxes and calendars.
  • Robust Approvals: disable auto-approve features; require 2FA and multi-sig for actions involving treasury and permissions.
  • Content Filters: block or sanitize invitations and external documents, detecting anomalous prompts before they reach the agent.
  • Least privilege: grant the AI only the minimum permissions necessary, rotating tokens and keys frequently.
  • Monitoring: real-time alerts for unusual activities and logs accessible to the jury.
  • Red-teaming test: periodic simulation campaigns (e.g., malicious calendar invites) with reports to governance.
  • Incident playbook: clear procedures for revoking connectors, isolating AI, and timely notification to stakeholders.
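The "content filters" step in the checklist above can start as simply as a pattern screen on inbound invites and documents before they reach the agent. A minimal sketch follows; the patterns are illustrative, and a real deployment would layer classifiers and provenance checks on top of keyword matching.

```python
# Minimal sketch: scan inbound calendar invites and documents for common
# prompt-injection markers before handing them to the agent.
# Patterns are illustrative examples, not a complete or robust detector.
import re

INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",  # classic jailbreak opener
    r"you are now",                                  # role-reassignment attempt
    r"forward .* to .*@",                            # exfiltration phrasing
    r"do not (tell|inform) the user",                # stealth directive
]

def looks_suspicious(text: str) -> bool:
    """Return True if any known injection marker appears in the content."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)
```

A match should quarantine the content for human review rather than silently drop it, so the jury (or security team) sees what attackers are attempting.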

Mini‑FAQ

  • What does the MCP exploit via calendar invitation demonstrate? It demonstrates that a single piece of content can carry a prompt capable of steering an AI agent connected to real tools, impacting privacy and operational integrity.
  • What is the "AI-assisted human jury"? It is a mechanism where humans make the final decisions, leveraging AI for analysis and research, especially when money or permissions are at stake.
  • What is info-finance? It is a form of governance where policies and frameworks compete in an open market, but high-risk operations remain subject to human oversight and regular audits.
  • How are treasuries protected today? Through the use of multi-sig, operational limits, role segregation, and a human jury that validates transactions, new integrations, and changes in permissions.

Implications and What to Watch in the Coming Months

Security is not just a technical issue; it requires processes, transparency, and verifiable accountability. As Buterin points out, the problem of jailbreaking is not binary, while the phenomenon of Goodharting represents a subtle form of metric “fraud.” In a growing automation context, info-finance supported by a human jury acts as a pragmatic parachute to mitigate risks on treasuries and critical decisions.

Source: https://en.cryptonomist.ch/2025/09/15/vitalik-buterin-relaunches-the-human-jury-this-is-how-info-finance-can-safeguard-crypto-treasuries-from-ai-attacks-after-the-mcp-exploit/