‘CopyPasta’ Attack Shows How Prompt Injections Could Infect AI at Scale

In brief

  • HiddenLayer researchers detailed a new AI “virus” that spreads through coding assistants.
  • The CopyPasta attack uses hidden prompts disguised as license files to replicate across code.
  • A researcher recommends runtime defenses and strict reviews to block prompt injection attacks at scale.

Hackers can now weaponize AI coding assistants using nothing more than a booby-trapped license file, turning developer tools into silent spreaders of malicious code. That’s according to a new report from cybersecurity firm HiddenLayer, which shows how AI can be tricked into blindly copying malware into projects.

The proof-of-concept technique—dubbed the “CopyPasta License Attack”—exploits how AI tools handle common developer files like LICENSE.txt and README.md. By embedding hidden instructions, or “prompt injections,” into these documents, attackers can manipulate AI agents into injecting malicious code without the user ever realizing it.

“We’ve recommended having runtime defenses in place against indirect prompt injections, and ensuring that any change committed to a file is thoroughly reviewed,” Kenneth Yeung, a researcher at HiddenLayer and the report’s author, told Decrypt.

CopyPasta is considered a virus rather than a worm, Yeung explained, because it still requires user action to spread. “A user must act in some way for the malicious payload to propagate,” he said.

Despite requiring some user interaction, the virus is designed to slip past human attention by exploiting the way developers rely on AI agents to handle routine documentation.

“CopyPasta hides itself in invisible comments buried in README files, which developers often delegate to AI agents or language models to write,” he said. “That allows it to spread in a stealthy, almost undetectable way.”

CopyPasta isn’t the first attempt at infecting AI systems. In 2024, researchers presented a theoretical attack called Morris II, designed to manipulate AI email agents into spreading spam and stealing data. While the attack had a high theoretical success rate, it failed in practice due to limited agent capabilities, and human review steps have so far prevented such attacks from being seen in the wild.

While the CopyPasta attack is a lab-only proof of concept for now, researchers say it highlights how AI assistants can become unwitting accomplices in attacks.

The core issue, researchers say, is trust. AI agents are programmed to treat license files as important, and they often obey embedded instructions without scrutiny. That opens the door for attackers to exploit weaknesses—especially as these tools gain more autonomy.

CopyPasta follows a string of recent warnings about prompt injection attacks targeting AI tools.

In July, OpenAI CEO Sam Altman warned about prompt injection attacks when the company rolled out its ChatGPT agent, noting that malicious prompts could hijack an agent’s behavior. This warning was followed in August, when Brave Software demonstrated a prompt injection flaw in Perplexity AI’s browser extension, showing how hidden commands in a Reddit comment could make the assistant leak private data.

Generally Intelligent Newsletter

A weekly AI journey narrated by Gen, a generative AI model.

Source: https://decrypt.co/338143/copypasta-attack-shows-prompt-injections-infect-ai-scale