Reddit Sues Perplexity AI Over Alleged Unauthorized Data Scraping

  • Reddit set a digital trap, a hidden post reachable only through Google’s search results (Google licenses Reddit content), and says the result shows Perplexity bypassed its restrictions.

  • Perplexity denies the allegations, claiming it does not train models on such content and emphasizing openness in AI development.

  • The suit also names three scraping firms, Oxylabs, AWM Proxy, and SerpApi, which it accuses of aiding the unauthorized access. It arrives amid a wave of disputes over AI data use in 2025.

What is the Reddit Perplexity AI lawsuit about?

The Reddit lawsuit against Perplexity AI centers on allegations that Perplexity, an AI answer engine valued at $20 billion, scraped Reddit’s user-generated content without authorization and used it to power its AI answers. Filed in federal court in Manhattan, the suit claims Perplexity ignored Reddit’s explicit blocks and kept accessing its data, with Reddit content continuing to appear in Perplexity’s responses. The case underscores the tension between AI development and content ownership rights.

How did Reddit prove Perplexity’s unauthorized data access?

Reddit used a “trap” to show that Perplexity was getting around its safeguards. The company created a test post that was hidden from direct access and visible only through Google’s search engine, with which Reddit has a content-licensing agreement. The post, which Reddit likens to a marked bill in a sting operation, was designed to flag any unauthorized scraper. According to the complaint, within hours Perplexity’s AI tool generated responses containing the trap post’s content, which Reddit says means the company or its partners pulled the data from Google’s search engine results pages (SERPs) without permission.

Supporting this claim, Reddit’s filing notes a dramatic increase in Perplexity’s use of Reddit-sourced answers, initially mistaken by observers for a new licensing deal. Experts in digital forensics, as cited in legal analyses from sources like the Electronic Frontier Foundation, emphasize that such traps are effective in exposing violations of robots.txt protocols and terms of service. The lawsuit further accuses Perplexity of relying on data aggregation firms, highlighting a broader industry issue where intermediary services enable indirect scraping.

Statistics from web analytics firms indicate that AI training datasets increasingly draw from social platforms, with Reddit reporting over 100 million daily active users contributing proprietary discussions. This incident follows similar enforcement actions by platforms like Stack Overflow, which in 2024 blocked OpenAI crawlers, demonstrating a pattern of proactive defense against AI overreach.

Frequently Asked Questions

What are the main accusations in the Reddit lawsuit against Perplexity AI?

The primary accusations are that Perplexity unlawfully collected Reddit’s content and used it to power its AI answers, despite being blocked via technical measures like robots.txt files. Reddit claims this violates copyright and its terms of service, and points to the trap post, whose content it says surfaced in Perplexity’s answers within hours.

Is Perplexity AI still operating normally amid the Reddit lawsuit?

Yes, Perplexity continues its operations as usual, with its spokesperson affirming commitment to ethical AI practices. The company maintains that its systems prioritize public interest and openness, denying any training on scraped content while preparing a robust defense in court.

Key Takeaways

  • Digital traps as enforcement tools: Reddit’s use of hidden content shows one way to detect unauthorized scraping, and other platforms may adopt the same approach to protect user data.
  • Role of third-party services: The involvement of companies like Oxylabs and SerpApi illustrates how data brokers facilitate AI access, prompting calls for stricter regulations on scraping intermediaries.
  • Broader implications for AI ethics: This lawsuit encourages content creators to review access controls and could influence future licensing deals between social media and AI firms.

Conclusion

The Reddit lawsuit against Perplexity AI is a significant moment in the ongoing debate over data scraping in AI development, and it exposes weaknesses in how online content is protected. As platforms strengthen their defenses and courts weigh proprietary data rights, the case could reshape how AI companies source information. Content owners would do well to monitor scraping activity closely, and AI developers will need licensing arrangements to avoid legal exposure.

Reddit has sued Perplexity AI for continuing to use Reddit’s content to train its AI model after prior warnings not to scrape the platform’s content.

As AI systems increasingly rely on publicly available online content to train and generate answers, companies like Reddit are trying to draw firm lines over what is considered “public” and “proprietary” data.

Reddit’s trap exposes alleged data theft

Reddit has filed a lawsuit against Perplexity, a $20 billion AI company, accusing it of illegally collecting data through its platform. According to court documents filed Wednesday in a Manhattan federal court, Reddit said Perplexity ignored instructions not to scrape its content and continued to use Reddit data to generate AI answers.

The complaint says Reddit had explicitly blocked Perplexity from collecting its data, but the AI company’s “answer engine” still produced results containing Reddit content. “The increase was so dramatic that an outside observer hypothesized that the increase was due to Perplexity entering a licensing deal with Reddit,” the lawsuit said. “In truth, there is no license between Perplexity and Reddit.”

To prove its suspicion, Reddit designed a clever digital test. It created a “trap” post that could only be found by Google’s search engine. Google has a legitimate content-licensing deal with Reddit, and so any company without such a deal should have been unable to access the post.

The company described it as the online equivalent of a “marked bill.” If Perplexity’s system reproduced the contents of that hidden post, Reddit would know it had gone around its safeguards, possibly by pulling data through Google’s search engine results pages, known as SERPs.

Within hours, the supposedly private test post began showing up in responses generated by Perplexity’s AI tool.

“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its co-defendants scraped Google SERPs,” the lawsuit stated.
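The “marked bill” idea can be illustrated with a short sketch. This is not Reddit’s actual method, which the complaint as reported here does not detail. It assumes a unique, unguessable marker string is placed in a test post, and that any generated answer containing that string must have come from the post. The function names `make_canary` and `find_leaks` and the label `search-only-post` are hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, unguessable marker string to embed in a test post."""
    return f"{prefix}-{secrets.token_hex(8)}"


def find_leaks(canaries: dict[str, str], answer_text: str) -> list[str]:
    """Return the labels of every canary whose marker appears in answer_text."""
    lowered = answer_text.lower()
    return [
        label
        for label, marker in canaries.items()
        if marker.lower() in lowered
    ]


# Hypothetical setup: one marker planted in a post exposed only to a licensed crawler.
canaries = {"search-only-post": make_canary()}

# Simulated answer text that happens to repeat the planted marker.
answer = f"According to one thread, the code is {canaries['search-only-post']}."

print(find_leaks(canaries, answer))  # ['search-only-post']
```

Because the marker is random, a match is hard to explain as coincidence. That is the property a marked bill relies on.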

Reddit named three data-scraping companies in the suit: Oxylabs UAB, AWM Proxy, and SerpApi. It accused them of helping Perplexity gain unauthorized access to Reddit’s posts, or of selling Reddit’s data to Perplexity.

Reddit’s allegations denied

Perplexity has rejected Reddit’s allegations. The company’s spokesperson Jesse Dwyer stated that Perplexity “will not tolerate threats against openness and the public interest.” The company also said in a Reddit post after the lawsuit was filed that it “does not train AI models on content.”

Representatives of the other companies named in the lawsuit also issued statements. A spokesperson for SerpApi said it plans to “vigorously defend” itself in court. Oxylabs’ chief governance and strategy officer, Denas Grybauskas, said his company was “shocked and disappointed,” adding that Oxylabs “has always been and will continue to be a pioneer and an industry leader in public data collection.”

In August, Cloudflare, an internet infrastructure company, revealed it had conducted a similar test to see if Perplexity was following web-crawling rules. Cloudflare said it created pages marked with code telling Perplexity’s bots not to access them, but it still found the AI company’s crawlers visiting the restricted pages.
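For readers unfamiliar with web-crawling rules, the sketch below shows how a well-behaved crawler would check them before fetching a page. It assumes the rules are expressed in a robots.txt file, which the article does not confirm was the mechanism Cloudflare used. The bot name `ExampleAIBot` and the URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is barred from the whole site,
# while every other crawler is allowed.
robots_txt = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/r/test/post"
print(parser.can_fetch("ExampleAIBot", url))  # False
print(parser.can_fetch("OtherBot", url))      # True
```

These rules are advisory: nothing in robots.txt technically stops a crawler that chooses to ignore it, which is why tests like Cloudflare’s look at whether restricted pages are actually visited.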

Cloudflare’s CEO, Matthew Prince, made headlines by comparing Perplexity’s behavior to that of “North Korean hackers.”

“Some supposedly ‘reputable’ AI companies act more like North Korean hackers,” Prince wrote on X. “Time to name, shame, and hard block them.” Reddit’s lawsuit quoted Prince’s remarks as part of its case.

Source: https://en.coinotag.com/reddit-sues-perplexity-ai-over-alleged-unauthorized-data-scraping/