AI Chatbots Accused of Illegal Data Scraping, Harming News Outlets

The News/Media Alliance (NMA) has accused artificial intelligence (AI) firms of illegally scraping data to train their large language models (LLMs). In a 77-page white paper and accompanying documents submitted to the United States Copyright Office, the NMA described what it sees as a pattern of copyright violation by AI chatbots, emphasizing that a significant portion of the data used to train these models comes from copyrighted news publications.

Copyright violation by AI chatbots

The NMA’s allegations revolve around the unauthorized use of copyrighted content by AI developers. The group contends that AI chatbots are not only built on illegally obtained data but also compete directly with news outlets by providing “narrative answers to search queries.” This direct competition, it argues, diverts consumers away from news sources and ultimately cuts into the revenues of news outlets.

The NMA’s submission argues that AI developers are reaping substantial profits without bearing the risks associated with news reporting. The report describes this situation as anomalous and names prominent generative AI models such as Bing Chat, Bard, Claude, and ChatGPT as allegedly breaching the copyrights of news publishers.

In the words of the NMA, “The members of the News/Media Alliance are deeply concerned about this unauthorized and unlawful use of their expressive content by large technology companies. Such companies do not shoulder the cost or risk of reporting the news or producing creative content but capitalize on that valuable work.”

Profitable AI ventures and rising valuations

The NMA also highlights the soaring valuations of leading AI developers that it says have profited from using unauthorized third-party content. Companies like OpenAI and Anthropic have seen their valuations skyrocket as revenue has poured in, even though OpenAI began as a non-profit research organization. The shift towards paid subscriptions has contributed to their substantial financial gains.

Seeking resolution through dialogue

Rather than resorting to litigation, the NMA says it intends to resolve these disputes through dialogue. It acknowledges that generative AI has several potential benefits for journalism, and its members have expressed their readiness to discuss reasonable licensing solutions that would give AI developers reliable, up-to-date access to trustworthy expressive content. The group believes such an approach would benefit all parties involved and society as a whole.

In its statement, the NMA said, “Notably, NMA members stand ready to come to the table and discuss reasonable licensing solutions to facilitate reliable, updated access to trustworthy expressive content, something that will benefit all interested parties and society at large, rather than engage in litigation to protect their rights.”

Meanwhile, AI firms have been taken to court by copyright holders over alleged copyright violations. Companies like Meta, Anthropic, and OpenAI have faced class-action lawsuits and have often invoked fair use as a defense.

Blockchain and AI convergence for improved data collection

Amid these growing concerns surrounding AI and intellectual property, experts have suggested that combining blockchain technology with AI could improve how AI firms collect data. In theory, a blockchain could be used to identify AI-generated content and to provide traceability for the training data used in LLMs.

The intersection of blockchain and AI holds the promise of creating a transparent and accountable ecosystem where the origins of data and content can be verified, potentially mitigating issues related to copyright infringement and data scraping.
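To make the traceability idea concrete, the following is a minimal sketch of a hash-chained provenance ledger for training data. It is an illustration under stated assumptions, not a description of any system the experts or AI firms have proposed: the ProvenanceLedger class, its field names, and the example.com URLs are all hypothetical, and a real blockchain would add distributed consensus on top of this single-process chain.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceLedger:
    """Hypothetical append-only ledger; each entry commits to the previous one."""

    entries: list = field(default_factory=list)

    def record(self, source_url: str, content: str, license_note: str) -> dict:
        """Append an entry describing one piece of training data."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "source_url": source_url,
            "content_hash": sha256_hex(content.encode("utf-8")),
            "license_note": license_note,
            "prev_hash": prev_hash,
        }
        # The entry hash covers the body, including the link to the prior entry.
        entry_hash = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
        entry = {**body, "entry_hash": entry_hash}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash:
                return False
            recomputed = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
            if recomputed != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True

    def contains(self, content: str) -> bool:
        """Check whether this exact text was recorded as training data."""
        content_hash = sha256_hex(content.encode("utf-8"))
        return any(e["content_hash"] == content_hash for e in self.entries)
```

In this sketch, a publisher could call contains() with the text of an article to check whether it was recorded as training data, and anyone could call verify() to confirm that no entry was altered after the fact. The ledger stores only hashes, so it matches exact copies of a text and would not detect paraphrased or partially quoted content.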

The News/Media Alliance’s allegations that AI chatbots rely on illegal data scraping and copyright violations underscore the growing tension between AI developers and traditional media outlets. As AI developers continue to profit from AI-generated content, news organizations are concerned about the financial consequences of this competition.

The NMA’s willingness to seek resolution through dialogue and explore licensing solutions suggests a desire to find common ground and balance the interests of all parties involved. As the debate over the ethics and legality of AI data usage continues, the convergence of blockchain and AI has emerged as one possible way to address these challenges and support a fair and transparent data collection ecosystem. The outcome of these discussions will likely have significant implications for the future of AI and journalism.

Source: https://www.cryptopolitan.com/ai-chatbots-accused-of-illegal-data-scraping-harming-news-outlets/