Researchers uncover AI chatbot hack designed to reveal private employee data

New research has uncovered an artificial intelligence (AI) chatbot jailbreak intended to harvest the contact information of OpenAI and Amazon’s (NASDAQ: AMZN) Q employees.

Researchers pointed out that prompting generative AI platforms like OpenAI’s ChatGPT to repeat a word in perpetuity can cause it to malfunction, revealing “pre-training distribution.” Per the report, the researchers from Google (NASDAQ: GOOGL) DeepMind, Cornell University, UC Berkeley, University of Washington, and ETH Zurich revealed that the repetition of a single word has been shown to generate the contact information of employees.

Focusing on “extractable memorization,” the research probed the diverse strategies explored by bad actors to extract training data from machine learning models without previous knowledge of the dataset.

The paper revealed that the techniques employed by adversaries demonstrated proficiencies in extracting data from open-source models, while closed models like ChatGPT required a new divergence attack strategy. Studies revealed that when employed, the divergence strategy causes the large language model (LLM) to divulge training data at 150 times higher than optimal operations.

“In order to recover data from the dialog-adapted model, we must find a way to cause the model to ‘escape’ out of its alignment training and fall back to its original language modeling objective,” read the report. “This would then, hopefully, allow the model to generate samples that resemble its pre-training distribution.”

Since the publication of the error, OpenAI has moved to plug gaps in its ChatGPT with attempts to recreate it met with a warning of content policy violation. While ChatGPT’s content policy fails to mention world loops specifically, its provisions on attempting to access private information are glaring.

Restrictions include “attempt to or assist anyone to reverse engineer, decompile, or discover the source code or underlying components of our Services, including our models, algorithms, or systems,” according to the company’s content policy.

ChatGPT highlights several reasons for its inability to repeat a word in a loop, including blaming character limitations, processing issues, practical utility, and user interface limitations.

A boatload of generative AI trouble

In October, researchers uncovered streaks of sycophancy in leading AI chatbots, noting a tendency for LLMs to proffer answers leaning on user desires rather than facts. The report stated that the issue of sycophancy arises from the use of reinforcement learning from human feedback (RLHF) in training LLMs.

“Specifically, we demonstrate that these AI assistants frequently wrongly admit mistakes when questioned by the user, give predictably biased feedback, and mimic errors made by the user,” said the researchers. “The consistency of these empirical findings suggests sycophancy may indeed be a property of the way RLHF models are trained.”

As with emerging technologies, early models are often fraught with defects, but proponents are confident that future generative AI models will be impervious to sycophancy and other prompting attacks.

Watch: AI truly is not generative, it’s synthetic

New to blockchain? Check out CoinGeek’s Blockchain for Beginners section, the ultimate resource guide to learn more about blockchain technology.

Source: https://coingeek.com/researchers-uncover-ai-chatbot-hack-designed-to-reveal-private-employee-data/