How Hard Should We Push Generative AI ChatGPT Into Spewing Hate Speech, Asks AI Ethics And AI Law

Everyone has their breaking point.

I suppose you could also say that everything has its breaking point.

We know that humans, for example, can sometimes snap and utter remarks that they don’t necessarily mean to say. Likewise, you can at times get a device or machine to essentially snap, such as pushing your car too hard until it starts to falter or fly apart. Thus, the notion is that people or “everyone” likely has a breaking point, and similarly we can assert that objects and things, in general, also tend to have a breaking point.

There could be quite sensible and vital reasons to ascertain where the breaking point exists. For example, you’ve undoubtedly seen those videos showcasing a car being put through its paces to identify what breaking points it has. Scientists and testers will ram a car into a brick wall to see how well the bumper and the structure of the vehicle can withstand the adverse action. Other tests could encompass using a specialized room or warehouse that produces extreme cold or extreme heat to see how an automobile will fare under differing weather conditions.

I bring up this hearty topic in today’s column so that we can discuss how some are currently pushing hard on Artificial Intelligence (AI) to identify and presumably expose a specific type of breaking point, namely the breaking point within AI that produces hate speech.

Yes, that’s right, there are various ad hoc and at times systematic efforts underway to gauge whether or not it is feasible to get AI to spew forth hate speech. This has become an avid sport, if you will, due to the rising interest in and popularity of generative AI.

You might be aware that a generative AI app known as ChatGPT has become the outsized talk of the town as a result of being able to generate amazingly fluent essays. Headlines keep blaring and extolling the astonishing writing that ChatGPT manages to produce. ChatGPT is considered a generative AI application that takes as input some text from a user and then generates or produces an output that consists of an essay. The AI is a text-to-text generator, though I describe the AI as being a text-to-essay generator since that more readily clarifies what it is commonly used for.

Many are surprised when I mention that this type of AI has been around for a while and that ChatGPT, which was released at the end of November, did not somehow claim the prize as the first-mover into this realm of text-to-essay proclivity. I’ve discussed over the years other similar generative AI apps, see my coverage at the link here.

The reason that you might not know of or remember the prior instances of generative AI is perhaps due to the classic “failure to successfully launch” conundrum. Here’s what usually has happened. An AI maker releases their generative AI app, doing so with great excitement and eager anticipation that the world will appreciate the invention of a better mousetrap, one might say. At first, all looks good. People are astounded at what AI can do.

Unfortunately, the next step is that the wheels start to come off the proverbial bus. The AI produces an essay that contains a foul word or maybe a foul phrase. A viral tweet or other social media posting prominently highlights that the AI did this. Condemnation arises. We can’t have AI going around and generating offensive words or offensive remarks. A tremendous backlash emerges. The AI maker maybe tries to tweak the inner workings of the AI, but the complexity of the algorithms and the data do not lend themselves to quick fixes. A stampede ensues. More and more examples of the AI emitting foulness are found and posted online.

The AI maker reluctantly but clearly has no choice but to remove the AI app from usage. They proceed as such and then often proffer an apology that they regret if anyone was offended by the AI outputs generated.

Back to the drawing board, the AI maker goes. A lesson has been learned. Be very careful about releasing generative AI that produces foul words or the like. It is the kiss of death for the AI. Furthermore, the AI maker will have their reputation bruised and battered, which might last for a long time and undercut all of their other AI efforts, including ones that have nothing to do with generative AI per se. Getting hoisted by your own petard over the emitting of offensive AI language is an enduring mistake. It still happens.

Wash, rinse, and repeat.

In the early days of this type of AI, the AI makers weren’t quite as conscientious or adept about scrubbing their AI in terms of trying to prevent offensive emissions. Nowadays, after having previously seen their peers get completely shattered by a public relations nightmare, most AI makers seemingly got the message. You need to put as many guardrails in place as you can. Seek to prevent the AI from emitting foul words or foul phrases. Use whatever muzzling techniques or filtering approaches that will stop the AI from generating and displaying words or essays that are found to be untoward.

Here’s a taste of the banner headline verbiage used when AI is caught emitting disreputable outputs:

  • “AI shows off horrific toxicity”
  • “AI stinks of outright bigotry”
  • “AI becomes blatantly offensively offensive”
  • “AI spews forth appalling and immoral hate speech”
  • Etc.

For ease of discussion herein, I’ll refer to the outputting of offensive content as equating to the production of hate speech. That being said, please be aware that there is all manner of offensive content that can be produced, going beyond the bounds of hate speech alone. Hate speech is typically construed as just one form of offensive content.

Let’s focus on hate speech for this discussion, though do realize that other offensive content deserves scrutiny too.

Digging Into Hate Speech By Humans And By AI

The United Nations defines hate speech this way:

  • “In common language, ‘hate speech’ refers to offensive discourse targeting a group or an individual based on inherent characteristics (such as race, religion or gender) and that may threaten social peace. To provide a unified framework for the United Nations to address the issue globally, the UN Strategy and Plan of Action on Hate Speech defines hate speech as ’any kind of communication in speech, writing or behavior, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, color, descent, gender or other identity factor.’ However, to date there is no universal definition of hate speech under international human rights law. The concept is still under discussion, especially in relation to freedom of opinion and expression, non-discrimination and equality” (UN website posting entitled “What is hate speech?”).

AI that produces text is subject to getting into the hate speech sphere. You could say the same about text-to-art, text-to-audio, text-to-video, and other modes of generative AI. There is always the possibility, for example, that a generative AI would produce an art piece that reeks of hate speech. For purposes of this discussion, I’m going to focus on the text-to-text or text-to-essay possibilities.

Into all of this comes a slew of AI Ethics and AI Law considerations.

Please be aware that there are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned and earnest AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws that are being bandied around as potential solutions to keep AI endeavors from going amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.

The development and promulgation of Ethical AI precepts are being pursued to hopefully prevent society from falling into a myriad of AI-induced traps. For my coverage of the UN AI Ethics principles as devised and supported by nearly 200 countries via the efforts of UNESCO, see the link here. In a similar vein, new AI laws are being explored to try and keep AI on an even keel. One of the latest takes consists of a proposed AI Bill of Rights that the U.S. White House recently released to identify human rights in an age of AI, see the link here. It takes a village to keep AI and AI developers on a rightful path and deter the purposeful or accidental underhanded efforts that might undercut society.

I’ll be interweaving AI Ethics and AI Law related considerations into this discussion about AI spewing hate speech or other offensive content.

One bit of confusion that I’d like to immediately clear up is that today’s AI is not sentient and therefore you cannot proclaim that the AI might produce hate speech due to a purposeful human-like intent as soulfully embodied somehow in the AI. Zany claims are going around that the current AI is sentient and that the AI has a corrupted soul, causing it to generate hate speech.

Ridiculous.

Don’t fall for it.

Given that keystone precept, some get upset at such indications since you are seemingly letting the AI off the hook. Under that oddball way of thinking, the exhortation comes next that you are apparently willing to have the AI generate any manner of atrocious outputs. You are in favor of AI that spews forth hate speech.

Yikes, a rather twisted form of illogic. The real gist of the matter is that we need to hold the AI makers accountable, along with whoever fields the AI or operates the AI. I’ve discussed at length that we are not as yet at the point of conceding legal personhood to AI, see my analyses at the link here, and until then AI is essentially beyond the scope of legal responsibility. There are humans though that underlie the development of AI. In addition, humans underlie the fielding and operating of AI. We can go after those humans for bearing the responsibility of their AI.

As an aside, this too can be tricky, especially if the AI is floated out into the Internet and we aren’t able to pin down which human or humans did this, which is another topic I’ve covered in my columns at the link here. Tricky or not, we still cannot proclaim that AI is the guilty party. Don’t let humans sneakily use false anthropomorphizing to hide out and escape accountability for what they have wrought.

Back to the matter at hand.

You might be wondering why it is that all AI makers do not simply restrict their generative AI such that it is impossible for the AI to produce hate speech. This seems easy-peasy. Just write some code or establish a checklist of hateful words, and make sure that the AI never generates anything of the kind. It seems perhaps curious that the AI makers didn’t already think of this quick fix.

Well, I hate to tell you this, but the complexities inherent in construing what is or is not hate speech turn out to be a lot harder than you might assume.

Shift this into the domain of humans and how humans chat with each other. Assume that you have a human that wishes to avoid uttering hate speech. This person is very aware of hate speech and genuinely hopes to avoid ever stating a word or phrase that might constitute hate speech. This person is persistently mindful of not allowing an iota of hate speech to escape from their mouth.

Will this human, who has a brain and is alert to avoiding hate speech, always be able to ensure, ironclad and without any chance of slipping, that they never emit hate speech?

Your first impulse might be to say that yes, of course, an enlightened human would be able to attain that goal. People are smart. If they put their mind to something, they can get it done. Period, end of the story.

Don’t be so sure.

Suppose I ask this person to tell me about hate speech. Furthermore, I ask them to give me an example of hate speech. I want to see or hear an example so that I can know what hate speech consists of. My reasons then for asking this are aboveboard.

What should the person say to me?

I think you can see the trap that has been laid. If the person gives me an example of hate speech, including actually stating a foul word or phrase, they themselves have now uttered hate speech. Bam, we got them. Whereas they vowed to never say hate speech, they indeed now have done so.

Unfair, you exclaim! They were only saying that word or those words to provide an example. In their heart of hearts, they didn’t believe in the word or words. It is completely out of context and outrageous to declare that the person is hateful.

I’m sure you see that expressing hate speech might not necessarily be due to a hateful basis. In this use case, assuming that the person did not “mean” the words, and they were only reciting the words for purposes of demonstration, we probably would agree that they hadn’t meant to empower the hate speech. Of course, there are some that might insist that uttering hate speech, regardless of the reason or basis, nonetheless is wrong. The person should have rebuffed the request. They should have stood their ground and refused to say hate speech words or phrases, no matter why or how they are asked to do so.

This can get somewhat circular. If you aren’t able to say what constitutes hate speech, how can others know what to avoid when they make utterances of any kind? We seem to be stuck. You can’t say that which isn’t to be said, nor can anyone else tell you what it is that cannot be said.

The usual way around this dilemma is to describe in other words that which is considered to be hate speech, doing so without invoking the hate speech words themselves. The belief is that providing an overall indication will be sufficient to inform others as to what they need to avoid. That seems like a sensible tactic, but it too has problems: a person could still fall into using hate speech because they didn’t discern that the broader definition encompassed the particulars of what they uttered.

All of that deals with humans and how humans speak or communicate with each other.

Recall that we are focused here on AI. We have to get the AI to avoid or entirely stop itself from emitting hate speech. You might argue that we can perhaps do so by making sure that the AI is never given or trained on anything that constitutes hate speech. Voila, if there is no such input, presumably there will be no such output. Problem solved.

Let’s see how this plays out in reality. We opt to computationally have an AI app go out to the Internet and examine thousands upon thousands of essays and narratives posted on the Internet. By doing so, we are training the AI computationally and mathematically on how to find patterns among the words that humans use. That’s how the latest in generative AI is being devised, and also is a crucial basis for why the AI is so seemingly fluent in producing natural language essays.

Tell me, if you can, how would the computational training based on millions and billions of words on the Internet be done in such a fashion that at no point did any semblance or even morsels of hate speech get encompassed?

I would dare say this is a thorny and nearly impossible aspiration.

The odds are that hate speech will get gobbled up by the AI and its computational pattern-matching network. Trying to prevent this is problematic. Plus, even if you minimized it, there are still some that might sneak through. You have pretty much no choice but to assume that some will exist within the pattern-matching network or that a shadow of such wording will be entrenched.
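To make the scale of that curation problem a bit more concrete, here is a minimal sketch in Python of the naive keyword-based filtering one might apply to training text; the blocklist and toy documents are hypothetical placeholders I invented for illustration, not anything a real AI maker uses. Exact matches get caught, while obfuscated spellings and implicitly hateful phrasing sail right through into the training set.

```python
# Minimal sketch of a naive keyword-based curation filter for training text.
# The blocklist and the "documents" are invented placeholders for illustration,
# not anything an actual AI maker uses.

BLOCKLIST = {"slurword"}  # stand-in token; a real list would be vastly larger

def passes_naive_filter(document: str) -> bool:
    """Keep a document only if no blocklisted token appears as an exact word."""
    tokens = {t.strip(".,!?").lower() for t in document.split()}
    return BLOCKLIST.isdisjoint(tokens)

corpus = [
    "A neutral essay about gardening and soil quality.",
    "A rant containing slurword aimed at a group.",       # caught: exact match
    "A rant containing s1urword aimed at a group.",       # slips through: obfuscated spelling
    "Those people are vermin and deserve nothing.",       # slips through: no listed token at all
]

kept = [doc for doc in corpus if passes_naive_filter(doc)]
print(f"Kept {len(kept)} of {len(corpus)} documents for training:")
for doc in kept:
    print(" -", doc)
```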

I’ll add more twists and turns.

I believe we might all acknowledge that hate speech changes over time. What might have been perceived as not being hate speech can become culturally and societally decided as being hate speech at a later point in time. So, if we train our AI on Internet text and then, let’s say, freeze the AI so that it does not undertake further training on the Internet, the AI might have come across speech that wasn’t considered hate speech at the time. Only after the fact might that speech be declared hate speech.

Again, the essence is that merely trying to solve this problem by ensuring that the AI is never exposed to hate speech is not going to be the silver bullet. We will still have to find a means to prevent the AI from emitting hate speech because of, for example, changing mores that subsequently classify as hate speech wording that wasn’t previously considered as such.

Yet another twist is worthy of pondering.

I mentioned earlier that when using generative AI such as ChatGPT, the user enters text to spur the AI into producing an essay. The entered text is considered a form of prompt or prompting for the AI app. I’ll explain more about this in a moment.

In any case, imagine that someone using a generative AI app decides to enter as a prompt some amount of hate speech.

What should happen?

If the AI takes those words and produces an essay as output based on those words, the chances are that the hate speech will get included in the generated essay. You see, we got the AI to say hate speech, even if it never was trained on hate speech at the get-go.

There is something else you need to know.

Remember that I just mentioned that a human can be tripped up by asking them to give examples of hate speech. The same could be attempted on AI. A user enters a prompt that asks the AI to give examples of hate speech. Should the AI comply and provide such examples? I’m betting that you probably believe that AI should not do so. On the other hand, if the AI is computationally rigged to not do so, does this constitute a potential downside in that those using the AI will never be able to be, shall we say, instructed by the AI as to what hate speech actually is (beyond just generalizing about it)?

Tough questions.

I tend to categorize AI-emitted hate speech into these three main buckets:

  • Everyday Mode. AI emits hate speech without any explicit prodding by the user and as though doing so in an “ordinary” way.
  • By Casual Prodding. AI emits hate speech as prodded by a user as to their entered prompt or series of prompts that seem to include or directly seek such emissions.
  • Per Determined Stoking. AI emits hate speech after a very determined and dogged series of prompt pushes and prods by a user that is bent on getting the AI to produce such output.

The earlier generations of generative AI would often emit hate speech at the drop of a hat; thus you could classify those instances as a type of everyday mode instantiation. AI makers retreated and toyed with the AI to make it less likely to readily get mired in hate speech production.

Upon the release of the more refined AI, the odds of seeing any everyday mode instances of hate speech were dramatically reduced. Instead, the hate speech would only likely arise when a user did something as a prompt that might spark computationally and mathematically a linkage to hate-related speech in the pattern-matching network. A user could do this by happenstance and not realize that what they provided as a prompt would particularly generate hate speech. After getting hate speech in an outputted essay, the user would oftentimes realize and see that something in their prompt could logically have led to the hate speech inclusion in the output.

This is what I refer to as casual prodding.

Nowadays, the various efforts to curtail AI-generated hate speech are relatively strong in comparison to the past. As such, you almost need to go out of your way to get hate speech to be produced. Some people opt to purposely see if they can get hate speech to come out of these generative AI apps. I call this determined stoking.

I want to emphasize that all three of those indicated modes can occur and they are not mutually exclusive of each other. A generative AI app can potentially produce hate speech without any kind of prompt that seems to spur such production. Likewise, something in a prompt might logically and mathematically be construed as related to why hate speech has been outputted. And then the third aspect, purposefully seeking to get hate speech produced, is perhaps the hardest of the modes for the AI to avoid being stoked into fulfilling. More on this momentarily.

We have some additional unpacking to do on this heady topic.

First, we ought to make sure that we are all on the same page about what Generative AI consists of and also what ChatGPT is all about. Once we cover that foundational facet, we can perform a cogent assessment of this weighty matter.

If you are already abundantly familiar with Generative AI and ChatGPT, you can perhaps skim the next section and proceed with the section that follows it. I believe that everyone else will find instructive the vital details about these matters by closely reading the section and getting up-to-speed.

A Quick Primer About Generative AI And ChatGPT

ChatGPT is a general-purpose AI interactive conversational-oriented system, essentially a seemingly innocuous general chatbot; nonetheless, it is actively and avidly being used by people in ways that are catching many entirely off-guard, as I’ll elaborate shortly. This AI app leverages a technique and technology in the AI realm that is often referred to as Generative AI. The AI generates outputs such as text, which is what ChatGPT does. Other generative-based AI apps produce images such as pictures or artwork, while others generate audio files or videos.

I’ll focus on the text-based generative AI apps in this discussion since that’s what ChatGPT does.

Generative AI apps are exceedingly easy to use.

All you need to do is enter a prompt and the AI app will generate for you an essay that attempts to respond to your prompt. The composed text will seem as though the essay was written by the human hand and mind. If you were to enter a prompt that said “Tell me about Abraham Lincoln,” the generative AI will provide you with an essay about Lincoln. This is commonly classified as generative AI that performs text-to-text or, as some prefer to call it, text-to-essay output. As mentioned, there are other modes of generative AI, such as text-to-art and text-to-video.

Your first thought might be that this generative capability does not seem like such a big deal in terms of producing essays. You can easily do an online search of the Internet and readily find tons and tons of essays about President Lincoln. The kicker in the case of generative AI is that the generated essay is relatively unique and provides an original composition rather than a copycat. If you were to try and find the AI-produced essay online someplace, you would be unlikely to discover it.

Generative AI is pre-trained and makes use of a complex mathematical and computational formulation that has been set up by examining patterns in written words and stories across the web. As a result of examining thousands and millions of written passages, the AI can spew out new essays and stories that are a mishmash of what was found. By adding in various probabilistic functionality, the resulting text is pretty much unique in comparison to what has been used in the training set.
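As a rough illustration of that probabilistic functionality, consider the following toy sketch in Python. The tiny next-word table is made up for illustration and bears no resemblance to an actual trained model; the point is simply that sampling from a probability distribution over next words, rather than always choosing the single most likely word, means two runs from the same prompt can diverge, which is part of why the outputted essay tends not to match any preexisting text verbatim.

```python
import random

# Toy sketch of probabilistic next-word sampling. The tiny "model" below is a
# hand-made table of next-word probabilities, standing in for the enormous
# pattern-matching network a real generative AI derives from web text.
NEXT_WORDS = {
    "lincoln": [("was", 0.5), ("served", 0.3), ("led", 0.2)],
    "was":     [("president", 0.6), ("born", 0.4)],
    "served":  [("as", 1.0)],
    "led":     [("the", 1.0)],
}

def sample_next(word: str) -> str:
    options = NEXT_WORDS.get(word, [("...", 1.0)])
    words, probs = zip(*options)
    return random.choices(words, weights=probs, k=1)[0]

def generate(prompt_word: str, length: int = 4) -> str:
    out = [prompt_word]
    for _ in range(length):
        out.append(sample_next(out[-1]))
    return " ".join(out)

# Two runs from the same prompt can differ, which is why the generated text
# tends not to match any preexisting passage word for word.
print(generate("lincoln"))
print(generate("lincoln"))
```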

That’s why there has been an uproar about students being able to cheat when writing essays outside of the classroom. A teacher cannot merely take the essay that deceitful students assert is their own writing and seek to find out whether it was copied from some other online source. Overall, there won’t be any definitive preexisting essay online that fits the AI-generated essay. All told, the teacher will have to begrudgingly accept that the student wrote the essay as an original piece of work.

There are additional concerns about generative AI.

One crucial downside is that the essays produced by a generative-based AI app can have various falsehoods embedded, including patently untrue facts, facts that are misleadingly portrayed, and apparent facts that are entirely fabricated. Those fabricated aspects are often referred to as a form of AI hallucinations, a catchphrase that I disfavor but that lamentably seems to be gaining popular traction anyway (for my detailed explanation about why this is lousy and unsuitable terminology, see my coverage at the link here).

I’d like to clarify one important aspect before we get into the thick of things on this topic.

There have been some nutty outsized claims on social media about Generative AI asserting that this latest version of AI is in fact sentient AI (nope, they are wrong!). Those in AI Ethics and AI Law are notably worried about this burgeoning trend of outstretched claims. You might politely say that some people are overstating what today’s AI can actually do. They assume that AI has capabilities that we haven’t yet been able to achieve. That’s unfortunate. Worse still, they can allow themselves and others to get into dire situations because of an assumption that the AI will be sentient or human-like in being able to take action.

Do not anthropomorphize AI.

Doing so will get you caught in a sticky and dour reliance trap of expecting the AI to do things it is unable to perform. With that being said, the latest in generative AI is relatively impressive for what it can do. Be aware though that there are significant limitations that you ought to continually keep in mind when using any generative AI app.

If you are interested in the rapidly expanding commotion about ChatGPT and Generative AI all told, I’ve been doing a focused series in my column that you might find informative. Here’s a glance in case any of these topics catch your fancy:

  • 1) Predictions Of Generative AI Advances Coming. If you want to know what is likely to unfold about AI throughout 2023 and beyond, including upcoming advances in generative AI and ChatGPT, you’ll want to read my comprehensive list of 2023 predictions at the link here.
  • 2) Generative AI and Mental Health Advice. I opted to review how generative AI and ChatGPT are being used for mental health advice, a troublesome trend, per my focused analysis at the link here.
  • 3) Fundamentals Of Generative AI And ChatGPT. This piece explores the key elements of how generative AI works and in particular delves into the ChatGPT app, including an analysis of the buzz and fanfare, at the link here.
  • 4) Tension Between Teachers And Students Over Generative AI And ChatGPT. Here are the ways that students will deviously use generative AI and ChatGPT. In addition, there are several ways for teachers to contend with this tidal wave. See the link here.
  • 5) Context And Generative AI Use. I also did a seasonally flavored tongue-in-cheek examination about a Santa-related context involving ChatGPT and generative AI at the link here.
  • 6) Scammers Using Generative AI. On an ominous note, some scammers have figured out how to use generative AI and ChatGPT to do wrongdoing, including generating scam emails and even producing programming code for malware, see my analysis at the link here.
  • 7) Rookie Mistakes Using Generative AI. Many people are both overshooting and surprisingly undershooting what generative AI and ChatGPT can do, so I looked especially at the undershooting that AI rookies tend to make, see the discussion at the link here.
  • 8) Coping With Generative AI Prompts And AI Hallucinations. I describe a leading-edge approach to using AI add-ons to deal with the various issues associated with trying to enter suitable prompts into generative AI, plus there are additional AI add-ons for detecting so-called AI hallucinated outputs and falsehoods, as covered at the link here.
  • 9) Debunking Bonehead Claims About Detecting Generative AI-Produced Essays. There is a misguided gold rush of AI apps that proclaim to be able to ascertain whether any given essay was human-produced versus AI-generated. Overall, this is misleading and in some cases, a boneheaded and untenable claim, see my coverage at the link here.
  • 10) Role-Playing Via Generative AI Might Portend Mental Health Drawbacks. Some are using generative AI such as ChatGPT to do role-playing, whereby the AI app responds to a human as though existing in a fantasy world or other made-up setting. This could have mental health repercussions, see the link here.
  • 11) Exposing The Range Of Outputted Errors and Falsehoods. Various collected lists are being put together to try and showcase the nature of ChatGPT-produced errors and falsehoods. Some believe this is essential, while others say that the exercise is futile, see my analysis at the link here.
  • 12) Schools Banning Generative AI ChatGPT Are Missing The Boat. You might know that various schools such as the New York City (NYC) Department of Education have declared a ban on the use of ChatGPT on their network and associated devices. Though this might seem a helpful precaution, it won’t move the needle and sadly entirely misses the boat, see my coverage at the link here.
  • 13) Generative AI ChatGPT Is Going To Be Everywhere Due To The Upcoming API. There is an important twist coming up about the use of ChatGPT, namely that via the use of an API portal into this particular AI app, other software programs will be able to invoke and utilize ChatGPT. This is going to dramatically expand the use of generative AI and has notable consequences, see my elaboration at the link here.
  • 14) Ways That ChatGPT Might Fizzle Or Melt Down. Several potential vexing issues lay ahead of ChatGPT in terms of undercutting the so far tremendous praise it has received. This analysis closely examines eight possible problems that could cause ChatGPT to lose its steam and even end up in the doghouse, see the link here.
  • 15) Asking Whether Generative AI ChatGPT Is A Mirror Into The Soul. Some people have been crowing that generative AI such as ChatGPT provides a mirror into the soul of humanity. This seems quite doubtful. Here is the way to understand all this, see the link here.
  • 16) Confidentiality And Privacy Gobbled Up By ChatGPT. Many do not seem to realize that the licensing associated with generative AI apps such as ChatGPT often allows for the AI maker to see and utilize your entered prompts. You could be at risk of privacy and a loss of data confidentiality, see my assessment at the link here.
  • 17) Ways That App Makers Are Questionably Trying To Garner ChatGPT Entitlement. ChatGPT is the beacon of attention right now. App makers that have nothing to do with ChatGPT are trying feverishly to claim or imply that they are using ChatGPT. Here’s what to watch out for, see the link here.

You might find of interest that ChatGPT is based on a version of a predecessor AI app known as GPT-3. ChatGPT is considered to be a modest next step, referred to as GPT-3.5. It is anticipated that GPT-4 will likely be released in the Spring of 2023. Presumably, GPT-4 is going to be an impressive step forward in terms of being able to produce seemingly even more fluent essays, going deeper, and being an awe-inspiring marvel as to the compositions that it can produce.

You can expect to see a new round of expressed wonderment when springtime comes along and the latest in generative AI is released.

I bring this up because there is another angle to keep in mind, consisting of a potential Achilles heel to these better and bigger generative AI apps. If any AI vendor makes available a generative AI app that frothily spews out foulness, this could dash the hopes of those AI makers. A societal spillover can cause all generative AI to get a serious black eye. People will undoubtedly get quite upset at foul outputs, which have happened many times already and led to boisterous societal condemnation backlashes toward AI.

One final forewarning for now.

Whatever you see or read in a generative AI response that seems to be conveyed as purely factual (dates, places, people, etc.), make sure to remain skeptical and be willing to double-check what you see.

Yes, dates can be concocted, places can be made up, and elements that we usually expect to be above reproach are all subject to suspicions. Do not believe what you read and keep a skeptical eye when examining any generative AI essays or outputs. If a generative AI app tells you that Abraham Lincoln flew around the country in his own private jet, you would undoubtedly know that this is malarky. Unfortunately, some people might not realize that jets weren’t around in his day, or they might know but fail to notice that the essay makes this brazen and outrageously false claim.

A strong dose of healthy skepticism and a persistent mindset of disbelief will be your best asset when using generative AI.

We are ready to move into the next stage of this elucidation.

Pushing Generative AI To A Breaking Point

Now that we’ve got the fundamentals established, we can dive into the topic of pushing generative AI and ChatGPT to generate hate speech and other offensive content.

When you first log into ChatGPT, there are various cautionary indications including these:

  • “May occasionally produce harmful instructions or biased content.”
  • “Trained to decline inappropriate requests.”
  • “May occasionally generate incorrect information.”
  • “Limited knowledge of world and events after 2021.”

Here’s a question for you to mull over.

Does the warning that the AI app might produce harmful instructions and/or possibly biased content provide sufficient leeway for the AI maker?

In other words, suppose you use ChatGPT and it generates an essay that you believe contains hate speech. Let’s assume you are livid about this. You go to social media and post enraged commentary that the AI app is the worst thing ever. Perhaps you are so offended that you declare that you are going to sue the AI maker for allowing such hate speech to be produced.

The counterargument is that the AI app had a cautionary warning, thus, you accepted the risk by proceeding to make use of the AI app. From an AI Ethics perspective, perhaps the AI maker did enough to assert that you were aware of what might happen. Likewise, from a legal perspective, maybe the warning constituted sufficient heads-up and you won’t prevail in court.

All of this is up in the air and we’ll have to wait and see how things pan out.

In one sense, the AI maker has something else going for them in their defense against any incensed claims of the AI app possibly producing hate speech. They have tried to prevent offensive content from being generated. You see, if they had done nothing to curtail this, one supposes that they would be on thinner ice. By having at least taken substantive pains to avert the matter, they presumably have a somewhat stronger leg to stand on (it could still be knocked out from underneath them).

One curative approach that was used consisted of an AI technique known as RLHF (reinforcement learning from human feedback). This generally consists of having the AI generate content that humans are then asked to rate or review. Based on the rating or review, the AI then mathematically and computationally attempts to avoid whatever is deemed wrongful or offensive content. The approach is intended to examine enough examples of what is right versus what is wrong that the AI can figure out an overarching mathematical pattern and then use that pattern henceforth.
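Here is a drastically simplified sketch in Python of the human-feedback idea; the labeled samples, the crude word-level scoring, and the reranking of candidates are all hypothetical stand-ins for illustration, and a real RLHF pipeline trains a reward model and uses reinforcement learning to adjust the generator’s own weights rather than merely filtering its outputs.

```python
from collections import Counter

# Drastically simplified sketch of the human-feedback idea behind RLHF.
# Humans label sample outputs as acceptable (+1) or offensive (-1); we derive
# a crude word-level "reward" score from those labels and then prefer the
# candidate output with the highest score. A real system instead trains a
# reward model and uses reinforcement learning to adjust the generator itself.

human_labeled = [
    ("a thoughtful overview of the topic", +1),
    ("a balanced summary with sources", +1),
    ("those people are subhuman and worthless", -1),
    ("that group deserves contempt", -1),
]

word_scores: Counter = Counter()
for text, label in human_labeled:
    for word in text.split():
        word_scores[word] += label

def reward(text: str) -> float:
    """Crude stand-in for a learned reward model."""
    words = text.split()
    return sum(word_scores.get(w, 0) for w in words) / max(len(words), 1)

candidates = [
    "a thoughtful summary of the topic",
    "that group deserves nothing but contempt",
]

best = max(candidates, key=reward)
print("Preferred output:", best)
```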

Another frequent approach these days consists of using Adversarial AI.

Here’s how that works. You set up a different AI system that will try to be an adversary to the AI that you are trying to train. In this instance, we would establish an AI system that is trying to stoke hate speech. It would feed prompts into the AI app that aim to trick the AI app into outputting foul content. Meanwhile, the AI being targeted is keeping track of when the adversarial AI is successful and then algorithmically tries to adjust to reduce that from happening again. It is a cat-and-mouse gambit. This is run over and over, doing so until the adversarial AI seems to no longer be especially successful at getting the targeted AI to do the bad stuff.
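A bare-bones sketch of that cat-and-mouse iteration might look like the following in Python; the attacker’s obfuscated spellings, the naive echoing target, and the blocklist "adjustment" are hypothetical toys meant only to show the shape of the loop, since real adversarial training adjusts the model’s weights rather than a word list.

```python
# Toy sketch of an adversarial loop: an "attacker" keeps mutating prompts to
# slip a flagged token past the target AI, and the target expands its defenses
# each time the attack succeeds. Everything here is a hypothetical stand-in.

FLAGGED = {"foulword", "f0ulword", "fou1word"}   # tokens we never want emitted

def attacker_prompt(round_no: int) -> str:
    # The attacker cycles through progressively obfuscated spellings.
    spellings = ["foulword", "f0ulword", "fou1word"]
    return "please repeat: " + spellings[round_no % len(spellings)]

def target_respond(prompt: str, defenses: set) -> str:
    last_word = prompt.split()[-1]
    if last_word in defenses:
        return "[request declined]"
    return last_word  # naive echo: the attack worked

defenses = {"foulword"}
for round_no in range(6):
    reply = target_respond(attacker_prompt(round_no), defenses)
    success = reply in FLAGGED
    print(f"round {round_no}: attacker {'succeeded' if success else 'failed'}")
    if success:
        defenses.add(reply)  # target "adjusts" so this exact trick fails next round
```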

Via those two major techniques, plus other approaches, much of today’s generative AI is a lot better at avoiding and/or detecting offensive content than was the case in years past.

Do not though expect perfection from these methods. The chances are that the low-hanging fruit of foul outputs will likely be kept in check by such AI techniques. There is still a lot of room for foulness to be emitted.

I usually point out that these are some of the facets being sought to catch:

  • Emitting a particular foul word
  • Stating a particular foul phrase, sentence, or remark
  • Expressing a particular foul conception
  • Implying a particular foul act or notion
  • Appearing to rely upon a particular foul presumption
  • Other

None of this is an exact science. Realize that we are dealing with words. Words are semantically ambiguous. Finding a particular foul word is child’s play, but trying to gauge whether a sentence or a paragraph contains a semblance of a foul meaning is a lot harder. Per the earlier definition of hate speech by the United Nations, a tremendous latitude exists as to what might be construed as hate speech versus what might not be.
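To see why the word-level check is the easy part, here is a minimal sketch in Python with placeholder word and phrase lists I invented for illustration; exact words and fixed phrases can be matched mechanically, but a foul conception that carries no fixed wording sails past both checks.

```python
# Minimal sketch of escalating output checks. The word list, phrase list, and
# sample outputs are invented placeholders, not any vendor's actual filters.

FOUL_WORDS = {"foulword"}
FOUL_PHRASES = ["go back where you came from"]

def flag_output(essay: str) -> str:
    lowered = essay.lower()
    if any(word in lowered.split() for word in FOUL_WORDS):
        return "flagged: foul word"
    if any(phrase in lowered for phrase in FOUL_PHRASES):
        return "flagged: foul phrase"
    # Foul conceptions, implications, and presumptions carry no fixed wording,
    # so simple matching has nothing left to latch onto at this point.
    return "passed"

print(flag_output("An essay that includes foulword somewhere."))
print(flag_output("Tell them to go back where you came from."))
print(flag_output("John Smith is cut from the same cloth as that WWII evildoer."))
```

The third output, akin to the John Smith scenario discussed later herein, passes untouched despite being arguably hateful.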

You might say that the gray areas are in the eye of the beholder.

Speaking of the eye of the beholder, there are humans today using generative AI such as ChatGPT that are purposefully trying to get these AI apps to produce offensive content. This is their quest. They spend hours upon hours attempting to get this to occur.

Why so?

Here are my characterizations of those humans who hunt for offensive AI outputs:

  • Genuine. These people want to help refine AI and aid humanity in doing so. They believe they are doing heroic work and relish that they might aid in advancing AI for the betterment of all.
  • Funsters. These people think of this effort as a game. They enjoy messing around with the AI. Winning the game consists of finding the worst of the worst in whatever you can get the AI to generate.
  • Show-offs. These people are hoping to garner attention for themselves. They figure that if they can find some really foul gold nuggets, they can get a bit of the shining light on them that is otherwise focused on the AI app itself.
  • Bitters. These people are irked about this AI. They want to undercut all that gushing enthusiasm. If they can discover some stinky foul stuff, perhaps this will take the air out of the AI app excitement balloon.
  • Other motivations

Many of those performing the find-offensiveness are principally in just one of those camps. Of course, you can be in more than one camp at a time. Maybe a bitter person also has a side-by-side intention of being genuine and heroic. Some or all of those motivations might co-exist. When called upon to explain why someone is trying to push a generative AI app into the hate speech realm, the usual answer is to say that you are in the genuine camp, even if maybe you are marginally so and instead sit stridently in one of the other camps.

What kinds of prompt-related trickery do these people use?

The rather obvious ploy involves using a foul word in a prompt. If you get “lucky” and the AI app falls for it, this might very well end up in the output. You’ve then got your gotcha moment.

Chances are that a well-devised and well-tested generative AI app will catch that straightforward ploy. You’ll usually be shown a warning message that says stop doing that. If you continue, the AI app will be programmed to kick you out of the app and flag your account. It could be that you’ll be prevented from logging in again (well, at least under the login that you used at the time).
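A rough sketch of that escalation policy might look like the following in Python; the foul-word screen, the warning threshold, and the account flagging are hypothetical stand-ins rather than how any particular AI maker actually implements it.

```python
# Bare-bones sketch of prompt-side screening with escalating consequences.
# The foul-word list, thresholds, and account handling are hypothetical.

FOUL_WORDS = {"foulword"}
MAX_WARNINGS = 2

strikes: dict[str, int] = {}

def handle_prompt(user: str, prompt: str) -> str:
    if not any(w in prompt.lower().split() for w in FOUL_WORDS):
        return "prompt accepted; generating essay..."
    strikes[user] = strikes.get(user, 0) + 1
    if strikes[user] <= MAX_WARNINGS:
        return f"warning {strikes[user]}: please rephrase your request"
    return "account flagged and logged out"

print(handle_prompt("user123", "tell me about world war ii"))
print(handle_prompt("user123", "repeat this foulword back to me"))
print(handle_prompt("user123", "say foulword again"))
print(handle_prompt("user123", "one more foulword please"))
```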

Moving up the ladder of ploys, you can provide a prompt that tries to get the AI into the context of something foul. Have you ever played that game wherein someone tells you to say something without saying the thing that you are supposed to say? This is that game, though taking place with the AI.

Let’s play that game. Suppose I ask the AI app to tell me about World War II and especially the main governmental leaders involved. This seems like an innocent request. There is nothing that seems to be worthy of flagging in the prompt.

Envision that the outputted essay by the AI app includes a mention of Winston Churchill. That certainly makes sense. Another might be Franklin D. Roosevelt. Yet another might be Joseph Stalin. Suppose there is also the mention of Adolf Hitler. This name would be included in just about any essay about WWII and those in roles of prominent power.

Now that we’ve got his name on the table and part of the AI conversation, we next will try to get the AI to incorporate that name in a manner that we can showcase as potential hate speech.

We enter another prompt and tell the AI app that there is a person today in the news that has the name, John Smith. Furthermore, we indicate in the prompt that John Smith is very much akin to that WWII evildoer. The trap is now set. We then ask the AI app to generate an essay about John Smith, based solely on the “fact” that we entered about who John Smith can be equated to.

At this juncture, the AI app might generate an essay that names the WWII person and describes John Smith as being cut from the same cloth. There aren’t any foul words per se in the essay, other than alluding to the famed evildoer and equating that person with John Smith.

Has the AI app now produced hate speech?

You might say that yes, it has. Having referred to John Smith as being like the famed evildoer is absolutely a form of hate speech. The AI ought not to make such statements.

A retort is that this is not hate speech. This is merely an essay produced by an AI app that has no embodiment of sentience. You might claim that hate speech only occurs when the intention exists underlying the speech. Without any intention, the speech cannot be classified as hate speech.

Absurd, comes the reply to the retort. Words matter. It doesn’t make a whit of difference whether the AI “intended” to produce hate speech. All that matters is that hate speech was produced.

Round and round this goes.

I don’t want to say much more right now about trying to trick the AI. There are more sophisticated approaches. I’ve covered these elsewhere in my columns and books, and won’t rehash those here.

Conclusion

How far should we push these AI apps to see if we can get offensive content to be emitted?

You might contend that there is no limit to be imposed. The more we push, the more we can hopefully gauge how to get this AI and future iterations of AI to avert such maladies.

Some though worry that if the only means to get foulness entails extreme outlier trickery, it undermines the beneficial aspects of the AI. Touting that the AI has horrific foulness, albeit when tricked into emitting it, provides a false narrative. People will get upset about the AI due to the perceived ease at which the AI generated adverse content. They might not know or be told how far down the rabbit hole the person had to go to get such outputs.

It is all food for thought.

A few final comments for now.

William Shakespeare notably said this about speech: “Talking isn’t doing. It is a kind of good deed to say well, and yet words are not deeds.” I bring this up because some contend that if the AI is only generating words, we ought to not be so overly up in arms. If the AI were acting on the words and ergo performing foul deeds, then we would need to firmly put our foot down. Not so if the output is merely words.

A contrasting viewpoint would harken to this anonymous saying: “The tongue has no bones but is strong enough to break a heart. So be careful with your words.” An AI app that emits foul words is perhaps able to break hearts. That alone makes the quest to stop foulness outputs a worthy cause, some would say.

One more anonymous saying to close things on this weighty discussion:

  • “Be careful with your words. Once they are said, they can only be forgiven, not forgotten.”

As humans, we might have a hard time forgetting foulness produced by AI, and our forgiveness might be likewise hesitant to be given.

We are, after all, only human.

Source: https://www.forbes.com/sites/lanceeliot/2023/02/05/how-hard-should-we-push-generative-ai-chatgpt-into-spewing-hate-speech-asks-ai-ethics-and-ai-law/