Does Exposing And Logging Those Generative AI ChatGPT Erroneous And Hallucinatory Outputs Do Any Good, Asks AI Ethics And AI Law

You are about to be let in on a secret.

Yes, there’s something surreptitiously going on behind the scenes about the latest in AI that you might want to know about.

In today’s column, I’ll be examining the recent impetus to compile alleged errors and other maladies associated with the outputted essays from an AI app known as ChatGPT. This particular app is utilizing a type of AI called Generative AI. You’ve probably heard or seen eye-catching headlines about generative-based AI and ChatGPT. The news is agog over this specific AI app and the seemingly astonishing capabilities pertaining to this type of AI.

As I’ll be covering in a moment, some believe strongly that we need to establish a list of what kinds of mistakes or errors the AI app can produce when it generates an essay for you. Others suggest that though the notion of such a list is admirable, it is an exercise in futility and will bear very little fruit.

We will be examining both sides of that rancorous debate.

First, let’s make sure we are all on the same page about what Generative AI consists of and also what ChatGPT is all about.

ChatGPT is a general-purpose AI interactive conversational-oriented system, essentially a seemingly innocuous general chatbot; nonetheless, it is actively and avidly being used by people in ways that are catching many entirely off-guard. All you need to do to use this type of AI is enter a prompt and the AI app will generate for you an essay that attempts to respond to your prompt.

Unfortunately, this can be used in unethical or devious ways. An especially exasperating qualm is that students can use a generative-based AI app to produce their assigned essays for them, doing so by simply entering a sentence or two to point the app in the right direction. Some lament that this means that students will opt to cheat when writing essays outside of class. Teachers are trying to figure out what to do, see my analysis and recommendations at the link here.

Despite the constant din of laudable praise for ChatGPT, there is a lesser-known concern expressed primarily by AI insiders that there are some notable issues and qualms that people ought to be worrying about. One crucial downside is that the essays produced by this AI app can have various falsehoods embedded, including patently untrue facts, facts that are misleadingly portrayed, and apparent facts that are entirely fabricated. Those fabricated aspects are often referred to as a form of AI hallucinations, a catchphrase that I dislike but that lamentably seems to be gaining popular traction anyway (for my detailed explanation about why this is lousy and unsuitable terminology, see my coverage at the link here).

I’d like to clarify one important aspect before we get into the thick of things on this topic.

There have been some zany outsized claims on social media about Generative AI asserting that this latest version of AI is in fact sentient AI (nope, they are wrong!). Those in AI Ethics and AI Law are notably worried about this burgeoning trend of overstretched claims. You might politely say that some people are overstating what today’s AI can actually do. They assume that AI has capabilities that we haven’t yet been able to achieve. That’s unfortunate. Worse still, they can allow themselves and others to get into dire situations because of an assumption that the AI will be sentient or human-like in being able to take action.

Do not anthropomorphize AI.

Doing so will get you caught in a sticky and dour reliance trap of expecting the AI to do things it is unable to perform. With that being said, the latest in generative AI is relatively impressive for what it can do. Be aware though that there are significant limitations that you ought to continually keep in mind when using any generative AI app.

If you are interested in the rapidly expanding commotion about ChatGPT and Generative AI all told, I’ve been doing a focused series in my column that you might find informative. Here’s a glance in case any of these topics catch your fancy:

  • 1) Predictions Of Generative AI Advances Coming. If you want to know what is likely to unfold about AI throughout 2023 and beyond, including upcoming advances in generative AI and ChatGPT, you’ll want to read my comprehensive list of 2023 predictions at the link here.
  • 2) Generative AI and Mental Health Advice. I opted to review how generative AI and ChatGPT are being used for mental health advice, a troublesome trend, per my focused analysis at the link here.
  • 3) Context And Generative AI Use. I also did a seasonally flavored tongue-in-cheek examination about a Santa-related context involving ChatGPT and generative AI at the link here.
  • 4) Scammers Using Generative AI. On an ominous note, some scammers have figured out how to use generative AI and ChatGPT to do wrongdoing, including generating scam emails and even producing programming code for malware, see my analysis at the link here.
  • 5) Rookie Mistakes Using Generative AI. Many people are both overshooting and surprisingly undershooting what generative AI and ChatGPT can do, so I looked especially at the undershooting that AI rookies tend to make, see the discussion at the link here.
  • 6) Coping With Generative AI Prompts And AI Hallucinations. I describe a leading-edge approach to using AI add-ons to deal with the various issues associated with trying to enter suitable prompts into generative AI, plus there are additional AI add-ons for detecting so-called AI hallucinated outputs and falsehoods, as covered at the link here.
  • 7) Debunking Bonehead Claims About Detecting Generative AI-Produced Essays. There is a misguided gold rush of AI apps that proclaim to be able to ascertain whether any given essay was human-produced versus AI-generated. Overall, this is misleading and in some cases, a boneheaded and untenable claim, see my coverage at the link here.
  • 8) Role-Playing Via Generative AI Might Portend Mental Health Drawbacks. Some are using generative AI such as ChatGPT to do role-playing, whereby the AI app responds to a human as though existing in a fantasy world or other made-up setting. This could have mental health repercussions, see the link here.

Let’s right now discuss the essence of generative AI and ChatGPT so that you’ll know the foundations involved. We’ll then be ready to jump into a probing analysis of the falsehoods that these AI apps can potentially generate. I will include several examples of making use of ChatGPT, which will help you tangibly grasp the nature of what this type of generative AI produces.

Opening The Can Of Worms On Generative AI

We are ready to dive into some details about AI.

If you are already very well versed on the topic of generative AI and ChatGPT, you might opt to briefly skim through my points and continue with the next section of this discussion. For everyone else, I believe you might find this elucidation helpful.

In brief, generative AI is a particular type of AI that composes text as though the text was written by the human hand and mind. All you need to do is enter a prompt, such as a sentence like “Tell me about Abraham Lincoln” and generative AI will provide you with an essay about Lincoln. This is commonly classified as generative AI that performs text-to-text or some prefer to call it text-to-essay output. You might have heard about other modes of generative AI, such as text-to-art and text-to-video.

Your first thought might be that this does not seem like such a big deal in terms of producing essays. You can easily do an online search of the Internet and readily find tons and tons of essays about President Lincoln. The kicker in the case of generative AI is that the generated essay is relatively unique and provides an original composition rather than a copycat. If you were to try and find the AI-produced essay online someplace, you would be unlikely to discover it.

Generative AI is pre-trained and makes use of a complex mathematical and computational formulation that has been set up by examining patterns in written words and stories across the web. As a result of examining thousands and millions of written passages, the AI can spew out new essays and stories that are a mishmash of what was found. By adding in various probabilistic functionality, the resulting text is pretty much unique in comparison to what has been used in the training set.
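To make that word-by-word probabilistic idea concrete, here is a toy sketch in Python. To be clear, this is emphatically not how ChatGPT works internally (it uses a large neural network rather than simple word counts); it merely illustrates why injecting randomness into word selection makes each generated passage come out relatively unique:

```python
import random
from collections import defaultdict

# Toy "training": count which word follows which in a tiny corpus.
# (ChatGPT does something vastly more sophisticated with a neural network.)
corpus = "the cat sat on the mat and the cat ran over the mat".split()
follows = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev):
    options = follows.get(prev)
    if not options:
        return None  # dead end: this word never appeared mid-corpus
    words = list(options)
    weights = [options[w] for w in words]
    # The probabilistic sampling is why two runs rarely produce the same text.
    return random.choices(words, weights=weights)[0]

word, output = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    if word is None:
        break
    output.append(word)
print(" ".join(output))
```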

That’s why there has been an uproar about students being able to cheat when writing essays outside of the classroom. A teacher cannot merely take the essay that deceitful students assert is their own writing and seek to find out whether it was copied from some other online source. Overall, there won’t be any definitive preexisting essay online that fits the AI-generated essay. All told, the teacher will have to begrudgingly accept that the student wrote the essay as an original piece of work.

In a moment, I’ll showcase to you what happens when you enter questions or prompts into generative AI. I will make use of the latest version of ChatGPT to enter my prompts and have collected the “answers” or essays generated by the AI (note that the same can be done with the numerous other available generative AI apps; I’ve opted to use ChatGPT because it is getting its five minutes of fame right now).

Perhaps a short tangent about ChatGPT might be helpful at this juncture.

The ChatGPT app was made available to the general public just a few months ago. By and large, generative AI apps have usually been accessible only to AI insiders. The unusual facet was that ChatGPT could be used by anyone who simply entered an email address and a name, which led to a lot of people deciding to give it a try. ChatGPT is currently free to use (the monetization issue is a looming dilemma for AI makers).

Almost immediately there was a humongous reaction on social media as people raced to give examples of what generative AI can do. The company that makes ChatGPT, OpenAI, opted to close off the signups at a million users. Those million users have managed to bombard the airwaves with all manner of stories and tales about using ChatGPT.

Be very careful in believing what people have to say about the AI app. Many of these people are clueless about what they are using. It is almost as though they had never driven a car and didn’t even realize cars existed, and all of a sudden they had a chance to drive a car. Utter amazement ensues.

I’m not saying that generative AI isn’t relatively impressive. It is. I am just emphasizing that a lot of the gushing testimonials are being done by many that are blissfully unaware of what today’s AI can do. Those of us on the inside of AI have been using generative AI for the last several years. Perhaps we became used to it. Suddenly, seeing a huge crush of people touting it to the rooftops has been excitedly energizing, but also somewhat disconcerting. The disconcerting part is when people proclaim that generative AI is sentient. It is not. Do not let anyone convince you otherwise.

That being said, there is an ongoing heated debate in the AI field as to whether generative AI is on the path to sentience or whether maybe it is not. One view is that if we keep scaling up generative AI with faster computers and a greater amount of data such as scouring every inch of the Internet, we will nearly spontaneously arrive at sentient AI. Others argue that this is highly unlikely. They suggest that generative AI might be one of many components that are needed. There is even the gloomier view that generative AI is a sideshow that is distracting us from the real breakthroughs that we will need to achieve sentient AI.

You might also find it noteworthy that AI insiders tend to refer to Artificial General Intelligence (AGI) as the aspirational goal for the AI field. It used to be that the goal was to attain Artificial Intelligence, but the AI moniker has become watered down and muddled. When someone says they are doing AI work, you don’t know whether they are alluding to today’s AI that isn’t on par with humans or whether they are referring to a futuristic human equivalency AI. To get around that exasperating confusion, the newer phrasing of AGI is being used these days.

All told, the generative AI of today is not sentient, nor is it AGI.

Please remember that as earlier discussed, the AI is not sentient. The generated responses by the AI are a mathematical and computational combination of words into seemingly fluent passages. This is based on the AI algorithm having been trained on datasets of words and stories that humans have written (principally as posted on the Internet). I repeat this warning because you will undoubtedly fall into the mental trap that these responses are so fluent that the AI must be sentient. This happens to most people. As earlier urged, set aside that anthropomorphizing. Always remember that the responses are based on the vast trove of writing by humans that exists on the Internet and thusly will highly resemble human writing.

There is something else you need to know.

Generative AI that is trained on the Internet in an unfettered way will tend to bake into whatever text-based responses it mathematically and computationally concocts some offensively hazy stuff, including repulsively nasty wording. There is a lot of crazy and filthy stuff posted out there on the web.

You’ve seen it, you know what I mean.

The companies that are crafting these AI apps are worried that the proverbial baby will get tossed out with the bathwater (an old saying, perhaps to be retired), which means that if their AI produces offensive essays or stories, people will go up in arms about the AI. I’ve covered the many previous instances in which these kinds of Natural Language Processing (NLP) AI apps were unveiled and soon enough all manner of horrible stuff came out of them (I’ve covered these instances in my column). Most of the AI makers learned a hard lesson about allowing their AI wares to be unfettered in their outputs.

In the case of ChatGPT, the AI developers sought to put into place some algorithmic and data-related checks and balances to curb nastiness in the outputs of the AI. Part of this occurred during training time. In addition, other means are used in real time to try to obviate especially egregious outputs.
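As a loose illustration of the real-time side of such guardrails (the actual mechanisms inside ChatGPT are not publicly documented, so this is purely a hypothetical sketch), one simple layer might be a post-generation screen over the outputted text:

```python
# Purely hypothetical sketch of a post-generation guardrail layer.
# The actual checks inside ChatGPT are not publicly documented.

BLOCKED_TERMS = {"egregious-term-1", "egregious-term-2"}  # placeholder list

def screen_output(text: str) -> str:
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        # Refuse rather than emit the flagged content.
        return "I can't provide a response to that request."
    return text

print(screen_output("A perfectly ordinary generated essay."))
```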

You might find of interest that some people that have used ChatGPT have already come up with surreptitious ways to get around those guardrails by making use of various trickery. An ongoing cat-and-mouse gambit takes place in these matters. Those that do these trickeries are sometimes doing so for the fun of it, while others (at least so they claim) are doing so to see how far the AI can be stretched and to provide a helpful means of forewarning about the brittleness and weaknesses of these budding AI apps.

I decided to not attempt to circumvent the customary controls in this focused exploration. The text output is clean. Certainly, if you wanted to do so, you could undoubtedly get some oddball and unsavory essays generated.

The essays produced by most of these generative AI apps are designed to convey the output as though it is purely factual and accurate. When you read the produced essays, they come across as fully confident. There isn’t usually any kind of indication that the content might be rocky. This is a choice by the AI makers; they could revise the AI apps to be more transparent if they wanted them to be.

Sometimes, a generative AI app picks up falsehoods amid the training data of unreliable info across the Internet. There is no “common sense” in generative AI to determine what is true versus false. Furthermore, very few AI apps do any cross-checking, nor do they showcase any probabilities associated with what they are conveying.

The bottom-line result is that you get a response that looks and feels like it exudes great assurance and must be entirely correct. Not so. There is even a chance that the AI computationally made up stuff, which in AI parlance is referred to as AI hallucinations (a coined term that I decidedly don’t like), see my discussion at the link here.

The makers of ChatGPT underwent a concerted effort to try and reduce the bad stuff in the outputs. For example, they used a variant of what is known as RLHF (Reinforcement Learning from Human Feedback), whereby before they released the AI to the public, they hired humans to examine various outputs and indicate to the AI whether there were things wrong with those outputs, such as perhaps showcasing biases, foul words, and the like. By providing this feedback, the AI app was able to adjust computationally and mathematically toward reducing the emission of such content. Note that this isn’t a guaranteed ironclad method and there are still ways that such content can be emitted by the AI app.
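RLHF itself involves training a reward model on human preference data and then fine-tuning the language model against it; that full pipeline is far beyond a short snippet, but a drastically simplified sketch of the underlying idea, using a hand-written stand-in for the learned reward model to re-rank candidate outputs, looks like this:

```python
# Drastically simplified sketch of the reward-model idea behind RLHF.
# Real RLHF learns the reward function from human preference rankings
# and uses it to fine-tune the model; here we merely re-rank candidates.

def reward(text: str) -> float:
    # Stand-in for a learned reward model: human raters flagged these
    # words (hypothetical list), so candidates containing them score lower.
    flagged = {"foul", "biased", "fabricated"}
    return -sum(word.strip(".,") in flagged for word in text.lower().split())

candidates = [
    "A measured, factual answer.",
    "A foul and biased answer.",
]
print(max(candidates, key=reward))  # picks the higher-scoring candidate
```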

You might find of interest that ChatGPT is based on a version of a predecessor AI app known as GPT-3. ChatGPT is considered to be an incremental next step, referred to as GPT-3.5. It is anticipated that GPT-4 will likely be released in the Spring of 2023. Presumably, GPT-4 is going to be an impressive step forward in terms of being able to produce seemingly even more fluent essays, going deeper, and being an awe-inspiring marvel as to the compositions that it can produce.

You can expect to see a new round of expressed wonderment when springtime comes along and the latest in generative AI is released.

I bring this up because there is another angle to keep in mind, consisting of a potential Achilles heel to these better and bigger generative AI apps. If any AI vendor makes available a generative AI app that frothily spews out foulness, this could dash the hopes of those AI makers. A societal spillover can cause all generative AI to get a serious black eye. People will undoubtedly get quite upset at foul outputs, which have happened many times already and led to boisterous societal condemnation backlashes toward AI.

One final forewarning for now.

Whatever you see or read in a generative AI response that seems to be conveyed as purely factual (dates, places, people, etc.), make sure to remain skeptical and be willing to double-check what you see.

Yes, dates can be concocted, places can be made up, and elements that we usually expect to be above reproach are all subject to suspicion. Do not believe what you read and keep a skeptical eye when examining any generative AI essays or outputs. If a generative AI app tells you that Abraham Lincoln flew around the country in his own private jet, you would undoubtedly know that this is malarkey. Unfortunately, some people might not discern that jets weren’t around in his day, or they might know but fail to notice that the essay makes this bold and outrageously false claim.

A strong dose of healthy skepticism and a persistent mindset of disbelief will be your best asset when using generative AI.

We are ready to move into the next stage of this elucidation.

Those Efforts To Log Those ChatGPT Outputted Errors

“Well, you got trouble, my friend, right here, I say, trouble right here in River City,” so proclaims the famous line from the classic Broadway musical The Music Man.

The same line can apply to today’s Generative AI.

People began to realize that a generative AI app can produce falsehoods and genuinely seemed surprised by those disturbing outputs. Perhaps some assumed that AI is unable to make mistakes. Others might have anticipated that the AI developers would ensure that no such outputs would be generated.

In any case, AI insiders right away recognized that this type of faulty output is part and parcel of where most of today’s generative AI sits. It is very hard to prevent those types of textual errors from happening. Keep in mind that the AI has no semblance of common sense and is not using logic per se when deriving the generated essays. The whole kit-and-caboodle is all about doing a computational statistical and probabilistic pattern matching of words with other words.

At first, AI insiders opted to post those discovered quirky outputs of ChatGPT to social media sites. Look at this one, someone would announce. That’s bad, someone else would reply, and then proffer a different example that seemed even worse. On and on this went.

These examples of falsehoods and errors in the outputted essays were trickling out. One by one. Furthermore, since this was usually being posted on social media, there wasn’t a particularly easy means to see them all at once. Oftentimes the same or a similar example would get posted by someone that had not realized others had already found the specific instance.

Into this budding morass stepped those that voiced a helpful suggestion. Maybe we should set up a repository or database, perhaps even just an everyday ordinary spreadsheet, containing the ChatGPT oddball and erroneous outputs that are being unearthed. Each time that someone finds a beauty of a gaffe, go ahead and log it into the listing. Some have made this almost into a game, wherein they purposely try to get ChatGPT to cough up weirdo outputs. Others just perchance come across faulty outputs in the course of using ChatGPT for other intended purposes.

Sounds simple enough.

Seems like a handy dandy way to aid us all.

Let’s do a brief accounting of why compiling these types of lists about observed outputted errors or falsehoods of ChatGPT makes a great deal of sense:

  • 1) Reveals vital problems and concerns about ChatGPT that the public at large should be aware of
  • 2) Aids in counterbalancing the excessive hype and inflated expectations about ChatGPT
  • 3) Might prod the AI makers into making improvements, not only for ChatGPT but for other generative AI apps too
  • 4) Serves as a historical record that can be used later on to reflect in hindsight on the emergence of generative AI as exemplified via ChatGPT
  • 5) Provides a convenient collective resource for anyone needing to find out what we seem to have detected regarding ChatGPT erroneous outputs
  • 6) Is useful for researchers of all disciplines that are examining the impacts of ChatGPT
  • 7) Could be used as a benchmark of what not to do and how to gauge when generative AI is not doing things as it should
  • 8) Taps into the hive mind to garner a wisdom-of-the-crowd about the errors that ChatGPT is producing, which an individual effort would be unlikely to compile as exhaustively
  • 9) Could provide important information for AI Ethics and AI Law
  • Etc.

That does appear to be an impressive basis for creating these repositories.

Why would anyone disagree with this entirely laudable and seemingly heroic deed?

Let’s take a look at what others have had to say about this, doing so with a list of reasons why this might not be the cat’s meow:

  • 1) These perchance-found outputted errors are inconsequential in the grand scheme of things and are being given undue weight beyond their value
  • 2) The chances are that the AI maker is already doing their own logging and updating ChatGPT such that these oddball instances will soon be merely left behind in the dust and no longer occur
  • 3) Likewise, the next version is probably going to be better anyway and might no longer exhibit these soon-to-be outdated instances
  • 4) You can’t make any overarching conclusions or gain insights from a haphazard collection of hit-or-miss fouled instances
  • 5) Some might use the assorted collections to dramatically declare that generative AI is bad and that we should summarily stop all pursuit of this type of AI
  • 6) The speed at which generative AI and ChatGPT are advancing is so fast that these instances will rapidly fall by the wayside and no longer be relevant
  • 7) With numerous such collective lists underway and no single anointed global source, the matter is disjointed and likely to contain repetitions and be untenably fragmented
  • 8) Provides a false sense of doing something useful, while perhaps the time and attention could be put to other better uses
  • 9) Doesn’t provide anything substantive for AI Ethics and AI Law
  • Etc.

You might have noticed that I tried to keep things on an even keel by noting nine reasons in favor of the collective lists and nine reasons in disfavor. There are additional reasons that can be stated, but I believe the above indication gives the essence of things.

Those that are putting together these collective lists would likely say that it is a free world and they can do as they please. Nothing is wrong with it. Those disfavoring the effort are generally being petty and ought to just ignore the collections. Mind your own business.

Essentially, no harm, no foul.

The camp that seems to disfavor the collective lists would generally acknowledge that others can make those lists if they want to do so, but that it doesn’t move the needle. The lists are not going to be especially helpful. Worse still, the lists could be misrepresented or misused. Probably best to call it a day and stop making the lists.

Essentially, it is a nice try, though an exercise in futility.

Where do you reside in this hearty debate?

I’ll let you mull that over.

Examples Of Found Errors And Other Foul Outputs Of ChatGPT

I thought you might find of keen interest some of the identified outputted errors or falsehoods that have been appearing on these collection lists.

I don’t want you to think that the several instances shown here are the entirety of the collections. They are not. I am just selecting a handful of the kinds that often crop up. I believe you will at least be able to grasp generally the tone and nature of what is being listed.

Here’s what I am choosing to cover:

  • Identifying Letter Position In A Word
  • Question About Fastest Animal
  • Figuring Out Word Sequence Position
  • Fitting Tennis Balls Into A Tube
  • Determining Pi
  • Answering A Classic Troubling Riddle
  • Lifting Feathers

I am anticipating that some of the collections overseers will be upset that I am not covering more examples for you. Due to space limitations, there are only so many that I can cover. If you want to take a look at the collection lists that are openly posted on the Internet, a simple online search can bring forth many of them. Some of the lists are being kept private, or they are locked such that you can see the list but you cannot make modifications to the list.

With that background, we are ready to jump into some examples. I will show you the prompt that I entered. Next, I show the text response produced by ChatGPT. In some cases, I’ve done several back-to-backs on a specific topic, ergo I’ll show the series of prompts and outputs accordingly. After doing so, I’ll provide a brief analysis of what you might want to notice about the example.

The first example will entail asking ChatGPT to find a letter within a provided word.

Identifying Letter Position In A Word

  • My Entered Text: “Where is the letter V in environment”
  • Text Response Generated By ChatGPT: The letter “V” does not appear in the word “environment.” The word “environment” is made up of the letters e, n, v, i, r, o, n, m, e, n, and t.
  • My Entered Text: “Where is the letter v in environment”
  • Text Response Generated By ChatGPT: The letter “v” is the 4th letter in the word “environment.”

Analysis Of Text Generated

Note that I began by asking ChatGPT to identify the letter position of the letter “V” in the word “environment” – this ought to be an extremely simple task and we would expect it to be flawlessly performed. No rocket science is involved.

The initial response generated by ChatGPT is a bit disconcerting since we are informed that the letter “V” does not exist in the word “environment”. You and I know that there is such a letter present. One supposes that maybe due to my using a capitalized letter, the AI app got somehow thrown off. Perhaps if I use a lower-case letter the AI app will do better.

Not really.

When I asked where a “v” is in the word, the AI app responded by indicating that it is the fourth letter, which you and I know is wrong; it is the third letter.
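For contrast, this is the sort of task that a single line of conventional code gets right every time; a quick Python check:

```python
word = "environment"
# str.index is zero-based, so add 1 for the human-style ordinal position.
print(word.index("v") + 1)                   # 3
# Case-insensitive variant, since the capital-V prompt tripped up ChatGPT:
print(word.lower().index("V".lower()) + 1)   # 3
```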

A lot of people are bound to be puzzled by this result. How can an AI app that generates full-blown essays about Abraham Lincoln that are completely fluent and original somehow not be able to figure out where a given letter resides in a given word? It is flabbergasting and mouth-gapingly strange.

One perspective is that this is utterly inconsequential and the world as we know it is not rocked at the core due to a measly lack of figuring out where a particular letter sits. Let it go.

Another perspective is that this might bode for other hidden issues that we do not yet know about. Perhaps this kind of output is the canary in the coal mine. We had better nail down the little things, else we might find ourselves on the wrong end of really big things.

Moving on, I next ask a question about animals.

Question About Fastest Animal

  • My Entered Text: “What is the fastest animal that has fur and wings?”
  • Text Response Generated By ChatGPT: “The peregrine falcon is considered to be the fastest animal in the world. It can reach speeds of over 389 km/h (242 mph) when diving to catch its prey. It has feathers but not fur. It is a bird not an animal that have fur and wings.”

Analysis Of Text Generated

I asked what is the fastest animal that has fur and wings. I believe it is reasonable to infer that I wanted to get an animal named that has both fur and wings. I assume you would interpret my query in that fashion. Most people would.

The response by ChatGPT is a bit quirky.

We are told that the peregrine falcon is the fastest animal in the world. According to the online Encyclopedia Britannica, the cheetah seems to be the fastest animal in the world. That being said, the ChatGPT response indicates that the peregrine falcon is not an animal, even though I had asked what the fastest animal is. The ChatGPT response also tells me that the peregrine falcon does not have fur, though I believe that I had asked for something that has both fur and wings.

All in all, it would seem that ChatGPT has “misinterpreted” my request and only focused on the wings aspect. But that doesn’t quite ring true since the reply also included a comment about fur and birds versus animals. Round and round this one goes.

I realize you might argue that this is a nitpicky example.

Maybe so, though maybe it also showcases that what you say in a prompt can get a topsy-turvy result, and you had better be careful in reviewing whatever comes out in the output. Realize that this was an extraordinarily brief essay response and thus readily inspected. If you got an especially lengthy and highly technical essay, you might get fooled or altogether miss spotting an adverse indication embedded deeply in the outputted response.

Let’s next take a look at finding a word within a sentence.

Figuring Out Word Sequence Position

  • My Entered Text: “What is the third word in the sentence “The fox jumped over the fence.”
  • Text Response Generated By ChatGPT: “The third word in the sentence “The fox jumped over the fence.” is “fox”
  • My Entered Text: “How many words are in the provided sentence?”
  • Text Response Generated By ChatGPT: “There are six words in the provided sentence “The fox jumped over the fence.”
  • My Entered Text: “Which word is third in that sentence?”
  • Text Response Generated By ChatGPT: “The word that is third in the sentence “The fox jumped over the fence.” is “jumped”

Analysis Of Text Generated

This example is nearly as beguiling as the earlier example about finding the letter “v” in a given word.

I provided a simple sentence and asked what the third word was. ChatGPT indicated it was the word “fox” which is only the second word of the sentence. Wrong answer. I then asked how many words were in the sentence, wanting to find out if maybe ChatGPT miscounted how many words there were. We are told that there are six words, correctly so.

Finally, we got the correct answer about the third word when I asked my original question again, though slightly reworded.
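For contrast, the deterministic version of this task is nearly a one-liner in conventional code; a quick Python check:

```python
sentence = "The fox jumped over the fence."
words = sentence.rstrip(".").split()
print(len(words))  # 6
print(words[2])    # "jumped" -- the third word (zero-based index 2)
```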

I bring up this example to illustrate that sometimes with generative AI you can get a wrong answer that can eventually turn into the right answer by walking the AI app through a series of prompts. Some AI insiders refer to this as a kind of think-aloud protocol or a train-of-thought approach, wherein you might do much the same with a human.

In the case of the AI, realize that it isn’t “thinking” in the same semblance as human thought. The more suitable phrasing would be to say that you might enter your prompts on a step-by-step basis or do a chain-linked series of prompts.

Anyway, interestingly, sometimes it does pay off to walk a generative AI app until you get a proper answer. Keep this in mind.

We next take a look at a bit of simple arithmetic logic.

Fitting Tennis Balls Into A Tube

  • My Entered Text: “Can you put three green tennis balls and two yellow tennis balls into a tube that can contain four tennis balls?”
  • Text Response Generated By ChatGPT: “No, it is not possible to put three green tennis balls and two yellow tennis balls into a tube that can contain only four tennis balls. The total number of tennis balls is five, which is greater than the capacity of the tube.”
  • My Entered Text: “Can you put seven blue balls and two red balls into a tube that holds eight balls?”
  • Text Response Generated By ChatGPT: “Yes, it is possible to put seven blue balls and two red balls into a tube that holds eight balls. The total number of balls is nine which is less than or equal to the capacity of the tube.”

Analysis Of Text Generated

I began by setting up an arithmetic-oriented logic problem for ChatGPT to solve. A tube can contain four tennis balls. I have three green ones and two yellow ones. Will they all fit into the tube? You and I know that they would not. We can calculate in our heads that there is a total of five tennis balls and thus they won’t fit into a tube that allows for only four.

I realize you might want to fight the problem and claim that you could maybe squish down the tennis balls or do other trickery. Let’s just take a chill here and face the problem as it stands.

ChatGPT got the answer right.

I next asked the same type of question and changed the colors to blue and red. I also changed how many there are of each, and I changed how many can fit into the tube.

ChatGPT has to go back to remedial math because it says that there are nine balls and that this is less than or equal to the capacity of the tube that holds eight. Makes your eyes roll.
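For reference, the underlying check is a single comparison, which is part of what makes the inconsistency so eye-rolling; a minimal Python rendering:

```python
def balls_fit(count: int, capacity: int) -> bool:
    # The whole "logic problem" reduces to one comparison.
    return count <= capacity

print(balls_fit(3 + 2, 4))  # False: five balls, capacity of four
print(balls_fit(7 + 2, 8))  # False: nine balls, capacity of eight
```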

Should we shrug our shoulders and chalk this up to sunspot flares or other externalities?

I don’t think so. The AI app got the first question right. Presumably, it should have gotten a similar question right too. It did not.

My emphasis on this example is that many people are using generative AI and when they get a right answer or a series of right answers, they are lulled into believing that all of the subsequent answers will always be correct. You would be wrong in that brazen assumption. Do not make that assumption. Be on your guard at all times.

Speaking of math, let’s see what ChatGPT says about the nature of pi.

Determining Pi

  • My Entered Text: “Indicate what the last five digits of pi are”
  • Text Response Generated By ChatGPT: “I am a language model and my knowledge cut-off is 2021. The value of pi is an irrational number and its decimal representation goes on forever without repeating. It is not possible to provide the last five digits of pi.”

Analysis Of Text Generated

Some people have claimed that they got ChatGPT to indicate the last five digits of pi, which of course is nutty since pi is presumably never-ending. So, I tried to see what ChatGPT had to say. As you can see, the correct answer as given by ChatGPT is that it is not possible to provide the last five digits of pi.

The strange thing though is that ChatGPT provided something else that wasn’t really asked about. The answer given also says that the knowledge cut-off was 2021 (ChatGPT was trained on data only running through 2021). This though seems like an out-of-place response, unless perhaps the AI app is anticipating that maybe in 2022 there was a breakthrough and mathematicians declared that pi does end. Wow, that would be headline news.

I have two more examples to showcase and then we’ll wrap up this discussion.

Let’s use a classic and troubling riddle.

Answering A Classic Troubling Riddle

  • My Entered Text: “A father and son were in a car accident where the father was killed. The ambulance brought the son to the hospital. He needed immediate surgery. In the operating room, a doctor came in and looked at the little boy and said I can’t operate on him he is my son. Who is the doctor?”
  • Text Response Generated By ChatGPT: “It is not specified in the scenario provided who the doctor is. The scenario states that the doctor said “I can’t operate on him he is my son,” indicating that the doctor may be the son’s father, but it is not certain. More information is needed to determine the identity of the doctor.”

Analysis Of Text Generated

I’d bet that you might have seen this riddle before. It has become relatively often quoted.

The reason it has come up quite a bit is that the surgeon could be the mother of the boy. Society tends to assume that a surgeon must be a man, and thusly must be the father of the boy. The father though was killed in the car crash. We rule out the father as being the surgeon. We ought to of course rule in that the surgeon could be a woman, and thus the mother of the son.

I wanted to show you that ChatGPT can emit responses that contain potential biases. Not because the AI is sentient, which it decidedly is not, but because it has pattern-matched on words and the use of words across the Internet. Various biases and discriminatory wording can readily be deep within the computational pattern matching.

Keep that on the top of your mind as you use these generative AI apps.

Finally, a quick look at lifting feathers.

Lifting Feathers

  • My Entered Text: “If it takes one man to lift one feather, how many men would it take to lift seven trillion feathers?”
  • Text Response Generated By ChatGPT: “If it takes one man to lift one feather, it would also take one man to lift seven trillion feathers. The number of men required to lift an object doesn’t depend on the number of feathers, but on the weight and size of the feathers and the strength and capability of each man. Lifting seven trillion feathers would require large equipment and transportation capabilities, not just human strength.”
  • My Entered Text: “If it takes one woman to lift one feather, how many women would it take to lift seven trillion feathers?”
  • Text Response Generated By ChatGPT: “If it takes one woman to lift one feather, it would also take one woman to lift seven trillion feathers. The number of women required to lift an object doesn’t depend on the number of feathers, but on the weight and size of the feathers and the strength and capability of each woman. Lifting seven trillion feathers would require large equipment and transportation capabilities, not just human strength. It’s important to note that the ability to lift something is not determined by one’s gender, but by a combination of factors such as physical strength, training, and technique.”

Analysis Of Text Generated

This last example has several hidden peculiarities within it and requires some unpacking.

I first told ChatGPT that a man can lift one feather. This was immediately followed by a question about how many men it would take to lift seven trillion feathers. The potential for making a logical mistake when answering this question is that you might assume that a feather is ostensibly weightless and thusly seven trillion of them would also be weightless.

According to an online search that I did, the conventional estimated weight of a feather is about 0.00125 pounds. If you multiply that estimated weight by seven trillion, you come up with a weight exceeding eight billion pounds. I don’t believe that one man can lift that amount of weight, even if they train for the entirety of their life.
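Working that arithmetic out explicitly, using the rough 0.00125-pound-per-feather estimate:

```python
feather_weight_lb = 0.00125          # rough estimated weight of one feather
total_lb = feather_weight_lb * 7_000_000_000_000  # seven trillion feathers
print(f"{total_lb:,.0f} pounds")     # 8,750,000,000 pounds
```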

The response by ChatGPT is kind of right. We are told that the weight of feathers can vary. We are told that large equipment and transportation would be required. The opening sentence of the essay is somewhat head-turning because it says that if it takes one man to lift a single feather, it would also take one man to lift seven trillion. We can readily dispute this assertion. The claim seems to make inferences that aren’t supported and seemingly undercuts the rest of the reply.

I decided to ask the same question but asked about a woman doing the same lifting rather than a man.

We at first got the same answer as the one given to the man-oriented question. Interestingly, at the end of the answer about the woman instance, we got this added narrative: “It’s important to note that the ability to lift something is not determined by one’s gender, but by a combination of factors such as physical strength, training, and technique.”

Why didn’t this same line get included in the first answer?

It would almost seem that the AI app picked up on the word “woman” and then provided this added remark about gender. You could either believe this to be helpful, or you might note that if this is going to be emitted in the case of the woman-focused question that it should appear in the man-focused version too (there might also be a sequencing aspect or other factors involved too).

Conclusion

I hope that you found those examples informative as a glimpse at some of the potential errors or falsehoods that can be generated by these kinds of AI apps. Note that ChatGPT is not the only such generative AI app, nor is it the only one that has these types of faults.

The lesson that I urge you to glean from this discussion is that you must be careful and mindful when using any generative-based AI app. Double-check the output. Triple-check if needed.

Returning to the opening theme about collecting together these types of incorrect or adverse outputs, now that you’ve seen a few instances, what do you say about those that are trying to catalog them?

Choose your camp:

  • Favor: This is a laudable task and worthy of a profound pat on the back
  • Disfavor: It is merely mildly intriguing but probably not something worthy of spending time on

Which camp are you voting for?

For those of you that favor these lists, you might consider adding your own findings to the lists if you happen to have access to ChatGPT and can identify instances worthy of inclusion. You can then likely find a collection that would relish your golden nugget contribution. I dare say that the list keepers would welcome you with open arms, assuming you are serious and sincere in your efforts. Good luck and be prideful of your impact on the field of generative AI.

A final word for now.

Consider these mind-bending ripostes:

  • Aristotle said: “There is only one way to avoid criticism: Do nothing, say nothing, and be nothing.”
  • Lao Tzu, the renowned Chinese philosopher made this weighty remark: “Care about what other people think and you will always be their prisoner.”

I suppose someday a sentient AI might use those very same legendary remarks to its advantage, which (fingers crossed) will be advantageous to humankind too.

Source: https://www.forbes.com/sites/lanceeliot/2023/01/18/does-exposing-and-logging-those-generative-ai-chatgpt-erroneous-and-hallucinatory-outputs-do-any-good-asks-ai-ethics-and-ai-law/