Generative AI ChatGPT Can Disturbingly Gobble Up Your Private And Confidential Data, Forewarns AI Ethics And AI Law

Now you see your data, now you don’t.

Meanwhile, your precious data has become part of the collective, as it were.

I’m referring to an aspect that might be quite surprising to those of you who are eagerly and earnestly making use of the latest in Artificial Intelligence (AI). The data that you enter into an AI app is potentially not private to you and you alone. It could be that your data will be utilized by the AI maker, presumably to improve its AI services, or might be used by them and/or their allied partners for a variety of purposes.

You have now been forewarned.

This handing over of your data is happening in the most innocuous of ways and by potentially millions of people. How so? There is a type of AI known as generative AI that has recently garnered big headlines and the rapt attention of the public at large. The most notable of the existing generative AI apps is one called ChatGPT, devised by the firm OpenAI.

There are purportedly around a million registered users for ChatGPT. Many of those users seem to delight in trying out this hottest and latest generative AI app. The process is extraordinarily simple. You enter some text as a prompt, and voila, the ChatGPT app generates a text output that is usually in the form of an essay. Some refer to this as text-to-text, though I prefer to denote it as text-to-essay since this verbiage makes more everyday sense.

At first, a newbie user will likely enter something fun and carefree. “Tell me about the life and times of George Washington,” someone might enter as a prompt. ChatGPT then would produce an essay about our legendary first president. The essay would be entirely fluent and you would be hard-pressed to discern that it was produced by an AI app. An exciting thing to see happen.

The odds are that after playing around for a while, a segment of newbie users will have had their fill and potentially opt to stop toying with ChatGPT. They have now overcome their FOMO (fear of missing out), doing so after experimenting with the AI app that just about everyone seems to be chattering about. Deed done.

Some though will begin to think about other and more serious ways to use generative AI.

Maybe use ChatGPT to write that memo that your boss has been haranguing you to write. All you need to do is provide a prompt with the bullet points that you have in mind, and the next thing you know an entire memo has been generated by ChatGPT that would make your boss proud of you. You copy the outputted essay from ChatGPT, paste it into the company’s official template in your word processing package, and email the classy memorandum to your manager. You are worth a million bucks. And you used your brains to find a handy tool to do the hard work for you. Pat yourself on the back.

That’s not all.

Yes, there’s more.

Keep in mind that generative AI can perform a slew of other writing-related tasks.

For example, suppose you have written a narrative of some kind for a valued client and you dearly want to have a review done of the material before it goes out the door.

Easy-peasy.

You paste the text of your narrative into a ChatGPT prompt and then instruct ChatGPT to analyze the text that you composed. The resultant outputted essay might dig deeply into your wording and, to your pleasant surprise, will attempt to seemingly inspect the meaning of what you have said (going far beyond acting as a spell checker or grammar analyzer). The AI app might detect faults in the logic of your narrative or might discover contradictions that you didn’t realize were in your very own writing. It is almost as though you hired a crafty human editor to eyeball your draft and provide a litany of helpful suggestions and noted concerns (to be clear, I am not trying to anthropomorphize the AI app; a human editor is a human, while the AI app is merely a computer program).

Thank goodness that you used the generative AI app to scrutinize your precious written narrative. You undoubtedly would prefer that the AI find those disquieting issues before you send the document to your prized client, rather than after. Imagine that you had composed the narrative for someone who had hired you to devise a vitally important depiction. If you had given the original version to the client, before doing the AI app review, you might have suffered grand embarrassment. The client would almost certainly harbor serious doubts about your skills to do the work that was requested.

Let’s up the ante.

Consider the creation of legal documents. That’s obviously a particularly serious matter. Words and how they are composed can spell a spirited legal defense or a dismal legal calamity.

In my ongoing research and consulting, I interact regularly with a lot of attorneys who are keenly interested in using AI in the field of law. Various LegalTech programs are getting connected to AI capabilities. A lawyer can use generative AI to compose a draft of a contract or compose other legal documents. In addition, if the attorney made an initial draft themselves, they can pass the text over to a generative AI app such as ChatGPT to take a look and see what holes or gaps might be detected. For more about how attorneys and the legal field are opting to make use of AI, see my discussion at the link here.

Now we come to the rub.

An attorney takes a drafted contract and copies the text into a prompt for ChatGPT. The AI app produces a review for the lawyer. Turns out that several gotchas are found by ChatGPT. The attorney revises the contract. They might also ask ChatGPT to suggest a rewording or redo of the composed text for them. A new and better version of the contract is then produced by the generative AI app. The lawyer grabs up the outputted text and plops it into a word processing file. Off the missive goes to their client. Mission accomplished.

Can you guess what also just happened?

Behind the scenes and underneath the hood, the contract might have been swallowed up like a fish into the mouth of a whale. Though this AI-using attorney might not realize it, the text of the contract, entered as a prompt into ChatGPT, could potentially be gobbled up by the AI app. It is now fodder for pattern matching and other computational intricacies of the AI app. This in turn could be used in a variety of ways. If there is confidential data in the draft, that too is potentially now within the confines of ChatGPT. Your prompt as provided to the AI app is now ostensibly a part of the collective in one fashion or another.

Furthermore, the outputted essay is also considered part of the collective. If you had asked ChatGPT to modify the draft for you and present the new version of the contract, this is construed as an outputted essay. The outputs of ChatGPT are also a type of content that can be retained or otherwise transformed by the AI app.

Yikes, you might have innocently given away private or confidential information. Not good. Plus, you wouldn’t even be aware that you had done so. No flags were raised. A horn didn’t blast. No flashing lights went off to shock you into reality.

We might anticipate that non-lawyers could easily make such a mistake, but for a versed attorney to make the same rookie mistake is nearly unimaginable. Nonetheless, there are likely legal professionals right now making this same potential blunder. They risk violating a noteworthy element of the attorney-client privilege and possibly breaching the American Bar Association (ABA) Model Rules of Professional Conduct (MRPC). In particular: “A lawyer shall not reveal information relating to the representation of a client unless the client gives informed consent, the disclosure is impliedly authorized in order to carry out the representation or the disclosure is permitted by paragraph (b)” (cited from the MRPC, and for which the exceptions associated with subsection b would not seem to encompass using a generative AI app in a non-secure way).

Some attorneys might seek to excuse their transgression by claiming that they aren’t tech wizards and that they would have had no ready means to know that their entering of confidential info into a generative AI app might somehow be a breach of sorts. The ABA has made clear that a duty for lawyers encompasses being up-to-date on AI and technology from a legal perspective: “To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject” (per MRPC).

Several provisions bear on this legal duty, including maintaining client confidential information (Rule 1.6), protecting client property such as data (Rule 1.15), properly communicating with a client (Rule 1.4), obtaining client informed consent (Rule 1.6), and ensuring competent representation on behalf of a client (Rule 1.1). And there is also the little-known but highly notable AI-focused resolution passed by the ABA: “That the American Bar Association urges courts and lawyers to address the emerging ethical and legal issues related to the usage of artificial intelligence (‘AI’) in the practice of law including: (1) bias, explainability, and transparency of automated decisions made by AI; (2) ethical and beneficial usage of AI; and (3) controls and oversight of AI and the vendors that provide AI.”

Words to the wise for my legal friends and colleagues.

The crux of the matter is that just about anyone can get themselves into a jam when using generative AI. Non-lawyers can do so by their presumed lack of legal acumen. Lawyers can do so too, perhaps enamored of the AI or not taking a deep breath and reflecting on what legal repercussions can arise when using generative AI.

We are all potentially in the same boat.

You should also realize that ChatGPT is not the only generative AI app on the block. There are other generative AI apps that you can use. They too are likely cut from the same cloth, namely that the inputs you enter as prompts and the outputs you receive as generated essays are considered part of the collective and can be used by the AI maker.

In today’s column, I am going to unpack the nature of how data that you enter and data that you receive from generative AI can be potentially compromised with respect to privacy and confidentiality. The AI makers make available their licensing requirements and you would be wise to read up on those vital stipulations before you start actively using an AI app with any semblance of real data. I will walk you through an example of such licensing, doing so for the ChatGPT AI app.

Into all of this comes a slew of AI Ethics and AI Law considerations.

Please be aware that there are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws being bandied around as potential solutions to keep AI endeavors from going amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.

There are significant Ethical AI nuances and provisions associated with how AI makers can or should deal with the data or information that seems private or confidential to their users. You likely know too that a bunch of existing laws strike at the core of how data is supposed to be handled by technology entities. The chances too are that the newly proposed AI laws will also crisscross into that same territory. See for example my coverage of the AI Bill of Rights and other legal wranglings going on about AI, at the link here.

Here is the key takeaway from this discussion all told:

  • Be very, very, very careful about what data or information you opt to put into your prompts when using generative AI, and similarly be extremely careful in anticipating what kinds of outputted essays you might get, since the outputs can be absorbed too.

Does this imply that you should not use generative AI?

Nope, that’s not at all what I am saying.

Use generative AI to your heart’s content. The gist is that you need to be mindful of how you use it. Find out what kind of licensing stipulations are associated with the usage. Decide whether you can live with those stipulations. If there are avenues to inform the AI maker that you want to invoke certain kinds of added protections or allowances, make sure you do so.

I will also mention one other facet that I realize will get some people boiling mad. Here goes. Despite whatever the licensing stipulations are, you have to also assume that there is a possibility that those requirements might not be fully adhered to. Things can go awry. Stuff can slip between the cracks. In the end, sure, you might have a legal case against an AI maker for not conforming to their stipulations, but that’s somewhat after the horse is already out of the barn.

A potentially highly secure way to proceed would be to set up your own instance on your own systems, whether in the cloud or in-house (assuming that you adhere to proper cybersecurity precautions, which admittedly some do not; they end up worse off in their own cloud than they would be using the software vendor’s cloud). A nagging problem though is that few of the large-scale generative AI apps allow this right now. They are all pretty much working on an our-cloud-only basis. Few have made available the option of having an entire instance carved out just for you. I’ve predicted that we will gradually see this option arising, though at first it will be rather costly and somewhat complicated; see my predictions at the link here.

How do otherwise especially bright and notably astute people get themselves into a data or information confidentiality erosion quagmire?

The allure of these generative AI apps is quite magnetic once you start using one. Step by step, you find yourself mesmerized and opting to put your toes further and further into the generative AI waters. The next thing you know, you are readily handing over proprietary content that is supposed to be kept private and confidential into a generative AI app.

Resist the urge and please refrain from stepwise falling into an unsavory trap.

For business leaders and top-level executives, the same warning goes to you and all of the people throughout your company. Senior execs get caught up in the enthusiasm and amazement of using generative AI too. They can really mess up and potentially enter top-level secret info into an AI app.

On top of this, they might have wide leagues of their employees also playing around with generative AI. Many of those otherwise mindful staff are mindlessly and blissfully entering the company’s private and confidential information into these AI apps. According to recent news reports, Amazon apparently discovered that some employees were entering various proprietary information into ChatGPT. A legal-oriented warning was said to have been sent internally to be cautious in making use of the irresistible AI app.

Overall, a bit of irony comes into the rising phenomenon of employees willy-nilly entering confidential data into ChatGPT and other generative AI. Allow me to elaborate. Today’s modern companies typically have strict cybersecurity policies that they have painstakingly crafted and implemented. Numerous technological protections exist. The hope is to prevent accidental releases of crucial stuff. A continual drumbeat is to be careful when you visit websites, be careful when you use any non-approved apps, and so on.

Along comes generative AI apps such as ChatGPT. The news about the AI app goes through the roof and gets widespread attention. A frenzy arises. People in these companies that have all these cybersecurity protections opt to hop onto a generative AI app. They idly play with it at first. They then start entering company data. Wham, they have now potentially exposed information that should not have been disclosed.

Here we have a shiny new toy that magically circumvents the millions of dollars spent on cybersecurity protections and ongoing training about what not to do. But, hey, it is exciting to use generative AI and be part of the “in” crowd. That’s what counts, apparently.

I trust that you get my drift about being markedly cautious.

Let’s next take a close-up look at how generative AI technically deals with the text of the prompts and outputted essays. We will also explore some of the licensing stipulations, using ChatGPT as an example. Please realize that I am not going to cover the full gamut of those licensing elements. Make sure to involve your legal counsel for whichever generative AI apps you might decide to use. Also, the licensing differs from AI maker to AI maker, plus a given AI maker can opt to change their licensing so make sure to remain vigilant on whatever the latest version of the licensing stipulates.

We have some exciting unpacking to do on this heady topic.

First, we ought to make sure that we are all on the same page about what Generative AI consists of and also what ChatGPT is all about. Once we cover that foundational facet, we can perform a cogent assessment of the data privacy and confidentiality concerns associated with this type of AI.

If you are already abundantly familiar with Generative AI and ChatGPT, you can perhaps skim the next section and proceed with the section that follows it. I believe that everyone else will find it instructive to closely read the section and get up to speed on the vital details about these matters.

A Quick Primer About Generative AI And ChatGPT

ChatGPT is a general-purpose, interactive, conversationally oriented AI system, essentially a seemingly innocuous general chatbot. Nonetheless, it is actively and avidly being used by people in ways that are catching many entirely off-guard, as I’ll elaborate shortly. This AI app leverages a technique and technology in the AI realm that is often referred to as Generative AI. The AI generates outputs such as text, which is what ChatGPT does. Other generative-based AI apps produce images such as pictures or artwork, while others generate audio files or videos.

I’ll focus on the text-based generative AI apps in this discussion since that’s what ChatGPT does.

Generative AI apps are exceedingly easy to use.

All you need to do is enter a prompt and the AI app will generate for you an essay that attempts to respond to your prompt. The composed text will seem as though the essay was written by the human hand and mind. If you were to enter a prompt that said “Tell me about Abraham Lincoln” the generative AI would provide you with an essay about Lincoln. This is commonly classified as generative AI that performs text-to-text, or some prefer to call it text-to-essay output. As mentioned, there are other modes of generative AI, such as text-to-art and text-to-video.
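ChatGPT itself is used through a web interface, but the same prompt-in, essay-out pattern can be sketched programmatically against OpenAI’s GPT-3 family of models, which are reachable via an API. Here is a minimal sketch, assuming the openai Python package is installed; the API key and model name shown are placeholders, not a definitive setup:

```python
import openai

# Placeholder credential; you would supply your own API key.
openai.api_key = "sk-..."

# Text goes in as a prompt; a generated essay comes back out.
response = openai.Completion.create(
    model="text-davinci-003",  # an illustrative GPT-3 family model
    prompt="Tell me about Abraham Lincoln",
    max_tokens=300,
)

print(response["choices"][0]["text"])
```

Keep the thrust of this discussion in mind: the prompt string in that sketch leaves your machine and lands on the AI maker’s servers, subject to whatever their licensing stipulations say.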

Your first thought might be that this generative capability does not seem like such a big deal in terms of producing essays. You can easily do an online search of the Internet and readily find tons and tons of essays about President Lincoln. The kicker in the case of generative AI is that the generated essay is relatively unique and provides an original composition rather than a copycat. If you were to try and find the AI-produced essay online someplace, you would be unlikely to discover it.

Generative AI is pre-trained and makes use of a complex mathematical and computational formulation that has been set up by examining patterns in written words and stories across the web. As a result of examining millions upon millions of written passages, the AI can spew out new essays and stories that are a mishmash of what was found. By adding in various probabilistic functionality, the resulting text is pretty much unique in comparison to what has been used in the training set.

That’s why there has been an uproar about students being able to cheat when writing essays outside of the classroom. A teacher cannot merely take the essay that deceitful students assert is their own writing and seek to find out whether it was copied from some other online source. Overall, there won’t be any definitive preexisting essay online that fits the AI-generated essay. All told, the teacher will have to begrudgingly accept that the student wrote the essay as an original piece of work.

There are additional concerns about generative AI.

One crucial downside is that the essays produced by a generative-based AI app can have various falsehoods embedded, including patently untrue facts, facts that are misleadingly portrayed, and apparent facts that are entirely fabricated. Those fabricated aspects are often referred to as a form of AI hallucinations, a catchphrase that I disfavor but that lamentably seems to be gaining popular traction anyway (for my detailed explanation about why this is lousy and unsuitable terminology, see my coverage at the link here).

I’d like to clarify one important aspect before we get into the thick of things on this topic.

There have been some zany outsized claims on social media about Generative AI asserting that this latest version of AI is in fact sentient AI (nope, they are wrong!). Those in AI Ethics and AI Law are notably worried about this burgeoning trend of outstretched claims. You might politely say that some people are overstating what today’s AI can actually do. They assume that AI has capabilities that we haven’t yet been able to achieve. That’s unfortunate. Worse still, they can allow themselves and others to get into dire situations because of an assumption that the AI will be sentient or human-like in being able to take action.

Do not anthropomorphize AI.

Doing so will get you caught in a sticky and dour reliance trap of expecting the AI to do things it is unable to perform. With that being said, the latest in generative AI is relatively impressive for what it can do. Be aware though that there are significant limitations that you ought to continually keep in mind when using any generative AI app.

If you are interested in the rapidly expanding commotion about ChatGPT and Generative AI all told, I’ve been doing a focused series in my column that you might find informative. Here’s a glance in case any of these topics catch your fancy:

  • 1) Predictions Of Generative AI Advances Coming. If you want to know what is likely to unfold about AI throughout 2023 and beyond, including upcoming advances in generative AI and ChatGPT, you’ll want to read my comprehensive list of 2023 predictions at the link here.
  • 2) Generative AI and Mental Health Advice. I opted to review how generative AI and ChatGPT are being used for mental health advice, a troublesome trend, per my focused analysis at the link here.
  • 3) Fundamentals Of Generative AI And ChatGPT. This piece explores the key elements of how generative AI works and in particular delves into the ChatGPT app, including an analysis of the buzz and fanfare, at the link here.
  • 4) Tension Between Teachers And Students Over Generative AI And ChatGPT. Here are the ways that students will deviously use generative AI and ChatGPT. In addition, there are several ways for teachers to contend with this tidal wave. See the link here.
  • 5) Context And Generative AI Use. I also did a seasonally flavored tongue-in-cheek examination about a Santa-related context involving ChatGPT and generative AI at the link here.
  • 6) Scammers Using Generative AI. On an ominous note, some scammers have figured out how to use generative AI and ChatGPT to do wrongdoing, including generating scam emails and even producing programming code for malware, see my analysis at the link here.
  • 7) Rookie Mistakes Using Generative AI. Many people are both overshooting and surprisingly undershooting what generative AI and ChatGPT can do, so I looked especially at the undershooting that AI rookies tend to make, see the discussion at the link here.
  • 8) Coping With Generative AI Prompts And AI Hallucinations. I describe a leading-edge approach to using AI add-ons to deal with the various issues associated with trying to enter suitable prompts into generative AI, plus there are additional AI add-ons for detecting so-called AI hallucinated outputs and falsehoods, as covered at the link here.
  • 9) Debunking Bonehead Claims About Detecting Generative AI-Produced Essays. There is a misguided gold rush of AI apps that proclaim to be able to ascertain whether any given essay was human-produced versus AI-generated. Overall, this is misleading and in some cases, a boneheaded and untenable claim, see my coverage at the link here.
  • 10) Role-Playing Via Generative AI Might Portend Mental Health Drawbacks. Some are using generative AI such as ChatGPT to do role-playing, whereby the AI app responds to a human as though existing in a fantasy world or other made-up setting. This could have mental health repercussions, see the link here.
  • 11) Exposing The Range Of Outputted Errors and Falsehoods. Various collected lists are being put together to try and showcase the nature of ChatGPT-produced errors and falsehoods. Some believe this is essential, while others say that the exercise is futile, see my analysis at the link here.
  • 12) Schools Banning Generative AI ChatGPT Are Missing The Boat. You might know that various schools such as the New York City (NYC) Department of Education have declared a ban on the use of ChatGPT on their network and associated devices. Though this might seem a helpful precaution, it won’t move the needle and sadly entirely misses the boat, see my coverage at the link here.
  • 13) Generative AI ChatGPT Is Going To Be Everywhere Due To The Upcoming API. There is an important twist coming up about the use of ChatGPT, namely that via the use of an API portal into this particular AI app, other software programs will be able to invoke and utilize ChatGPT. This is going to dramatically expand the use of generative AI and has notable consequences, see my elaboration at the link here.
  • 14) Ways That ChatGPT Might Fizzle Or Melt Down. Several potential vexing issues lay ahead of ChatGPT in terms of undercutting the so far tremendous praise it has received. This analysis closely examines eight possible problems that could cause ChatGPT to lose its steam and even end up in the doghouse, see the link here.

You might find of interest that ChatGPT is based on a version of a predecessor AI app known as GPT-3. ChatGPT is considered to be a modest next step, referred to as GPT-3.5. It is anticipated that GPT-4 will likely be released in the Spring of 2023. Presumably, GPT-4 is going to be an impressive step forward in terms of being able to produce seemingly even more fluent essays, going deeper, and being an awe-inspiring marvel as to the compositions that it can produce.

You can expect to see a new round of expressed wonderment when springtime comes along and the latest in generative AI is released.

I bring this up because there is another angle to keep in mind, consisting of a potential Achilles heel to these better and bigger generative AI apps. If any AI vendor makes available a generative AI app that frothily spews out foulness, this could dash the hopes of those AI makers. A societal spillover can cause all generative AI to get a serious black eye. People will undoubtedly get quite upset at foul outputs, which have happened many times already and led to boisterous societal condemnation backlashes toward AI.

One final forewarning for now.

Whatever you see or read in a generative AI response that seems to be conveyed as purely factual (dates, places, people, etc.), make sure to remain skeptical and be willing to double-check what you see.

Yes, dates can be concocted, places can be made up, and elements that we usually expect to be above reproach are all subject to suspicions. Do not believe what you read and keep a skeptical eye when examining any generative AI essays or outputs. If a generative AI app tells you that Abraham Lincoln flew around the country in his own private jet, you would undoubtedly know that this is malarky. Unfortunately, some people might not realize that jets weren’t around in his day, or they might know but fail to notice that the essay makes this brazen and outrageously false claim.

A strong dose of healthy skepticism and a persistent mindset of disbelief will be your best asset when using generative AI.

We are ready to move into the next stage of this elucidation.

Knowing What The Devil Will Happen With That Text

Now that we’ve got the fundamentals established, we can dive into the data and information considerations when using generative AI.

First, let’s briefly consider what happens when you enter some text into a prompt for ChatGPT. We don’t know for sure what is happening inside ChatGPT since the program is considered proprietary. Some have pointed out that this undercuts a sense of transparency about the AI app. A somewhat smarmy remark is that for a company that is called OpenAI, their AI is actually closed to public access and not available as open source.

Let’s discuss tokenization.

When you enter plain text into a prompt and hit return, there is presumably a conversion that right away happens. The text is converted into a format consisting of tokens. Tokens are subparts of words. For example, the word “hamburger” would normally be divided into three tokens consisting of the portions “ham”, “bur”, and “ger”. A rule of thumb is that a token tends to represent about four characters, or roughly three-quarters of a conventional English word.

Each token is then reformulated as a number. Various internal tables designate which token is assigned to which particular number. The upshot of this is that the text that you entered is now entirely a set of numbers. Those numbers are used to computationally analyze the prompt. Furthermore, the pattern-matching network that I mentioned earlier is also based on tokenized values. Ultimately, when composing or generating the outputted essay, these numeric tokens are first used, and then, before being displayed, the tokens are converted back into sets of letters and words.
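To make this concrete, here is a minimal sketch of tokenization using OpenAI’s open-source tiktoken tokenizer library (assuming it has been installed via pip install tiktoken; the exact token splits depend on which encoding you load):

```python
import tiktoken

# Load a tokenizer encoding (the GPT-2 era encoding; newer ones exist).
enc = tiktoken.get_encoding("gpt2")

text = "This draft contract is strictly confidential."

# Text in, numbers out: each token becomes an integer ID.
token_ids = enc.encode(text)
print(token_ids)  # a list of integers

# Crucially, the conversion is fully reversible: the numbers
# decode right back into the original text, letter for letter.
assert enc.decode(token_ids) == text
```

The round trip in those last lines is the telling part, as we will see in a moment: tokenization is a reversible bookkeeping step, not an anonymization step.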

Think about that for a moment.

When I tell people that this is how the mechanics of the processing work, they are often stunned. They assumed that a generative AI app such as ChatGPT must operate on whole, intact words. We logically assume that words act as the keystone for statistically identifying relationships in written narratives and compositions. Turns out that the processing actually tends to use tokens. Perhaps this adds to the amazement over how the computational process seems to do quite a convincing job of mimicking human language.

I walked you through that process due to one common misconception that seems to be spreading around. Some people appear to believe that because your prompt text is being converted into numeric tokens, you are safe and sound that the internals of the AI app somehow no longer have your originally entered text. Thus, the claim goes, even if you entered confidential info in your prompt, you have no worries since it has all been seemingly tokenized.

That notion is a fallacy. I’ve just pointed out that numeric tokens can be readily brought back into the textual format of letters and words. The same could be done with the converted prompt that has been tokenized. There is nothing magically protective about having been tokenized. That being said, if after the conversion into tokens there is an additional process that opts to drop out tokens, move them around, or otherwise scramble or chop things up, then there is indeed the possibility that some portions of the original prompt are no longer intact (assuming that an original copy isn’t otherwise retained or stored someplace internally).

I’d like to next take a look at the various notifications and licensing stipulations of ChatGPT.

When you log onto ChatGPT, there are a series of cautions and informational comments displayed.

Here they are:

  • “May occasionally generate incorrect information.”
  • “May occasionally produce harmful instructions or biased content.”
  • “Trained to decline inappropriate requests.”
  • “Our goal is to get external feedback in order to improve our systems and make them safer.”
  • “While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.”
  • “Conversations may be reviewed by our AI trainers to improve our systems.”
  • “Please don’t share any sensitive information in your conversations.”
  • “This system is optimized for dialogue. Let us know if a particular response was good or unhelpful.”
  • “Limited knowledge of world and events after 2021.”

Two of those stated cautions are especially relevant to this discussion. Look at the sixth bulleted point and the seventh bulleted point.

Let’s unpack those two:

“Conversations may be reviewed by our AI trainers to improve our systems.”

This sixth bulleted point explains that text conversations when using ChatGPT might be reviewed by the AI maker’s “AI trainers,” which is being done to improve their systems. This is to inform you that any and all of your entered text prompts and the corresponding outputted essays, all of which are part of the “conversation” that you undertake with ChatGPT, can be seen in their entirety by their people. The rationale proffered is that this is being done to improve the AI app, and we are also told that it is a type of work task being done by their AI trainers. Maybe so, but the upshot is that they have put you on notice that they can look at your text. Period, full stop.

If they were to do something else with your text, you would probably seek legal advice about whether they have gravitated egregiously beyond the suggested confines of merely reviewing the text for system improvement purposes (assuming you managed to discover that they had done so, which of itself seems perhaps unlikely). Anyway, you can imagine the legal wrangling of trying to pin them down on this, and their attempts to wordsmith their way out of being nabbed for somehow violating the bounds of their disclaimer.

“Please don’t share any sensitive information in your conversations.”

The seventh bulleted point indicates that you are not to share any sensitive information in your conversations. That seems relatively straightforward. I suppose you might quibble with what the definition of sensitive information consists of. Also, the bulleted point doesn’t tell you why you should not share any sensitive information. If you someday have to explain in a dire sweat why you foolishly entered confidential data, you might try the raised-eyebrow claim that the warning was non-specific and that, therefore, you didn’t grasp the significance. Hold your breath on that one.

All in all, I dare say that most people I’ve seen using ChatGPT tend not to read the bulleted points, or they skim the bulleted precautions and just nod their heads as though it is the usual gibberish legalese that you see all of the time. Few, it seems, take the warnings strictly to heart. Is this a fault of the vendor for not making the precautions more pronounced? Or should we assume that users are responsible for having mindfully read, comprehended, and subsequently acted judiciously based on the warnings?

Some even claim that the AI app ought to repeatedly warn you. Each time that you enter a prompt, the software should pop up a warning and ask you whether you want to hit return. Over and over again. Though this might seem like a helpful precaution, admittedly it would irritate the heck out of users. A thorny tradeoff is involved.

Okay, so those are the obvious cautions as presented for all users to readily see.

Users who are more inquisitive could opt to pursue some of the detailed licensing stipulations that are also posted online. I doubt that many do so. My hunch is that few look seriously at the bulleted points when logging in, and even fewer by a huge margin then take a look at the licensing details. Again, we are all somewhat numb to such things these days. I’m not excusing the behavior, only noting why it occurs.

I’ll examine a few excerpts from the posted licensing terms.

First, here’s a definition of what they consider “content” associated with the use of ChatGPT:

  • “Your Content. You may provide input to the Services (‘Input’), and receive output generated and returned by the Services based on the Input (‘Output’). Input and Output are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.”

If you carefully examine that definition, you’ll notice that OpenAI declares that it can use the Content as it deems necessary to maintain its services, including complying with applicable laws and enforcing its policies. This is a handy catchall for them. In an upcoming column, I’ll be discussing a different but related topic, specifically the Intellectual Property (IP) rights that you have regarding the entered text prompts and outputted essays (I point this out herein since the definition of the Content bears on that topic).

In a further portion of the terms, labeled as section c, they mention this facet: “One of the main benefits of machine learning models is that they can be improved over time. To help OpenAI provide and maintain the Services, you agree and instruct that we may use Content to develop and improve the Services.” This is akin to the earlier discussed one-line caution that appears when you log into ChatGPT.

A separate document that is linked to this provides some additional aspects on these weighty matters:

  • “As part of this continuous improvement, when you use OpenAI models via our API, we may use the data you provide us to improve our models. Not only does this help our models become more accurate and better at solving your specific problem, it also helps improve their general capabilities and safety. We know that data privacy and security are critical for our customers. We take great care to use appropriate technical and process controls to secure your data. We remove any personally identifiable information from data we intend to use to improve model performance. We also only use a small sampling of data per customer for our efforts to improve model performance. For example, for one task, the maximum number of API requests that we sample per customer is capped at 200 every 6 months” (excerpted from the document entitled “How your data is used to improve model performance”).

Note that the stipulation indicates that the provision applies to the use of the API as a means of connecting to and using the OpenAI models all told. It is somewhat murky as to whether this equally applies to end users who are directly using ChatGPT.

In yet a different document, one that contains their list of various FAQs, they provide a series of questions and answers, two of which seem especially pertinent to this discussion:

  • “(5) Who can view my conversations? As part of our commitment to safe and responsible AI, we review conversations to improve our systems and to ensure the content complies with our policies and safety requirements.”
  • “(8) Can you delete specific prompts? No, we are not able to delete specific prompts from your history. Please don’t share any sensitive information in your conversations.”

There is an additional document that covers their privacy policy. It says this: “We collect information that alone or in combination with other information in our possession could be used to identify you (“Personal Information”)” and then proceeds to explain that they might use log data, usage data, communication information, device information, cookies, analytics, and other potentially collectible information about you. Make sure to read the fine print.

I think that pretty much provides a tour of some considerations underlying how your data might be used. As I mentioned at the outset, I am not going to laboriously step through all of the licensing stipulations.

Hopefully, this puts you in the right frame of mind on these matters so that they will remain top of mind.

Conclusion

I’ve said it before and I’ll say it again: do not enter confidential or private data into these generative AI apps.

Consider a few handy tips or options on this sage piece of advice:

  • Think Before Using Generative AI
  • Remove Stuff Beforehand
  • Mask Or Fake Your Inputs
  • Set Up Your Own Instance
  • Other

I’ll indicate next what each one of those consists of. The setting up of your own instance was earlier covered herein. The use of “other” in my list is due to the possibility of other ways to cope with preventing confidential data from getting included, which I will be further covering in a future column posting.

Let’s examine these:

  • Think Before Using Generative AI. One approach involves avoiding using generative AI altogether. Or at least think twice before you do so. I suppose the safest avenue involves not using these AI apps. But this also seems quite severe and nearly overboard.
  • Remove Stuff Beforehand. Another approach consists of removing confidential or private information from whatever you enter as a prompt. In that sense, if you don’t enter it, there isn’t a chance of it getting infused into the Borg. The downside is that maybe the removal of the confidential portion somehow reduces or undercuts what you are trying to get the generative AI to do for you.
  • Mask Or Fake Your Inputs. You could modify your proposed text by changing up the info so that whatever seemed confidential or private is now differently portrayed. For example, instead of a contract mentioning the Widget Company and John Smith, you change the text to refer to the Specious Company and Jane Capone. An issue here is whether you’ll do a sufficiently exhaustive job such that all of the confidential and private aspects are fully altered or faked. It would be easy to miss some of the cloudings and leave in stuff that ought to not be there (a minimal sketch of this masking approach appears just after this list).
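To illustrate the masking notion, here is a minimal sketch of a do-it-yourself redaction pass that runs before a prompt ever leaves your hands (the names, patterns, and substitutions are hypothetical, and a simple lookup like this will inevitably miss things that a dedicated PII-scrubbing tool might catch):

```python
import re

# Hypothetical mapping of confidential terms to innocuous stand-ins.
MASKS = {
    r"\bWidget Company\b": "Specious Company",
    r"\bJohn Smith\b": "Jane Capone",
    r"\b\d{3}-\d{2}-\d{4}\b": "[REDACTED-ID]",  # e.g., a Social Security style number
}

def mask_prompt(text: str) -> str:
    """Replace known confidential terms before the text is sent to any AI app."""
    for pattern, replacement in MASKS.items():
        text = re.sub(pattern, replacement, text)
    return text

draft = "This contract binds Widget Company and John Smith (ID 123-45-6789)."
print(mask_prompt(draft))
# This contract binds Specious Company and Jane Capone (ID [REDACTED-ID]).
```

The weak spot, as noted above, is coverage: any confidential term that is not in your mapping sails through untouched, which is why the removal and masking approaches are best treated as risk reduction rather than a guarantee.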

Here’s an interesting added twist that might get your noggin further percolating on this topic. If you can completely ensure that none of your input prompts contain any confidential information, does this imply that you don’t need to have an iota of worry about the outputted essays also containing any of your confidential information?

This would seem axiomatically true. No confidential input, no confidential output.

Here’s your mind-bending twist.

Generative AI is often set up to computationally retrain itself from the text prompts that are being provided. Likewise, generative AI is frequently devised to computationally retrain from the outputted essays. All of this retraining is intended to improve the capabilities of generative AI.

I described in one of my other columns the following experiment that I undertook. An attorney was trying to discover a novel means of tackling a legal issue. After an exhaustive look at the legal literature, it seemed that all of the angles already surfaced had been found. Using generative AI, we got the AI app to produce a novelty of a legal approach that had seemingly not been previously identified. It was believed that nobody else had yet landed on this legal posture. A legal gold nugget, as it were. This could be a strategically valuable competitive legal bonanza that at the right time could be leveraged and exploited.

Does that outputted essay constitute a form of confidential information, such that it was generated by the AI for this particular person and contains something special and seemingly unique?

Aha, this leads us to the other allied and intertwined topic about the ownership and IP rights associated with generative AI. Stay tuned to see how this turns out.

A final remark for now.

Sophocles provided this wisdom: “Do nothing secretly; for Time sees and hears all things, and discloses all.” I suppose you could modernize the wording and contend that generative AI and those that devise and maintain the AI are apt to see all too.

It is a modestly token piece of advice worthy of being remembered.

Source: https://www.forbes.com/sites/lanceeliot/2023/01/27/generative-ai-chatgpt-can-disturbingly-gobble-up-your-private-and-confidential-data-forewarns-ai-ethics-and-ai-law/