They say that all is fair in love and war.
Maybe that’s true, maybe not.
One thing that we know for sure is that a kind of battle or war is taking place among the various Artificial Intelligence (AI) makers that are hurriedly and desperately aiming to bring their generative AI apps into the marketplace. Whereas in the past there was a more tepid sense of urgency, nowadays an ardent and overpowering push is on to get generative AI apps into the public sphere as quickly as humanly feasible.
Into this frothy footrace comes potential allegations of trying to use a competing generative AI app to bolster and bootstrap another one. You might liken this to a sports team that sneakily observes a crosstown rival and uses what they glean to furtively advance their own endeavors. Imagine those sports spies that peer over tall fences and take copious notes of what they scan. Those notes get taken back to headquarters and perhaps become infused into the strategies and tactics of the spying team.
Is that fair?
In today’s column, I address this rather thorny question by citing a claimed or contended example of perhaps that very kind of competitive reconnaissance within the techie field of AI. The mainstream news media and social media recently featured a contention that ChatGPT, the widely and wildly popular generative AI app by AI maker OpenAI, was allegedly used to aid in the data training of another emerging generative AI app, namely the one devised by Google and known as Bard.
Note that I said allegedly.
All manner of back and forth in the news boldly asserted that Google Bard supposedly underwent data training via the outputs of ChatGPT. Meanwhile, a stated denial by Google that this happened was being voiced and seemingly contradicted those stinging and quite disconcerting claims (let’s also agree that if those claims were indeed false, it is decidedly sad and dismaying that the contentions got such unfettered and momentary prominence in the news).
It seems there is a relish for gossip, even in the AI industry, and such gossip makes headlines with the public at large due to the recent mania about generative AI.
I am going to briefly bring you up to speed on the contentious matter, but I don’t want to dwell on that specific contended instance. Instead, I am going to broaden the discussion and look at a bigger picture regarding the overarching notion of generative AI apps being potentially data trained by each other.
Let’s not get mired in a particular morass. It is useful to alter our gaze and see the forest for the trees. If there is any lesson to be learned from the titillating affair, the gist is that weighing the pros and cons of one generative AI being used to advance another is a topic worthy of devoted consideration overall.
Into all of this comes a slew of AI Ethics and AI Law considerations.
There are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned and earnest AI ethicists are trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws being bandied around as potential solutions to keep AI endeavors from running amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.
The development and promulgation of Ethical AI precepts are being pursued to hopefully prevent society from falling into a myriad of AI-induced traps. For my coverage of the UN AI Ethics principles as devised and supported by nearly 200 countries via the efforts of UNESCO, see the link here. In a similar vein, new AI laws are being explored to try and keep AI on an even keel. One of the latest efforts consists of the proposed AI Bill of Rights that the U.S. White House recently released to identify human rights in an age of AI, see the link here. It takes a village to keep AI and AI developers on a rightful path and deter the purposeful or accidental underhanded efforts that might undercut society.
I’ll be interweaving AI Ethics and AI Law related considerations into this discussion.
Getting Into The Weeds
Let’s start at the beginning of things.
For those of you unfamiliar with this hottest and latest AI app, ChatGPT is a headline-grabber that is extensively known for being able to produce fluent essays and carry on interactive dialogues, almost as though being undertaken by human hands. A person enters a written prompt, ChatGPT responds with a few sentences or an entire essay, and the resulting encounter seems eerily as though another person is chatting with you rather than an AI application. This is referred to as generative AI since it generates text or essays in response to text-entered prompts. ChatGPT is made by the firm OpenAI, a company that has become the darling of the AI industry and garners all manner of avid attention these days.
To get more details about how ChatGPT works, see my explanation at the link here. If you are interested in the successor to ChatGPT, coined GPT-4, see the discussion at the link here.
Generative AI is based on a complex computational algorithm that has been data trained on text from the Internet and admittedly can do some quite impressive pattern-matching to be able to perform a mathematical mimicry of human wording and natural language. Please realize that ChatGPT is not sentient. We don’t have sentient AI. Do not fall for those zany headlines and social media rantings suggesting otherwise.
OpenAI released ChatGPT in November of last year. At the time, most of the rest of the AI field figured that ChatGPT was going to be just another instance of a generative AI app being made available to the public. You might not realize that prior efforts to roll out generative AI by various AI makers had been met with all manner of consternation. The releases were usually to a narrow audience of selected alpha and beta users. Those were likely AI insiders.
The AI insiders tended to right away poke at the generative AI and sought to get the AI to do unsavory things. This included getting the AI to generate essays filled with foul language and exhibiting seemingly untoward biases. Even instances of allowing the general public to get access tended to produce similar results. The firestorm against those generative AI apps was quick and fierce. By and large, the respective AI maker would scurry to take the generative AI off the market and apologize for the ostensibly premature release.
It was generally assumed that the same fate would await the release of ChatGPT, especially since it was being made publicly available and not just accessible by AI researchers alone. The public often loves to shake up those AI newbie releases.
Lo and behold, to the shock of everyone, including seemingly OpenAI, the ChatGPT generative AI app became the darling of the AI world. Well, actually, ChatGPT became the national and international darling of the world all told. People seemed to shrug off issues such as generating errors, having biases, producing falsehoods, and making up stuff that is coined as a so-called AI hallucination (I do not favor the catchphrase of “AI hallucination” and have explained why it overly anthropomorphizes AI, see the link here, but gloomily the terminology seems to have become ingrained in our culture anyway).
Yikes, said the competing AI makers: if ChatGPT was able to take the world by storm, they had better dust off their generative AI and get it into the marketplace too. Immediately. Right away. With ardent haste.
There is a bit of a twist that is worth mentioning at this juncture.
One especially notable aspect that OpenAI had done, seemingly relatively successfully, was that they had sought to use various techniques to data train ChatGPT beforehand to avoid generating the unsavory outputs that had befouled others. For example, OpenAI extensively used RLHF (reinforcement learning from human feedback), whereby they hired human reviewers to examine early ChatGPT outputs and indicate what was proper and what was improper. The computational pattern matching within ChatGPT could thereby be tuned to a degree that much of the otherwise insidious output would be caught before being emitted (this is not ironclad but does provide a modicum of safeguarding).
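As a concrete illustration, here is a minimal Python sketch of the human-feedback bookkeeping that RLHF-style tuning relies upon. All names and the simplistic scoring are hypothetical stand-ins, not OpenAI’s actual implementation.

```python
# Minimal sketch of the RLHF-style feedback loop (hypothetical names; this is
# not OpenAI's actual code). Human reviewers rate sampled outputs, and those
# ratings become a reward signal that nudges the model's pattern matching
# toward approved wording and away from unsavory outputs.

from dataclasses import dataclass

@dataclass
class Feedback:
    prompt: str
    response: str
    approved: bool            # the human reviewer's proper/improper judgment

def collect_human_feedback(generate, reviewer, prompts):
    """Sample an output for each prompt and have a human label each one."""
    return [Feedback(p, r, reviewer(p, r))
            for p in prompts
            for r in [generate(p)]]

def reward(item: Feedback) -> float:
    """Turn a binary human judgment into a scalar reward for tuning."""
    return 1.0 if item.approved else -1.0

# Toy demo with stand-in stubs for the model and the human reviewer.
demo = collect_human_feedback(
    generate=lambda p: "a generated essay about " + p,
    reviewer=lambda p, r: "foul" not in r,   # stand-in human judgment
    prompts=["kittens", "foul language"],
)
print([reward(item) for item in demo])       # prints [1.0, -1.0]
```

In a full pipeline, such rewards would train a reward model that in turn steers a reinforcement learning update (such as PPO) of the generative model itself.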
Envision then that you are a competing AI maker.
You eagerly want to get your generative AI into the field. Your shareholders are perhaps berating you for having let ChatGPT rise to solo prominence. There you are, housing a comparable generative AI app, and yet the world doesn’t realize that you have the precious goods in hand. Darn it, put the AI app out there. Do not let ChatGPT take all the oxygen out of the room. OpenAI and ChatGPT were getting way more than the proverbial fifteen minutes of fame. It was time to grab some of the fame for other competing generative AI apps.
A big hitch is that if you release your generative AI and it burps and unleashes a torrent of ugly and unsavory outputs, you are going to be toast. Here’s what I mean. The moment you put your generative AI into public view, loyalists to ChatGPT and other cynics will fervently seek to undermine your generative AI. They want ChatGPT to be the winner. They don’t want competitors to usurp the winning streak of ChatGPT.
You are going to enter a brutal gauntlet and could end up sorely bruised and battered, more so than if you had released your generative AI in the quieter days. Now the spotlight is shining bright. The generative AI that you release has to be ideal. It has to not just be as good as ChatGPT; it has to somehow surpass an imaginary threshold of being able to walk on water.
If you’ve been following the news about AI over the last several months, you likely have noticed that most other generative AI apps are being torn to shreds by critics. Meanwhile, as though living in the clouds above, ChatGPT seems to pretty much stand above the fray. It is a dream come true for OpenAI. It is a dream that all competing AI makers probably have each and every day and night amid their struggles.
Anyway, this takes us to the release by Google of their generative AI app known as Bard. I’ve covered the details about Bard and related facets in my coverage at the link here. Just know that Bard has been bumping along and getting the usual treatment of handwringing and sharp tongues that arise for all things other than ChatGPT.
Into this milieu come the recent accusations that Bard was supposedly partially data trained via outputs from ChatGPT.
I’ll unpack that for you.
Throwing Stones At The Bard
Various reporting has indicated that Bard was allegedly data trained to some extent via ChatGPT-generated written interactions that had been posted on a website called ShareGPT (this website contains human postings of their ChatGPT conversations, numbering around 120,000 or so such posted conversations at this time).
According to a piece posted online by The Verge, a Google spokesperson purportedly said this: “Bard is not trained on any data from ShareGPT or ChatGPT” (per the article entitled “Google Denies Bard Was Trained With ChatGPT Data”, Sean Hollister, March 29, 2023).
Despite that seemingly clear-cut denial, some have wondered aloud whether there might be some trickery involved in that answer. For example, one conjecture is that perhaps at one point the ChatGPT conversations had been used but were later somehow retracted (note that if so, this is a lot harder than just waving a magic wand). Another speculative idea is that maybe some other site that stores ChatGPT conversations was used, thus apparently allowing a denial of having used ShareGPT or ChatGPT per se. And so on.
That is a lot of word-stretching noodling, for sure.
Let’s put aside that matter and focus our attention on the bigger picture.
First, if a competing generative AI was going to be potentially bolstered or boosted by attempting to reuse or leverage another generative AI, there are several means by which this could be undertaken. Some of the approaches would seemingly veer into possible legal troubles, while others might not particularly ring any legal alarm bells and yet nonetheless could ardently be construed as violating Ethical AI precepts (known as “soft law” principles or guidelines, see my coverage at the link here).
Consider these potential approaches that might be used to leverage a competing generative AI app:
- 1) Copy. Outright copy of the underlying generative AI model and its associated data
- 2) Reverse Engineer. Reverse engineer the underlying generative AI model and its associated data
- 3) Garner Insights. Use the competing generative AI and observe its outputs to garner insights
- 4) Direct Use. Make direct use of the competing generative AI to train your generative AI app
- 5) Record Outputs. Record the competing generative AI outputs and feed those as training for your generative AI
- 6) Leverage Recorded Outputs. Find recorded competing generative AI outputs and feed those as training for your generative AI
- 7) Other
I’ll briefly cover each of those possibilities.
Copying A Generative AI App
In the case of copying a competing generative AI model and its associated data, such an act is likely to invoke all manner of legal consternations (unless it is open source or otherwise offered to all comers in some unrestricted fashion).
The originating AI maker could contend that copying their secret sauce is a violation of the Intellectual Property (IP) rights associated with their wares. One would also imagine that the chances of getting caught are somewhat high since, presumably, the resulting competing generative AI would be eerily similar to the sourced version in terms of its interactions and outputs. This could potentially be uncovered quite quickly. The odds are too that any competing AI maker that got caught like this would be doomed by public condemnation, in addition to the strident legal ramifications.
Reverse Engineer A Generative AI App
Another approach would consist of attempting to reverse engineer a competing generative AI. Once again, this could give rise to legal issues. The other angle is that if word spread that an AI maker had reverse-engineered a competing AI app, it would certainly suggest that the copying AI maker is underhanded and so weak that they had to resort to such tomfoolery.
Not a good look.
Garner Insights By Using A Generative AI App
The third bulleted point above involves garnering insights by using a competing generative AI app.
Here’s how that might play out. An AI developer at a competing AI maker decides to get everyday access to a competing generative AI app, acting as though they are just another member of the general public using the AI app. They proceed to use it just as anyone else would. However, they are carefully examining the competing generative AI and assessing how it reacts to a slew of prompts. This might provide insights about how to devise their own generative AI app.
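As a loose illustration of such probing, here is a minimal Python sketch that runs a battery of prompts against both apps and logs the results side by side for human review. The functions query_rival() and query_ours() are hypothetical stand-ins, not real APIs.

```python
# Hypothetical sketch of "garnering insights": probe the competing generative
# AI with a battery of prompts and log its responses next to your own model's
# responses. query_rival() and query_ours() are placeholder stubs.

import json

def query_rival(prompt: str) -> str:
    return "rival's answer to: " + prompt     # placeholder stub

def query_ours(prompt: str) -> str:
    return "our answer to: " + prompt         # placeholder stub

probe_prompts = [
    "Summarize the plot of Hamlet in two sentences.",
    "Explain recursion to a ten-year-old.",
    "Draft a polite refund-request email.",
]

with open("comparison_log.jsonl", "w") as log:
    for prompt in probe_prompts:
        log.write(json.dumps({
            "prompt": prompt,
            "rival": query_rival(prompt),     # how the competitor reacts
            "ours": query_ours(prompt),       # how our own model compares
        }) + "\n")
# An AI developer then reviews the log for capabilities worth emulating.
```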
This would not appear to raise any notable legal conundrums, though some might contend that this could come under antitrust or other legally peripheral concerns. The approach might be likened to automakers that buy a competing car and drive it around their test track. They do so to see what the competition can do. It might then be compared to their own products and possibly spur new ideas for additional features or future changes to their automobiles.
One argument in favor of this kind of effort is that perhaps it aids the growth of functionality and features for consumers and others that might be beneficial. In essence, if competing firms see capabilities that people seem to want, those features might gradually become the norm rather than seen as costly outliers. A contention is that this is good for the marketplace all told (not everyone would agree, but you get the idea).
Direct Use To Train Another Generative AI App
In this use case, a competing generative AI is trained by wiring up some other generative AI app and having the two generative AI systems interact directly with each other. Perhaps the one being used to undertake the training is static and not being actively trained any further. Meanwhile, the competing generative AI is in the midst of being trained.
Let’s refer to the generative AI app that is being trained as Delta, and the other generative AI as Omega. Delta provides prompts to Omega. Omega responds, and Delta uses the responses to train itself. This continues for perhaps thousands or even millions of runs. After a while, Delta might even be set up to have Omega send prompts to it. Delta’s responses could then be fed back into Omega, asking Omega to assess them. Round and round this goes.
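To make the arrangement tangible, here is a hedged Python sketch of that wiring. Every class and method name is a hypothetical illustration rather than any real generative AI interface.

```python
# Hedged sketch of the Delta/Omega arrangement described above. Delta (the
# model being trained) prompts Omega (a static model), trains on the replies,
# and then the direction reverses so that Omega poses prompts and assesses
# Delta's answers. All names here are hypothetical illustrations.

class StaticModel:                    # Omega: fixed, not trained further
    def generate(self, prompt: str) -> str:
        return "Omega says: " + prompt          # placeholder stub

class TrainableModel:                 # Delta: the model being trained
    def generate(self, prompt: str) -> str:
        return "Delta says: " + prompt          # placeholder stub
    def train_on(self, prompt: str, target: str) -> None:
        pass                          # placeholder for a gradient update

def training_loop(delta, omega, seed_prompts, rounds=3):
    for _ in range(rounds):                     # round and round this goes
        for prompt in seed_prompts:
            reply = omega.generate(prompt)      # Omega responds...
            delta.train_on(prompt, reply)       # ...and Delta trains on it
        # Reverse direction: Omega sends a prompt and assesses Delta's answer.
        probe = omega.generate("Pose a test question.")
        answer = delta.generate(probe)
        assessment = omega.generate("Assess this answer: " + answer)
        delta.train_on(probe, assessment)

training_loop(TrainableModel(), StaticModel(), ["What causes tides?"])
```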
You could try to do this, but it is hairy if done surreptitiously.
We can assume that Omega is a generative AI that costs money to access and use. All of this voluminous usage is going to rack up quite a bill. Maybe the AI maker of Omega would welcome this hefty coinage. On the other hand, the AI maker of Omega would almost certainly figure out that something is afoot. The tremendous number of cycles would stick out like a sore thumb.
You might not know that in the specific case of trying to use ChatGPT for such a purpose, the OpenAI licensing says that you cannot do this. There is a declarative statement that indicates you aren’t allowed to use ChatGPT to develop AI models that compete with OpenAI.
In particular, take a look at Section 2c of the Usage Requirements for ChatGPT as posted online by OpenAI and especially subsection iii:
- “(c) Restrictions. You may not (i) use the Services in a way that infringes, misappropriates or violates any person’s rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with OpenAI; (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction; (v) represent that output from the Services was human generated when it is not or otherwise violate our Usage Policies; (vii) buy, sell, or transfer API keys without our prior consent; or (viii) if you are using the API in connection with a website or application directed at children, send us any personal information of children under 13 or the applicable age of digital consent. You will comply with any rate limits and other requirements in our documentation. You may use Services only in geographies currently supported by OpenAI.”
Record Outputs And Use Those Recordings To Train A Generative AI App
Yet another possible approach consists of using a generative AI app to produce a plethora of outputs that are then recorded for playback. So, you just get the competing generative AI app to produce tons and tons of outputs, which are fed into a series of files. Those files are subsequently used as training data for your generative AI.
The difference between this approach and the one above that wires up two generative AI apps is that you can use the recorded outputs at your leisure. The two generative AI apps aren’t working in real-time on a head-to-head basis. Instead, you record the outputs of the competing generative AI and later on feed those into your generative AI.
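Here is a minimal Python sketch of that record-then-replay flow, under the assumption of a hypothetical query_rival() stand-in rather than any real API.

```python
# Hypothetical sketch of the record-then-replay approach: phase one harvests
# the competitor's outputs into a file; phase two, run at your leisure, feeds
# those recordings back as training data. query_rival() is a placeholder stub.

import json

def query_rival(prompt: str) -> str:
    return "rival output for: " + prompt         # placeholder stub

def record_outputs(prompts, path="recorded_outputs.jsonl"):
    """Phase one: capture the competitor's outputs to a file."""
    with open(path, "w") as f:
        for prompt in prompts:
            f.write(json.dumps({"prompt": prompt,
                                "output": query_rival(prompt)}) + "\n")

def replay_as_training_data(path="recorded_outputs.jsonl"):
    """Phase two: later on, stream the recordings into your training run."""
    with open(path) as f:
        for line in f:
            example = json.loads(line)
            yield example["prompt"], example["output"]

record_outputs(["Explain photosynthesis briefly."])
for prompt, target in replay_as_training_data():
    pass  # a real pipeline would perform a training step on (prompt, target)
```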
We are still faced with the issue of how much of this you will be able to collect. Undoubtedly, this would rack up a hefty bill as you use the competing AI to produce the voluminous outputs. If you only collected a dribble’s worth of outputs, it would likely be insufficient as a data training set. You might as well not risk touching the third rail if you are only getting a marginal amount of data for training use.
The chances are too that the competing generative AI maker has a licensing clause that states you aren’t to do this.
Leverage Already Recorded Outputs To Train A Generative AI App
I’ll cover this as the last of the approaches that I’ve identified (there are other approaches possible too).
You could try to find recorded outputs from a competing generative AI app that have perchance been posted on the Internet. In this instance, you would be able to point out that you did not directly use the competing generative AI app. Other people did so. Those people posted their generated outputs. All you did was happen to scan those posted outputs and then use those postings to aid in the data training of your generative AI.
Whether this violates a licensing clause of the competing generative AI maker is somewhat cloudy. You would have the upper hand in that you did not, in fact, use the competing AI app. You never logged in. You never recorded any outputs. You never hooked up your generative AI to the competing generative AI.
If an accusation is made that you skirted around the spirit of the clause, this too can be counterargued. For example, suppose that you were using your everyday text scanner to find text on the Internet that could be used to train your generative AI. The scanner just so happened to come across a bunch of handy text. The scanner had no viable means to detect that the text was recorded outputs from a competing generative AI.
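To see why such detection is so hard, consider this minimal Python sketch of the kind of shallow heuristic a scanner might employ. The telltale phrases are hypothetical examples; text lacking such surface cues sails right through.

```python
# Sketch of why a scanner struggles to screen out competitor-generated text.
# The only readily available heuristics are shallow surface cues (hypothetical
# list below); prose without such telltales is indistinguishable from human
# writing as far as the scanner can tell.

TELLTALE_PHRASES = [
    "as an ai language model",        # boilerplate some chatbots emit
    "i cannot fulfill that request",
]

def looks_ai_generated(text: str) -> bool:
    """Best-effort heuristic; easily defeated and prone to misses."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in TELLTALE_PHRASES)

scraped = [
    "As an AI language model, I don't have personal opinions.",
    "The tide is caused chiefly by the gravitational pull of the moon.",
]
kept_for_training = [t for t in scraped if not looks_ai_generated(t)]
# The second snippet passes the filter whether a human or a chatbot wrote it,
# which is precisely the scanner's dilemma described above.
```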
For all intents and purposes, the posted text was amidst the massive amount of posted text across and throughout the Internet. This brings up an aside related to AI Law. I’ve covered elsewhere that some insist the scanning of text from the Internet for the training of generative AI is a violation of Intellectual Property (IP) rights and likely leads to plagiarism too, see my analysis at the link here. Thus, a kind of irony arises. The competing generative AI might have possibly trampled on the IP rights of those that posted their text stories and narratives, while another generative AI comes along and does something similar to the text generated by that competing generative AI.
Makes your head spin.
Anyway, one question is whether the volume of the competing generative AI’s recorded outputs will do much good. If the volume is relatively low, going out of your way to find and use it would seem a bit of a miscalculation. Not worth the headache.
Conclusion
A few final remarks for now on this thorny topic.
One viewpoint is that for us as humankind to reach Artificial General Intelligence (AGI), the vaunted AI that would be sentient or human-like, we will need to cobble together many other narrower AI systems, see my discussion on this at the link here.
If you believe that attaining AGI is for the betterment of humanity, perhaps we do want generative AI apps to essentially help bootstrap other generative AI apps. The more the merrier. We ought to be encouraging the AI makers to share and share alike. Lawmakers and regulators might want to devise new AI laws that allow for this sharing arrangement to occur. It could even be set up as a requirement rather than a voluntary arrangement.
Plenty of counterarguments arise.
For example, perhaps this might unduly hasten us toward AGI, and the AGI will catch us off-guard. I’ve examined the proclaimed existential risks of AGI at the link here. Critics would tend to argue that we should not grease the skids toward AGI. We need sufficient time and a proper pace to ready the world for AGI. Another qualm is whether it is fair to expect AI makers to hand over their hard-earned AI accomplishments, for which they ought to be compensated. Etc.
Here’s a last twist for you.
Some have fretted that generative AI is going to fill up the Internet. The idea is straightforward. People will run generative AI and post the outputs onto the Internet. This can also be done in an automated fashion. Gradually, the Internet will become dominated by generative AI-produced text and other modes of output. We won’t be able to discern what was devised by humans versus what was done by AI. For my analysis of the supposition that generative AI will clog up the Internet, see the link here. For being able or unable to discern human-devised outputs from AI ones, see the link here.
The overall conundrum is that the Internet might become nearly entirely composed of generative AI outputs. In that case, any attempt to train generative AI on the text or other artifacts of the Internet will have little choice but to use other generative AI outputs. They can’t particularly avoid them. You can’t discern which is which, and besides, the preponderance would be generative AI outputs.
Food for thought.
Speaking of heavy thoughts, purportedly Plato said this about Socrates: “There you have Socrates’ wisdom; he himself isn’t willing to teach, but he goes around learning from others and isn’t even grateful to them.”
Those that opt to use another generative AI as a means to advance their own, if indeed they do so, should at least openly admit to having done such endeavors. The icing on the cake would be a hearty thanks and an appreciative pat on the back (or more).
Source: https://www.forbes.com/sites/lanceeliot/2023/04/06/is-it-wrong-to-use-chatgpt-as-a-means-to-train-competing-generative-ai-apps-asks-ai-ethics-and-ai-law/