Keep your eye on the prize, but meanwhile don’t lose sight of other nifty opportunities too.
What am I talking about?
During the famous Gold Rush era, eager prospectors sought the dreamy riches of unearthed gold. Turns out that very few actually struck it rich by discovering those prized gold nuggets. You might be surprised to know that panning for gold often turned up other precious metals along the way. The feverish desire to get gold would sometimes overpower the willingness to mine silver, mercury, and other ores that were readily seen while searching for gold.
The gist is that you might find yourself looking past a less prominent treasure when you are preoccupied with a gleaming goal of seemingly greater value. Is it worth the time and effort to deal with something less alluring? Doing so could distract you from the hoped-for bigger prize. Then, again, can you reach for the stars and aim to have your cake and eat it too?
Failing to exploit riches that are at your feet seems shortsighted and ought to be given at least a modicum of attention, one would assume. Putting all your eggs in a single basket, in this instance the yearning to obtain gold, could leave you high and dry. A balanced portfolio might consist of prospecting for whatever valuables can be uncovered.
I have brought up this conundrum due to the somewhat unspoken fact that the advent of generative AI such as ChatGPT has quietly been facing a similar dilemma. It might be reasonably asserted that generative AI has been mining online gold, as it were, while forsaking the nearby elements that could be equally valuable and profitable.
It all has to do with data, particularly data mined or scanned from the Internet that is then used principally to data train generative AI apps.
OpenAI’s ChatGPT and its successor GPT-4 would not exist if it were not for all the data training undertaken to get the AI apps into shape for doing Natural Language Processing (NLP) and performing interactive conversations with humans. The data training entailed scanning various portions of the Internet, see my explanation at the link here. In the case of text-to-text or text-to-essay generative AI, the mainstay of ChatGPT, all kinds of text were scanned to ferret out patterns of how humans use words.
I’ll say more about this in a moment.
By having done this massive pattern-matching, the algorithms and models underlying ChatGPT and other generative AI are able to seemingly converse as though versed in our natural language. In a sense, it is a trick. The trick consists of statistically and mathematically associating words with other words. When doing this at scale, the computational AI can appear as though able to engage in dialogues and produce essays on par with those written by humans (patterned on essays and text that were written by humans).
Without using hyperbole, it is feasible to suggest that ChatGPT has turned those millions upon millions of words that were scanned on the Internet into veritable gold. Some complain that this was done without properly compensating those that had their posted words and essays involuntarily scanned. To them, ChatGPT and other generative AI are altogether infringing on their Intellectual Property (IP) rights and likely committing unabashed plagiarism, see my analysis of these thorny issues at the link here.
One loosey-goosey basis for skirting around those AI Law and AI Ethics concerns is that by and large these generative AI apps do not end up word-for-word repeating what they scanned.
While using, for example, ChatGPT, it is quite rare to have the AI app generate an essay that you can find precisely as posted somewhere on the Internet. There are only relatively rare occasions when the generative AI might have stored the exact wording. Most of the time, the pattern-matching has crafted, in a sense, a template of words and how they combine, plus selective randomness is used when composing essays. The outcome is that the essays and interactive conversations are seemingly unique and not a carbon copy of something scanned previously from the Internet.
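To make that pattern-plus-randomness idea concrete, here is a minimal sketch using a toy bigram model. This is emphatically not the architecture behind ChatGPT (which uses large neural networks); it merely illustrates the underlying principle of statistically associating words with other words and then sampling among the options, which yields output that echoes the training text without copying it verbatim:

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record which words follow which word, a crude stand-in for pattern-matching."""
    words = text.split()
    followers = defaultdict(list)
    for a, b in zip(words, words[1:]):
        followers[a].append(b)
    return followers

def generate(followers, start, length=8, seed=None):
    """Walk the word-association table, picking a follower at random each step."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:
            break  # dead end: no recorded follower for this word
        out.append(rng.choice(options))
    return " ".join(out)

corpus = "the gold rush drew miners and the miners sought gold and silver"
model = train_bigrams(corpus)
print(generate(model, "the", seed=1))
```

Every word in the generated string appears in the training text, yet the particular sequence produced typically differs from the source, which is the toy analog of why generative AI essays rarely match anything posted verbatim on the Internet.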
We are now ready to discuss the twist to all of this.
Prepare yourself accordingly.
Here it is:
- The scanned data that was pattern-matched might encompass other valuable ores or minerals that could likewise be further mined and used in additional ways to garner more money and added profitability.
Right now, the pattern-matching and leveraging of the scanned data of the Internet are focused on trying to get generative AI to work with fluency and wow the socks off of everyone. That is the big-ticket item. It is the gold in them thar hills.
That being said, we can return to the Gold Rush era lessons learned. Perhaps it is shortsighted to overlook the silver, mercury, and other minable aspects that linger there too. Go for the gusto and grab as much value as can be squeezed out of the mined rocks and rubble.
What else could be done with the mined data?
Lots of possibilities arise.
I’ll bucket them into these groupings:
- 1) Generative AI Fluency. This is the customary use of the mined data, namely to enable generative AI such as ChatGPT to be able to combine words in a patterned fashion that resembles that of human writing.
- 2) Marketing Of Generative AI. The mined data could also be used to aid in targeting a sales pitch of sorts toward selling people and businesses on avidly making use of generative AI.
- 3) Deriving Advertising Dollars. The mined data could be used for advertising purposes, including as a service for advertisers that want to reach desirable eyeballs and buyers.
- 4) Devising Profiles Of People. The mined data could be used to computationally craft profiles of people based on the Internet data, doing so across a wide swath of the data and connecting otherwise obscure dots, and then selling those profiles to eager buyers.
- 5) Other Notable Uses. A slew of additional uses is readily viable, given that the data is large-scale, across the board, and has been examined and analyzed via powerful algorithms that can find intricate patterns.
You haven’t yet seen those other uses put into play.
Why?
Because the existing gold rush has plenty of gold still left to be utilized and profited from. As mentioned earlier, though a temptation might exist to go beyond the allure of gold, doing so can be distracting. Keep your eye on the prize is the mantra right now for generative AI. Make generative AI better and better. There is no need yet to fumble around with other uses of the mined data.
The equation on this though can potentially wobble and shift at some later point.
If generative AI as a hot item begins to fade, thoughts of finding alternative sources of revenue will undoubtedly arise. The realization that there is silver and other valuable elements to be mined will surface in the minds of AI executives and AI company shareholders. Find a new means of making dough. Do not get caught flat-footed if the generative AI mania starts to dribble away and maybe spiral out of favor.
AI makers are silently sitting on a silver mine and other golden opportunities. They would be wise for now to stick to their knitting. This doesn’t prevent internal secretive explorations from being performed. Best to be prepared for a rainy day.
A key downside to approaching these alternative means of enrichment is that society might go berserk upon discovery that the scanned Internet data was being used for additional purposes. The perceived value of generative AI seems for now to have kept the angst and anger over the scanned data overreach somewhat tepid. It is there and bubbling. We’ll have to wait and see how much the bubbling turns into a full boil.
The odds are that if the AI makers were caught dipping their hands into the cookie jar by using the mined and pattern-matched data for these added purposes, all heck would break loose. Public outrage would ensue. Lawmakers and regulators would be drawn directly into the morass. You could anticipate that the outcry on an AI Ethics basis would be deafening, while the furious energy would also likely be funneled into drafting and enacting new AI Laws restricting such added uses.
The tide has not yet turned toward opening the proverbial Pandora’s box.
Nonetheless, the matter awaits the day that the dreaded container is opened. We should ponder this potentiality, look ahead to its manifestation, and closely examine where things could head.
Vital Background About Generative AI
Before I get further into this topic, I’d like to make sure we are all on the same page overall about what generative AI is and also what ChatGPT and GPT-4 are all about. For my ongoing coverage of generative AI and the latest twists and turns, see the link here.
I’m sure that you already know that ChatGPT is a headline-grabbing AI app that can produce fluent essays and carry on interactive dialogues, almost as though being undertaken by human hands. A person enters a written prompt, ChatGPT responds with a few sentences or an entire essay, and the resulting encounter seems eerily as though another person is chatting with you rather than an AI application. This type of AI is classified as generative AI due to generating or producing its outputs. ChatGPT is a text-to-text generative AI app that takes text as input and produces text as output. I prefer to refer to this as text-to-essay since the outputs are usually of an essay style.
Please know though that this AI and indeed no other AI is currently sentient. Generative AI is based on a complex computational algorithm that has been data trained on text from the Internet and admittedly can do some quite impressive pattern-matching to be able to perform a mathematical mimicry of human wording and natural language. To know more about how ChatGPT works, see my explanation at the link here. If you are interested in the successor to ChatGPT, coined GPT-4, see the discussion at the link here.
There are four primary modes of being able to access or utilize ChatGPT:
- 1) Directly. Direct use of ChatGPT by logging in and using the AI app on the web
- 2) Indirectly. Indirect use of kind-of ChatGPT (actually, GPT-4) as embedded in the Microsoft Bing search engine
- 3) App-to-ChatGPT. Use of some other application that connects to ChatGPT via the API (application programming interface)
- 4) ChatGPT-to-App. The newest mode entails accessing other applications from within ChatGPT via plugins
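The app-to-ChatGPT mode (#3 above) works by sending a structured request to the AI maker's API. As a rough sketch only, here is approximately what such a request body looks like for OpenAI's chat-style API: the field names follow OpenAI's publicly documented format at the time of writing, while the endpoint URL and model identifier should be treated as illustrative assumptions that can change over time.

```python
import json

# Assumed endpoint for OpenAI's chat completions API (verify against
# the current official documentation before relying on it).
API_URL = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-3.5-turbo",  # placeholder model identifier
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the Gold Rush in one line."},
    ],
}

body = json.dumps(payload)
# An app would POST `body` to API_URL with an Authorization header
# carrying its API key, then read the generated text out of the JSON
# reply. The network call is omitted here since it requires a live key.
print(body[:60])
```

A plugin (mode #4) flips this arrangement around: instead of your app calling ChatGPT, ChatGPT calls your app on the user's behalf.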
The capability of being able to develop your own app and connect it to ChatGPT is quite significant. On top of that capability comes the addition of being able to craft plugins for ChatGPT. The use of plugins means that when people are using ChatGPT, they can potentially invoke your app easily and seamlessly.
I and others are saying that this will give rise to ChatGPT as a platform.
All manner of new apps and existing apps are going to hurriedly connect with ChatGPT. Doing so provides the interactive conversational functionality of ChatGPT. The users of your app will be impressed with the added facility. You will likely get a bevy of new users for your app. Furthermore, if you also provide an approved plugin, this means that anyone using ChatGPT can now make use of your app. This could demonstrably expand your audience of potential users.
As I’ve previously mentioned in my columns, a type of cycle takes place in these circumstances. Sometimes referred to as a network effect, see my analysis at the link here, people tend to join something that others are joining. Facebook was this way. Snapchat was this way. At first, maybe there is little or no traction. But, then, often out of the blue, people start to join. Their friends and colleagues join. Everyone wants to join.
The big get bigger. The small get starved or fail to get any oxygen in the room. That’s the gist of the network effect. It becomes a form of stickiness that feeds the exponential growth factor. People will use what everyone else is using. This in turn makes it more alluring and adds value. The snowball is at times unstoppable and gathers ever more momentum.
The temptation to have your app connect with ChatGPT is through the roof. Even if you don’t create an app, you still might be thinking of encouraging your customers or clients to use ChatGPT in conjunction with your everyday services. The problem though is that if they encroach onto banned uses, their own accounts on ChatGPT will also face scrutiny and potentially be locked out by OpenAI.
As noted, generative AI is pre-trained and makes use of a complex mathematical and computational formulation that has been set up by examining patterns in written words and stories across the web. As a result of examining millions upon millions of written passages, the AI can spew out new essays and stories that are a mishmash of what was found. By adding in various probabilistic functionality, the resulting text is pretty much unique in comparison to what was used in the training set.
There are numerous concerns about generative AI.
One crucial downside is that the essays produced by a generative-based AI app can have various falsehoods embedded, including manifestly untrue facts, facts that are misleadingly portrayed, and apparent facts that are entirely fabricated. Those fabricated aspects are often referred to as a form of AI hallucinations, a catchphrase that I disfavor but lamentably seems to be gaining popular traction anyway (for my detailed explanation about why this is lousy and unsuitable terminology, see my coverage at the link here).
Another concern is that humans can readily take credit for a generative AI-produced essay, despite not having composed the essay themselves. You might have heard that teachers and schools are quite concerned about the emergence of generative AI apps. Students can potentially use generative AI to write their assigned essays. If a student claims that an essay was written by their own hand, there is little chance of the teacher being able to discern whether it was instead forged by generative AI. For my analysis of this student and teacher confounding facet, see my coverage at the link here and the link here.
There have been some zany outsized claims on social media about generative AI asserting that this latest version of AI is in fact sentient AI (nope, they are wrong!). Those in AI Ethics and AI Law are notably worried about this burgeoning trend of overstretched claims. You might politely say that some people are overstating what today’s AI can do. They assume that AI has capabilities that we haven’t yet been able to achieve. That’s unfortunate. Worse still, they can allow themselves and others to get into dire situations because of an assumption that the AI will be sentient or human-like in being able to take action.
Do not anthropomorphize AI.
Doing so will get you caught in a sticky and dour reliance trap of expecting the AI to do things it is unable to perform. With that being said, the latest in generative AI is relatively impressive for what it can do. Be aware though that there are significant limitations that you ought to continually keep in mind when using any generative AI app.
One final forewarning for now.
Whatever you see or read in a generative AI response that seems to be conveyed as purely factual (dates, places, people, etc.), make sure to remain skeptical and be willing to double-check what you see.
Yes, dates can be concocted, places can be made up, and elements that we usually expect to be above reproach are all subject to suspicions. Do not believe what you read and keep a skeptical eye when examining any generative AI essays or outputs. If a generative AI app tells you that President Abraham Lincoln flew around the country in a private jet, you would undoubtedly know that this is malarky. Unfortunately, some people might not realize that jets weren’t around in his day, or they might know but fail to notice that the essay makes this brazen and outrageously false claim.
A strong dose of healthy skepticism and a persistent mindset of disbelief will be your best asset when using generative AI.
Into all of this comes a slew of AI Ethics and AI Law considerations.
There are ongoing efforts to imbue Ethical AI principles into the development and fielding of AI apps. A growing contingent of concerned AI ethicists is trying to ensure that efforts to devise and adopt AI take into account a view of doing AI For Good and averting AI For Bad. Likewise, there are proposed new AI laws that are being bandied around as potential solutions to keep AI endeavors from going amok on human rights and the like. For my ongoing and extensive coverage of AI Ethics and AI Law, see the link here and the link here, just to name a few.
The development and promulgation of Ethical AI precepts are being pursued to hopefully prevent society from falling into a myriad of AI-induced traps. For my coverage of the UN AI Ethics principles as devised and supported by nearly 200 countries via the efforts of UNESCO, see the link here. In a similar vein, new AI laws are being explored to try and keep AI on an even keel. One of the latest takes consists of a proposed AI Bill of Rights that the U.S. White House recently released to identify human rights in an age of AI, see the link here. It takes a village to keep AI and AI developers on a rightful path and deter the purposeful or accidental underhanded efforts that might undercut society.
I’ll be interweaving AI Ethics and AI Law related considerations into this discussion.
More Ways Than One To Make A Buck
We are ready to further unpack this thorny matter.
As stated, there is at the present time more of an incentive to stay focused on the gold of the data training and steer clear of the other precious metals that might also be mined. The situation might change. If generative AI wanes in interest, the AI makers could be under pressure to discover additional sources of revenue. Even if generative AI does not diminish, one might always be tempted to reach for more revenue and ergo inch into leveraging the data training for added bucks.
The counterbalancing effect is that utilizing the data training for other purposes could trigger a public and lawmaking avalanche. Upon the realization that one or more AI makers are going beyond the scope of leveraging the data training solely for generative AI purposes, a backlash is likely to ensue. AI makers would indubitably be seen as insidiously greedy and overstepping the public largesse.
Here’s a question that I often get about this weighty topic:
- Would other uses of the data training potentially get the AI makers into legal trouble?
It depends.
The added uses that I’ve listed earlier are generally legally permissible if pursued carefully. An AI maker that fails to mindfully mine the data training in bounded ways will fly a bit too close to the sun and could feel the heat accordingly. There is also reputational damage that can occur, looming over their heads regardless of the legalities per se.
Given that the gold rush for generative AI is in full swing, prudent AI makers are apt to stay as far away from exploiting the data training as is feasible. No need to get marred by muddy accusations or wade into murky waters.
Indeed, the better path would be to adroitly declare what your intentions are.
Consider for example the recent indication regarding ChatGPT and GPT-4 as stated by OpenAI in their formal posting entitled “Our approach to AI safety” of April 5, 2023:
- “Our large language models are trained on a broad corpus of text that includes publicly available, licensed content, and content generated by human reviewers. We don’t use data for selling our services, advertising, or building profiles of people—we use data to make our models more helpful for people. ChatGPT, for instance, improves by further training on the conversations people have with it.”
Note that this is an explicit indication that they aren’t using the data for other purposes. Some but not all of the generative AI makers have similar posted indications.
For those that don’t post such a declaration, it isn’t apparent whether the lack of pronouncement is due to being unaware of the importance of stating their posture, or perhaps to opting to explore their mining options without yet laying their cards on the table. Until or unless public realization of the matter rises above a given threshold, there aren’t overt pressures to take an openly stated stand.
An ongoing question about whether new AI-related laws are needed to rein in the AI makers is being bandied about. I have discussed various proposed new AI laws such as those proposed by the EU and those in the US, see my coverage at the link here.
One clarification about the interest in devising and enacting new AI Laws is that we do already have numerous laws that encompass facets of AI. In other words, those zany headlines that sometimes suggest that we are in a lawless condition with respect to laws that cover AI are rather farfetched. Conventional laws do apply to AI and AI makers, see for example my discussion about the FTC and AI considerations, at the link here. Of course, there is also room for improvement such that new AI Laws can be customized to deal with legal wrinkles and address new legal issues that are arising from advancements in AI.
A recent talk by FTC Commissioner Alvaro Bedoya entitled “Early Thoughts on Generative AI” that was posted online proffered these insights about the fact that there are existing laws and regulations pertaining to today’s AI:
- “The reality is, AI is regulated. Just a few examples:”
- “Unfair and deceptive trade practices laws apply to AI. At the FTC our core section 5 jurisdiction extends to companies making, selling, or using AI. If a company makes a deceptive claim using (or about) AI, that company can be held accountable. If a company injures consumers in a way that satisfies our test for unfairness when using or releasing AI, that company can be held accountable.”
- “Civil rights laws apply to AI. If you’re a creditor, look to the Equal Credit Opportunity Act. If you’re an employer, look to Title VII of the Civil Rights Act. If you’re a housing provider, look to the Fair Housing Act.”
- “Tort and product liability laws apply to AI. There is no AI carve-out to product liability statutes, nor is there an AI carve-out to common law causes of action.”
- “AI is regulated. Do I support stronger statutory protections? Absolutely. But AI does not, today, exist in a law-free environment” (source: online posting “Prepared Remarks of Commissioner Alvaro M. Bedoya, Federal Trade Commission”, “Before the International Association of Privacy Professionals”, April 5, 2023).
Any AI maker contemplating the use of data training for purposes beyond their generative AI would be wise to consult their legal counsel. On top of that, AI makers should already be seeking the advice of their legal counsel about the Intellectual Property (IP) infringement and plagiarism allegations that are already being voiced and will inexorably grow louder and louder.
Only a few lawsuits are underway now. The expectation is that if some of those prevail, the floodgates will be opened and AI makers will have their hands full with trying to stay above water.
Conclusion
There doesn’t seem to be an appetite as yet for the generative AI makers to try and turn toward their data training for additional revenue possibilities. Using the data training for generative AI seems to be sufficient for now. Stick to your knitting. Proceed to mine for gold when gold is still highly prized.
At some juncture, the temptation to leverage data training for other avenues is going to arise. The question will be whether the promise is alluring and altogether golden.
A final comment for now.
You might be aware that the exclamation “Eureka!” is associated with the discovery of gold. It is an expression said to have been uttered by the famed Archimedes, indicative of the excitement of a discovery or new invention (in his case, a means of assessing the purity of gold), namely “I have found it.”
AI makers and others that seek to leverage data training to reach beyond generative AI might believe they are having a eureka moment, but they should be forewarned that they could be deluding themselves into pursuing fool’s gold.
Source: https://www.forbes.com/sites/lanceeliot/2023/04/24/internet-training-data-of-chatgpt-can-be-used-for-non-allied-purposes-including-privacy-intrusions-frets-ai-ethics-and-ai-law/