The famous illusionist and acclaimed escape artist Harry Houdini once boldly declared: “No prison can hold me; no hand or leg irons or steel locks can shackle me. No ropes or chains can keep me from my freedom.”
That seems like a pretty tall order. We would all generally agree that it is possible to imprison someone such that they are unable to escape. Human strength and ingenuity can only go so far when a person is placed into strictly devised confinement. If a prison or jail is stringently constructed with the idea of being escape-proof, it would seem that no person can overcome such all-encompassing constraints.
Of course, throughout history, there have been notable escapes from confinement that was otherwise assumed to be inescapable. Going all the way back to the year 1244, a prisoner in the infamous Tower of London managed to craft a makeshift rope out of tattered bedsheets. He partially escaped by climbing down the flimsy cord. Turns out the rope snapped during his descent and he fell to his death.
Would you be willing to say that he did escape?
On the one hand, sure, he managed to get outside of the confining room within the Tower of London. But it hardly seems like much of a successful escape given that he died in the act of performing the breakout. I dare say we would be unduly generous in calling this an escape per se.
You might vaguely be familiar with the prison escapee William Sutton, aka “Slick Willie,” who was a notorious bank robber in the 1930s and 1940s. He managed to get onto the FBI’s Ten Most Wanted Fugitives list. During his various incarcerations, he found a means to escape several times. In one case, he dressed up as a prison guard and walked out of the Philadelphia County Prison. Perhaps the more dramatic instance was when he and a dozen other fellow convicts used a tunnel to break out of the Eastern State Penitentiary.
I believe we would all concur that he did in fact make several genuine escapes. Free and clear. Ultimately, he was later apprehended. This somewhat dampens the efficacy of the escapes though it does not undercut the undeniable fact that he did indeed manage to escape. Note that not every one of his attempts led to an escape.
A third illustrative example regarding escapes is the well-known circumstance of the maximum-security prison Alcatraz, often simply called “The Rock,” which was presumed to be inescapable confinement residing in the middle of San Francisco Bay. This highly fortified prison had numerous carefully placed guard towers and extremely meticulous rules about what the prisoners could and could not do, and the overall semblance of security was heightened by the rough and unforgivingly cold waters of the bay that surrounded this fortress-like confinement.
On June 12, 1962, a surprising and history-making escape was discovered. Three prisoners were not in their designated cells. Dummy heads rested on their respective pillows, fooling the guards who patrolled the hallways in front of the cells throughout the night. As far as we know, the prisoners had used a prison ventilator shaft to get up to the prison roof, then climbed down, got over a fence, and reached the edge of the island. They then apparently launched a raft that they had crudely fashioned from raincoats.
Their whereabouts are still unknown. They might have died during the watery journey. They might have lived and made it to shore and freedom. Neither they nor their bodies were ever found. The FBI closed the case in 1979 and handed it over to the U.S. Marshals Service. One supposes that we will never really know what the outcome was.
What do all of these sagas about escaping from confinement tell us?
It seems relatively clear-cut that:
- Sometimes an escape is not possible
- Sometimes an escape is possible but falls apart during the escape attempt
- Sometimes an escape is possible but is short-lived
- Sometimes an escape is possible and seemingly everlasting
I bring up this intriguing topic because of something that is seriously being bandied around in the field of Artificial Intelligence (AI). There has been a longstanding question about whether AI can be confined or imprisoned to the degree that the AI is unable to escape or get loose from the said confinement.
This is commonly referred to as the AI Confinement Problem (AICP).
Insiders usually say AICP to other insiders, doing so with a wink-wink acknowledgment of the acronym. Another shortened lingo is to just utter the word “confinement” or the word “containment” to bring up the subject matter. Choose whichever you wish.
The crux of the topic is the earnest and heartfelt belief that we might need to confine AI, though this simultaneously raises the thorny question of whether we can realistically devise confinement that will be truly confining and inescapable. Not just in theory but in actual day-to-day practice. AI is not necessarily a pushover. Maybe the AI can find a means to break out, bust out, do a jailbreak, fly the coop, or otherwise wiggle or electronically beam out of the imprisonment. This is a serious and lamentably open-ended issue that AI Ethics and Ethical AI continues to struggle with, see my ongoing and extensive coverage of AI Ethics and Ethical AI at the link here and the link here, just to name a few.
Houdini said that no prison could hold him and that no shackles could restrain him.
Perhaps AI can make the same audacious claim, doing so without any hyperbole or overstatement.
Time to unpack this.
You might be tempted to readily believe that AI can be an escape artist whilst believing that humans are less likely to be able to escape from a rigorously devised state of imprisonment. Humans are human. Creating confinement for humans generally ought to be straightforward. The trick undoubtedly is that keeping the human alive during their imprisonment means that something must be arranged to provide food, enable health-related care, and the like associated with a functioning human body. Those details are bound to leave open ends and chances for finding a means to escape from confinement.
An AI system would not presumably require those same humanitarian provisions (well, as you’ll see in a moment, it depends upon whether we are considering sentient AI and the parameters associated with legal personhood). If an AI is a robot, we could just toss the contraption into a special escape-proof cell and not ever come back to see its rusting parts. The deed is done. No worries about it physically being able to escape.
The AI though might principally be software and ergo run on all manner of computer systems. In that case, imprisonment becomes a bit more challenging. Assuming that we can somehow round up all copies, we might be able to place the offending AI into a devoted computer that we have specially crafted to be imprisonment for the AI. This special-purpose computer acts as a type of AI confinement citadel. Maybe it houses just one particular AI or could be cleverly instituted to be an AI holding tank for all manner of AI systems (imagine something like the elaborate entrapment system employed in the movie Ghostbusters, just as an illustration of this admittedly somewhat farfetched idea).
Before I get into the details of the AI Confinement Problem, it is worthwhile to envision the realm of AI as consisting of two conditions or possibilities. I am talking about the distinction between AI that is sentient and AI that is not sentient. We need to make sure that we are on the same page about these distinctions to further sensibly discuss the AI Confinement matter.
I proffer next a stark and unabashed remark that you might find either shocking or altogether obvious and mundane.
There isn’t any AI today that is sentient.
We don’t have sentient AI. We don’t know if sentient AI will be possible. Nobody can aptly predict whether we will attain sentient AI, nor whether sentient AI will somehow miraculously spontaneously arise in a form of computational cognitive supernova (usually referred to as the singularity, see my coverage at the link here). To those of you that are seriously immersed in the AI field, none of this foregoing pronouncement is surprising or raises any eyebrows. Meanwhile, there are outsized headlines and excessive embellishment that might confound people into assuming that we either do have sentient AI or that we are on the looming cusp of having sentient AI any coming day.
Please realize that today’s AI is not able to “think” in any fashion on par with human thinking. When you interact with Alexa or Siri, the conversational capacities might seem akin to human capacities, but the reality is that it is computational and lacks human cognition. The latest era of AI has made extensive use of Machine Learning (ML) and Deep Learning (DL), which leverage computational pattern matching. This has led to AI systems that have the appearance of human-like proclivities. Meanwhile, there isn’t any AI today that has a semblance of common sense, nor does any AI have the cognitive wonderment of robust human thinking.
ML/DL is merely a form of computational pattern matching. The usual approach is that you assemble data about a decision-making task. You feed the data into the ML/DL computer models. Those models seek to find mathematical patterns. After finding such patterns, if so found, the AI system then will use those patterns when encountering new data. Upon the presentation of new data, the patterns based on the “old” or historical data are applied to render a current decision.
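To give a rough flavor of that train-then-apply flow, here is a minimal sketch in Python using the widely available scikit-learn library. To be clear, the loan-style features and decisions are entirely invented for illustration and are not drawn from any actual system:

```python
# Minimal sketch of the pattern-matching flow described above: find patterns
# in historical decision data, then apply them to new data.
# Assumes scikit-learn is installed; the data here is purely illustrative.
from sklearn.linear_model import LogisticRegression

# Historical decision-making data: [applicant_income_k, years_employed]
historical_features = [[40, 2], [85, 10], [30, 1], [95, 12], [50, 4], [20, 0]]
historical_decisions = [0, 1, 0, 1, 1, 0]  # 1 = approved, 0 = denied

# "Training" is simply finding a mathematical pattern in the old data.
model = LogisticRegression()
model.fit(historical_features, historical_decisions)

# New data arrives; the pattern from the "old" data renders a current decision.
new_applicant = [[60, 5]]
print(model.predict(new_applicant))  # e.g., [1] meaning approved
```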
AI and especially the widespread advent of ML/DL has gotten societal dander up about the ethical underpinnings of how AI might be sourly devised. You might be aware that when this latest era of AI got underway there was a huge burst of enthusiasm for what some now call AI For Good. Unfortunately, on the heels of that gushing excitement, we began to witness AI For Bad. For example, various AI-based facial recognition systems have been revealed as containing racial biases and gender biases, which I’ve discussed at the link here.
Efforts to fight back against AI For Bad are actively underway. Besides vociferous legal pursuits of reining in the wrongdoing, there is also a substantive push toward embracing AI Ethics to right the AI vileness. The notion is that we ought to adopt and endorse key Ethical AI principles for the development and fielding of AI, doing so to undercut the AI For Bad and simultaneously herald and promote the preferable AI For Good.
How does this tend to arise in the case of using Machine Learning?
Well, straightforwardly, if humans have historically been making patterned decisions incorporating untoward biases, the odds are that the data used to “train” ML/DL reflects this in subtle but significant ways. Machine Learning or Deep Learning computational pattern matching will blindly try to mathematically mimic the data accordingly. There is no semblance of common sense or other sentient aspects of AI-crafted modeling per se.
Furthermore, the AI developers might not realize what is going on either. The arcane mathematics in the ML/DL might make it difficult to ferret out the now hidden biases. You would rightfully hope and expect that the AI developers would test for the potentially buried biases, though this is trickier than it might seem. A solid chance exists that even with relatively extensive testing there will be biases still embedded within the pattern matching models of the ML/DL.
You could somewhat use the famous or infamous adage of garbage-in garbage-out (GIGO). The thing is, this is more akin to biases-in that insidiously get infused as biases submerged within the AI. The algorithmic decision-making (ADM) of AI axiomatically becomes laden with inequities.
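To make the biases-in, biases-out point concrete, here is a small hedged sketch (again with wholly synthetic data) showing how a pattern matcher faithfully reproduces a skew that was baked into the historical human decisions:

```python
# Toy illustration: if historical labels were skewed against group B, the
# trained model mimics that skew even when qualifications are identical.
# Entirely synthetic data, purely for illustration.
from sklearn.tree import DecisionTreeClassifier

# Features: [qualification_score, group] where group 0 = A, 1 = B
X = [[80, 0], [80, 1], [75, 0], [75, 1], [90, 0], [90, 1]]
# Historical human decisions denied group B applicants despite equal scores.
y = [1, 0, 1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X, y)

# Two new applicants with identical qualifications, differing only by group:
print(model.predict([[85, 0], [85, 1]]))  # likely [1 0] -- the bias is mimicked
```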
Not good.
This is also why the tenets of AI Ethics have been emerging as an essential cornerstone for those that are crafting, fielding, or using AI. We ought to expect AI makers to embrace AI Ethics and seek to produce Ethical AI. Likewise, society should be on the watch that any AI unleashed or promulgated into use is abiding by AI Ethics precepts.
To help illustrate the AI Ethics precepts, consider the set as stated by the Vatican in the Rome Call For AI Ethics and that I’ve covered in-depth at the link here. This articulates six primary AI ethics principles:
- Transparency: In principle, AI systems must be explainable
- Inclusion: The needs of all human beings must be taken into consideration so that everyone can benefit, and all individuals can be offered the best possible conditions to express themselves and develop
- Responsibility: Those who design and deploy the use of AI must proceed with responsibility and transparency
- Impartiality: Do not create or act according to bias, thus safeguarding fairness and human dignity
- Reliability: AI systems must be able to work reliably
- Security and privacy: AI systems must work securely and respect the privacy of users.
As stated by the U.S. Department of Defense (DoD) in their Ethical Principles For The Use Of Artificial Intelligence and as I’ve covered in-depth at the link here, these are their five primary AI ethics principles:
- Responsible: DoD personnel will exercise appropriate levels of judgment and care while remaining responsible for the development, deployment, and use of AI capabilities.
- Equitable: The Department will take deliberate steps to minimize unintended bias in AI capabilities.
- Traceable: The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including transparent and auditable methodologies, data sources, and design procedure and documentation.
- Reliable: The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire lifecycles.
- Governable: The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.
I’ve also discussed various collective analyses of AI ethics principles, including having covered a set devised by researchers that examined and condensed the essence of numerous national and international AI ethics tenets in a paper entitled “The Global Landscape Of AI Ethics Guidelines” (published in Nature), and that my coverage explores at the link here, which led to this keystone list:
- Transparency
- Justice & Fairness
- Non-Maleficence
- Responsibility
- Privacy
- Beneficence
- Freedom & Autonomy
- Trust
- Sustainability
- Dignity
- Solidarity
As you might directly guess, trying to pin down the specifics underlying these principles can be extremely hard to do. Even more so, the effort to turn those broad principles into something entirely tangible and detailed enough to be used when crafting AI systems is also a tough nut to crack. It is easy to do some overall handwaving about what AI Ethics precepts are and how they should be generally observed, while it is a much more complicated situation when the AI coding has to be the veritable rubber that meets the road.
The AI Ethics principles are to be utilized by AI developers, along with those that manage AI development efforts, and even those that ultimately field and perform upkeep on AI systems. All stakeholders throughout the entire AI life cycle of development and usage are considered within the scope of abiding by the being-established norms of Ethical AI. This is an important highlight since the usual assumption is that “only coders” or those that program the AI are subject to adhering to the AI Ethics notions. As earlier stated, it takes a village to devise and field AI, and for which the entire village has to be versed in and abide by AI Ethics precepts.
All told, we are today utilizing non-sentient AI and someday we might have sentient AI (but that is purely speculative). Both kinds of AI are obviously of concern for AI Ethics and we need to be aiming toward Ethical AI no matter how it is constituted.
Bringing back the topic of AI Confinement, there is a marked contrast between the nature of “confinement” that entails non-sentient AI versus sentient AI.
In the case of the confinement associated with sentient AI, we can wildly play a guessing game of nearly infinite varieties. Maybe sentient AI will cognitively be like humans and exhibit similar mental capacities. Or we could postulate that sentient AI will be superhuman and go beyond our forms of thinking. The ultimate in sentient AI would seem to be super-intelligence, something that might be so smart and cunning that we cannot today even conceive of the immense thinking prowess. Some suggest that our minds will be paltry in comparison. This super-duper AI will run rings around us in a manner comparable to how we today can outthink ants or caterpillars.
I like to depict the AI Confinement Problem as consisting of these two crucial and quite controversial contentions:
- Controversial Contention #1: It will purportedly be impossible to successfully confine sentient AI.
- Controversial Contention #2: It will purportedly be impossible for non-sentient AI to escape from our confinement.
In short, the first listed prevailing assertion is that sentient AI will be so conniving that no matter what manner of confinement we devise and how hard we try, the AI will escape. Humans will be unable to successfully confine sentient AI. The logic partially underlying that contention is that AI will invariably be able to outsmart humans, thus human-devised confinement will be outsmarted by a sentient AI. A caveat that begrudgingly comes with this is that human-level sentient AI might not be smart enough to break out, but that the superhuman or super-intelligent AI will.
Keep in mind too that when I refer to being able to escape, we earlier agreed that there are several variants associated with escaping. There is an escape that leads to failure during the attempt, and there are variations of escape that are more so successful yet lead to short-lived versus longstanding or perpetual freedom. We should apply those same parameters to the AI Confinement Problem.
An AI might momentarily escape and perhaps immediately get captured and reimprisoned. Or an AI might get out and later on be found and confined once again. There is also the possibility that the AI escapes, remains free, and we are never able to confine it again. I trust that you can envision all such possibilities.
Furthermore, we need to be careful and not treat AI as some kind of monolith. When people refer to AI, they at times use the phrasing in a categorically encompassing way. The odds are that AI is probably going to be more piecemeal and not one gigantic AI overlord (which is the usual portrayal). I am not saying that futuristic AI could not ever be bound together and coalesce into one thing, and instead just pointing out that this does not necessarily or axiomatically come to pass.
We will for sake of discussion assume that there are lots of differing AI systems and that when we are contemplating the confinement of AI we are focused on a particular one or a specific set of AIs. Of course, as already stated, since we are for the moment speaking of the sentient AI, all bets are off since you can wildly make as many assumptions as possible about this unknown and yet to exist AI to your heart’s content.
A quick twist for you.
Suppose that it is the case that superhuman or super-intelligent AI can outwit us humans. Suppose further that our human-devised confinement falls short because it was designed and built based on human wits (I am not agreeing to these suppositions, merely noting them). I ask you this, why would we not try to use superhuman or super-intelligent sentient AI to aid in deriving better confinement? The usual answer is that all sentient AI will be in cahoots and would not aid our quest. Or that any AI willing to assist would worry that we might end up turning the human-plus-AI devised escape-proof confinement against it. We certainly wouldn’t expect that a superhuman or super-intelligent AI would be dumb enough to make confinement that would potentially be used as a trap against itself.
Round and round that goes.
The second listed controversial contention is that non-sentient AI will not be able to escape from whatever confinement we set up for such AI. You see, the logic is that non-sentient AI will not be capable enough to essentially outwit humans. Humans will always be at least one step or more ahead of non-sentient AI. Any confinement that we design and build will be escape-proof. The AI will be captured and confront a “lifetime” behind bars.
I don’t buy into that notion, for several reasons.
We have witnessed cybercrooks that cleverly devised computer viruses that have kept going and going. Our efforts are primarily about blocking the computer virus rather than somehow capturing and imprisoning it. Humans devising non-sentient AI can likely find ways to code AI that is going to be extremely hard to keep confined.
In addition, humans can devise AI that is self-adjusting or self-modifying. Readers might be aware of my ongoing discussions covering AI-infused polymorphic computer viruses. These are shape-shifting computer viruses that are sneakily constructed to try and be undetectable, or that upon detection will rapidly reshape to avoid further detection.
There are akin ML/DL systems that intentionally aim to self-adjust or self-modify, allowing for the AI system to hopefully improve by itself as it is being used. This though can be problematic in that the AI might morph in a fashion that is no longer desirable and subsequently acts in notably disturbing ways, see the link here.
Another angle is that humans can use their computer-based tools, such as non-sentient AI, in order to craft AI confinement. In that sense, the premise that humans are only able to think to some limited degree is potentially a pretense. We might be able to augment our thinking processes via the likes of non-sentient AI and therefore find novel ways to design and build sufficient confinement for non-sentient AI.
All told, I am busting the two controversial contentions about AI Confinement and arguing that we cannot make any such irrefutable ironclad claims. We do not know for sure that we can always and without fail confine non-sentient AI. We do not know for sure that sentient AI of any caliber, including superhuman and super-intelligent, will always and without fail be able to escape our confinement.
The good news is that the whole kit and kaboodle are a hoot to consider. I say that somewhat cheekily. If we do find ourselves under threat by a non-sentient AI, we are going to soberly and strenuously want to ascertain whether we can confine it. Likewise, if we are under threat by a sentient AI of whatever caliber, we are going to desperately want to determine whether we can confine it.
The AI Confinement Problem is a meritorious conundrum and abundantly worthwhile to figure out.
I am guessing that you might be having a nagging ache about one key portion of the AI Confinement matter. All this talk about confinement seems silly or nonsensical since we are referring to AI rather than to a human being confined. The obvious thing to do would be to simply blow the AI to smithereens. Destroy any AI that we don’t like and believe ought to be confined. Forget about all these contortions associated with confinement and merely squash AI like a bug. This seems like the best solution whereby you don’t have to design and build confinements, instead expend your energies on wiping out AI that we humans decide is unworthy or dangerous.
Easy-peasy.
Turns out there is a series of logical retorts that you might want to ponder.
First, if the AI is sentient, we are possibly going to be willing to anoint such AI with a form of legal personhood, see my analysis at the link here. The concept is that we will provide AI with a semblance of human rights. Maybe not verbatim. Maybe a special set of rights. Who knows? In any case, you could conjure up the seemingly outlandish notion that we cannot just summarily wipe out sentient AI. There might be a stipulated legal process involved. This includes that we cannot necessarily exercise the “death penalty” upon a sentient AI (whoa, just wait until we as a society get embroiled in that kind of a societal debate). The gist is that we might need a suitable form of AI confinement in lieu of or while deciding whether to destroy a sentient AI.
Second, we might find useful value in an AI that we want to keep intact and not utterly destroy or delete. Suppose that we created a non-sentient AI that was leading us towards being able to cure cancer. Would we want to delete such AI? I hardly think so. Suppose that a fully sentient superhuman AI existed that promised to solve world hunger. Are we to wipe out this promising sentient AI, doing so without first resolving global hunger? We ought to think carefully about that.
The point is that we could have a variety of bona fide reasons to keep AI intact. Rather than deleting it or scrambling it, we might wish to ensure that the AI remains whole. The AI is going to be allowed to perform some of its actions in a limited manner. We want to leverage whatever AI can do for us.
How can we then have our cake and eat it too?
Answer: AI confinement.
Throughout this discussion, I have alluded to a kind of comparison between human confinement and AI confinement. To set the record straight, I am generally opposed to anthropomorphizing AI. I will say more about this momentarily. The reason I bring the qualm up now is that I do not want to suggest or imply that today’s non-sentient AI is analogous to humans and humankind. There is already way too much of that type of false and misleading comparison going on. Excuse my comparative usage which I’ve done in a hopefully careful and mindful fashion.
Trying to figure out how to confine AI is an interesting and abundantly useful proposition.
Even if we don’t have AI today that presents an immediate need for confinement, the matter provides plenty of challenges that can aid in advancing our understanding of cybersecurity. Heaven knows we need to keep on trucking when it comes to boosting cyber protections all told. The pursuit of cybersecurity for confining AI is likely to uncover many useful side benefits that apply to overcoming those human evildoers that use computing for wrongdoing, along with tackling those pesky and ever-expanding computer viruses.
Those that seek to denigrate the earnest pursuit of the AI Confinement Problem are missing the bigger picture. They see this solely as a farfetched and far-off issue for the someday remote possibility of containing superhuman AI or super-intelligent AI. You should instead construe this as a handy means for advancing cybersecurity, entirely applicable in today’s world, and for being prepared to deal with perhaps soon non-sentient AI that we might want to confine (along with getting ready for the presumably catastrophic risk of a heinous sentient AI, despite the seemingly low chances of our encountering such AI).
Better to be safe than sorry.
A recent research analysis about the AI Confinement Problem posits this indication of the current status: “AI confinement is a difficult problem with many proposed measures to improve security but no guarantees about safety. However, our current AI either make no attempt to escape or lack the ability to break out of simulations. There are some examples of real-world AI breaking a layer of confinement, but none that present any risk to humans or the capability establishing a persistent presence on remote computers” (Philosophies journal, “Understanding and Avoiding AI Failures: A Practical Guide” by Robert Williams and Roman Yampolskiy).
Those same researchers also describe a series of escape-potential levels:
- Escape Potential Level 0: None for conventional software
- Escape Potential Level 1: The agent may find unexpected behaviors which technically achieve the goal but are not desirable, such as genetic algorithms
- Escape Potential Level 2: The agent intentionally seeks to hack its environment, such as advanced reinforcement learners
- Escape Potential Level 3 (danger): The agent has the ability and drive to break through most cybersecurity and containment procedures, such as Artificial General Intelligence (AGI)
An explanation is proffered about these escape-potential levels: “At level 0, no confinement is needed beyond standard cybersecurity measures appropriate for the deployment environment. At levels 1 and 2, there is little risk of hostile escape, but the agent should be monitored closely for undesirable behavior and software exploits. The distributional shift from the simulator to the real world provides a barrier against narrow AI affecting the real world. Because of this, agents trained in more realistic simulators should be regarded with a greater degree of caution. At level 3, indefinite confinement is impossible, but, with robust security, at the level of software, hardware, and social engineering, humans can benefit from a contained AGI in a limited capacity” (per Robert Williams and Roman Yampolskiy).
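If you wanted to operationalize those levels in software, you might encode them as a simple lookup from level to containment posture. The following Python sketch is my own paraphrase for illustration; the level names track the researchers’ descriptions, but the policy wording and structure are not from the paper:

```python
# Minimal sketch: escape-potential levels mapped to a suggested containment posture.
from enum import IntEnum

class EscapePotential(IntEnum):
    NONE = 0        # conventional software
    UNEXPECTED = 1  # unexpected goal-achieving behaviors (e.g., genetic algorithms)
    HACKS_ENV = 2   # intentionally seeks to hack its environment
    DANGER = 3      # can break most cybersecurity/containment (e.g., AGI)

CONTAINMENT_POLICY = {
    EscapePotential.NONE: "Standard cybersecurity for the deployment environment.",
    EscapePotential.UNEXPECTED: "Monitor closely for undesirable behavior and exploits.",
    EscapePotential.HACKS_ENV: "Monitor closely; treat realistic simulators with extra caution.",
    EscapePotential.DANGER: "No indefinite confinement; layered software, hardware, and social safeguards.",
}

def containment_for(level: EscapePotential) -> str:
    """Return the suggested containment posture for a given escape-potential level."""
    return CONTAINMENT_POLICY[level]

print(containment_for(EscapePotential.HACKS_ENV))
```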
The proposed set of escape-potential levels is thought-provoking. As you’ll see in a moment, delineating automation via the use of graduated levels is a useful means of characterizing the scope and capacity of said automation. I’ll describe for you the same concept as it relates to autonomous vehicles and AI-based self-driving cars. One notable difference is worth observing. For self-driving cars, there is an agreed-upon standard set of levels, while the above indicated escape-potential levels present an initial and preliminary strawman (you can undoubtedly anticipate that further refinements will be undertaken as the AI Confinement field further matures).
Let’s contemplate the rationale or basis for wanting to confine AI.
The most apparent reason to confine AI would be to stop it from deplorable acts. We’ve already unearthed that instead of wiping out the AI, we might want to keep the AI caged so that it still can be run and meanwhile be prevented from causing harm. This might or might not be simultaneously feasible. There is a chance that the AI is unable to suitably run while imprisoned and therefore we lose the other desirable aspect of gleaning whatever positive value we sought to accrue. Imagine the consternation of having confined the AI though doing so at the cost that the remaining valued capability is no longer viably available. Drat!
There is a veritable range of reasons to confine AI, including but not limited to:
- Incapacitation of the AI
- Detention of the AI
- Protection for humans
- Protection from humans
- Rehabilitation of the AI
- Deterrence for other AI
- Retribution against the AI
- Etc.
You can take your time to mindfully mull over those reasons. Some of the reasons are readily justified. Some of them might seem to be curious and possibly unwarranted.
When discussing human escape artists, there was something that I didn’t earlier mention. An escape artist might undertake their escape entirely of their own accord. Houdini often made his escape without any assistance from anyone else. That being said, Houdini sometimes used an accomplice or assistant to aid in making good his escape. You can see the same aspects in the prison escapes that were cited.
Here’s why that is important to the dilemma of AI confinement.
Suppose we are able to confine AI. It might be sentient AI or might be non-sentient AI (we’ll consider them jointly, first, and then separate the particulars). The AI tries to escape by some electronic or computer-based trickery. Assume that the AI is acting alone.
The AI might not have to act solo in its escape efforts. A human comes along and lets the AI escape. Maybe the human opens an electronic network portal and the AI transmits itself out to other non-confined computers elsewhere. Or perhaps the human makes a copy of the AI onto an external disk drive or memory stick, carrying the AI out of the confinement, and then later on plugs the disk or memory stick into a non-confined computer.
Humans could easily be the weak link in an effort to confine AI.
Why in the heck would a human allow an AI to escape from confinement, you might be exclaiming. Surely, all humans would want to keep AI confined if the AI was presumably purposefully placed into confinement. If you are thinking those kinds of thoughts, I suggest you have a somewhat happy face optimistic view of humankind.
An evildoer might believe that they can harness the AI to do evil acts. Some wrongdoers might figure that the AI will wreak havoc, and do not care whether this is profitable for them or not. One supposes that a human might be sympathetic to the AI and believe that the AI was wrongfully incarcerated. On and on the possibilities endlessly go.
There is the accidental release that can occur too. A human with no intention of letting out the AI makes a mistake and voila, the AI is let loose. Consider that a human intending to make the confinement more stringent makes a boo-boo and inadvertently undercuts the rigors of the confinement, allowing the AI to escape.
If the AI can interact with humans while the AI is in confinement, the problem of attaining AI confinement is likely astronomically increased. Here’s why.
An AI system might be able to use social engineering (SE) ploys. I’m sure you’ve had this done to you by humans. A person calls you on your cellphone and claims they are from your bank. They tell you that someone has cracked into your bank account. To fix the break-in, the person asks for your password. Amazingly, you give your password to this person. They then use it to log in to your bank account and steal your money. This is an example of using social engineering techniques to compromise computer systems.
Envision a sentient AI that is acting cleverly and tries this: “In addition to trivial SE attacks such as lying, befriending, blackmailing, hypnotizing and forming a romantic relationship the AI could utilize a number of more advanced strategies. One such strategy would be super-bribery. An AI can convince the guard that releasing the AI is in the guard’s best interests. In addition to money and power the guard will be given perfect health, immortality or anything else for that matter. This strategy would work particularly well if the guard has a seriously ill close relative who the AI will promise to cure. Depending on the capabilities of the AI or as a result of lying, AI may also offer to bring back long-ago deceased relatives and friends of the guard. It would be hard to find a person who could remain immune to such bribes” (Roman Yampolskiy, “Leakproofing the Singularity Artificial Intelligence Confinement Problem” in the Journal of Consciousness Studies).
Whatever confinement is devised, there is always the potential that a human might disturb or undercut the confinement and allow the AI to escape. This seems especially possible if AI can communicate with humans. This holds even when the AI can communicate only with trained and specialized human guards that are supposedly aware of and ready for the tricks that the AI might play on them.
A particular focal point of the AI Confinement Problem deals with trying to establish a confinement protocol that can allow for communicating with the AI but restricting the AI such that it is not able to confound or convince humans to let the AI out. Some claim that any permitted communication could result in the AI ultimately getting a human to open the confinement. Others postulate that a multiple-choice protocol or a binary encoding might be a means to prevent the AI from deviously fooling a human guard.
This brings up yet another set of levels that we might wish to consider. The levels of communication security might be devised this way (per the aforementioned paper):
- Communications Security Level 0: Unlimited communication
- Communications Security Level 1: Censored input, uncensored output
- Communications Security Level 2: Outputs only with no inputs
- Communications Security Level 3: Unlimited input and censored output
- Communications Security Level 4: Secured communication
- Communications Security Level 5: Censored output and no inputs
- Communications Security Level 6: Inputs only with no outputs
- Communications Security Level 7: Censored input and no outputs
- Communications Security Level 8: No communication, fully confined AI
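To show how a restrictive protocol such as the multiple-choice idea might look in practice, here is a hedged Python sketch. The gatekeeper only ever relays one of a fixed set of pre-approved answers, so there is no free-form channel for persuasion. The ask_confined_ai callable is a hypothetical stand-in for however the confined system would actually be queried:

```python
# Toy multiple-choice gatekeeper: the confined AI may only pick an option index.
from typing import Callable, Sequence

def multiple_choice_query(question: str,
                          options: Sequence[str],
                          ask_confined_ai: Callable[[str, Sequence[str]], int]) -> str:
    """Pose a question with fixed options; accept only a valid option index back."""
    raw = ask_confined_ai(question, options)
    if not isinstance(raw, int) or not (0 <= raw < len(options)):
        return "REJECTED: response was not a permitted option index"
    return options[raw]

# Example usage with a dummy stand-in for the confined AI:
answer = multiple_choice_query(
    "Is compound X likely to be toxic to humans?",
    ["yes", "no", "insufficient data"],
    ask_confined_ai=lambda q, opts: 2,  # pretend the AI picked option 2
)
print(answer)  # "insufficient data"
```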
The AI could also have other tricks up its sleeve.
Assume we have placed AI into seemingly ironclad confinement consisting of a specialized computer system that is all locked up. And then this happens: “Additionally the AI may engage in the sabotage of the confinement hardware in the hopes of being able to escape as a result of the malfunctions in the security protocol. For example, the AI may try to melt the hardware via temperature increase from placing a high computational load on the processor or try to break a hard drive via multiple reads of the same location” (per the cited paper by Roman Yampolskiy).
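One partial mitigation for that kind of resource-based sabotage is to impose hard ceilings on what the confined workload can consume. Here is a rough sketch using Python’s standard-library resource module (Unix-only); it is illustrative only and obviously not a complete containment design:

```python
# Illustrative sketch: cap CPU time and memory for the process hosting a
# confined workload so it cannot simply peg the processor indefinitely.
import resource

def apply_confinement_limits(cpu_seconds: int = 60,
                             address_space_bytes: int = 2**30) -> None:
    """Impose hard CPU-time and memory ceilings on the current process."""
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    resource.setrlimit(resource.RLIMIT_AS, (address_space_bytes, address_space_bytes))

# In a confinement harness, this would run inside the child process that will
# host the confined AI workload, before any untrusted code begins executing.
if __name__ == "__main__":
    apply_confinement_limits()
```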
You can add to the vulnerabilities of confinement that the confinement container or the facility it is housed in gets disrupted due to some natural act such as a tornado or a hurricane. Maybe the AI can exploit that occurrence. It could be that a nation that is an enemy of the country housing the AI in confinement decides that performing a breach of the confinement will be good for causing panic and will serve as a sideways attack against the country that is their avowed foe. Perhaps human activists advocating AI rights raid the facility and let the AI loose.
Let your mind wander as to the many avenues of the AI confinement being compromised.
One of the earliest papers on the confinement problem in computing, a direct forerunner of the AI Confinement Problem, was published in 1973 by Butler Lampson of Xerox PARC (Palo Alto Research Center). Entitled “A Note on the Confinement Problem,” it proposed a handful of leak-proofing rules:
- Total isolation: A confined program shall make no calls on any other program.
- Transitivity: If a confined program calls another program which is not trusted, the called program must also be confined.
- Masking: A program to be confined must allow its caller to determine all its inputs into legitimate and covert channels. We say that the channels are masked by the caller.
- Enforcement: The supervisor must ensure that a confined program’s input to covert channels conforms to the caller’s specifications.
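Just to make Lampson’s rules concrete, here is a toy Python model of a supervisor enforcing them. The class names, structure, and channel bookkeeping are invented for illustration and are not from Lampson’s paper:

```python
# Toy model of Lampson's leak-proofing rules: total isolation/transitivity for
# calls, and masking/enforcement for channel inputs.
class Supervisor:
    def __init__(self):
        self.confined: set[str] = set()
        self.trusted: set[str] = set()
        self.channel_spec: dict[str, set[str]] = {}  # program -> channels masked by caller

    def confine(self, program: str, allowed_channels: set[str]) -> None:
        self.confined.add(program)
        self.channel_spec[program] = allowed_channels

    def authorize_call(self, caller: str, callee: str) -> bool:
        """Transitivity: a confined caller may only call trusted or confined programs."""
        if caller in self.confined and callee not in self.trusted:
            return callee in self.confined
        return True

    def authorize_channel(self, program: str, channel: str) -> bool:
        """Enforcement: input to channels must match the caller's masked specification."""
        return channel in self.channel_spec.get(program, set())

sup = Supervisor()
sup.confine("ai_workload", allowed_channels={"stdout"})
print(sup.authorize_call("ai_workload", "network_daemon"))  # False: neither trusted nor confined
print(sup.authorize_channel("ai_workload", "disk_timing"))  # False: covert channel not masked
```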
At this juncture of this weighty discussion, I’d bet that you are desirous of some illustrative examples that might showcase the AI Confinement Problem in today’s world. There is a special and assuredly popular set of examples that are close to my heart. You see, in my capacity as an expert on AI including the ethical and legal ramifications, I am frequently asked to identify realistic examples that showcase AI Ethics dilemmas so that the somewhat theoretical nature of the topic can be more readily grasped. One of the most evocative areas that vividly presents this ethical AI quandary is the advent of AI-based true self-driving cars. This will serve as a handy use case or exemplar for ample discussion on the topic.
Here’s then a noteworthy question that is worth contemplating: Does the advent of AI-based true self-driving cars illuminate anything about the AI Confinement Problem, and if so, what does this showcase?
Allow me a moment to unpack the question.
First, note that there isn’t a human driver involved in a true self-driving car. Keep in mind that true self-driving cars are driven via an AI driving system. There isn’t a need for a human driver at the wheel, nor is there a provision for a human to drive the vehicle. For my extensive and ongoing coverage of Autonomous Vehicles (AVs) and especially self-driving cars, see the link here.
I’d like to further clarify what is meant when I refer to true self-driving cars.
Understanding The Levels Of Self-Driving Cars
As a clarification, true self-driving cars are ones where the AI drives the car entirely on its own and there isn’t any human assistance during the driving task.
These driverless vehicles are considered Level 4 and Level 5 (see my explanation at this link here), while a car that requires a human driver to co-share the driving effort is usually considered at Level 2 or Level 3. The cars that co-share the driving task are described as being semi-autonomous, and typically contain a variety of automated add-ons that are referred to as ADAS (Advanced Driver-Assistance Systems).
There is not yet a true self-driving car at Level 5, and we don’t yet even know if this will be possible to achieve, nor how long it will take to get there.
Meanwhile, the Level 4 efforts are gradually trying to get some traction by undergoing very narrow and selective public roadway trials, though there is controversy over whether this testing should be allowed per se (we are all life-or-death guinea pigs in an experiment taking place on our highways and byways, some contend, see my coverage at this link here).
Since semi-autonomous cars require a human driver, the adoption of those types of cars won’t be markedly different than driving conventional vehicles, so there’s not much new per se to cover about them on this topic (though, as you’ll see in a moment, the points next made are generally applicable).
For semi-autonomous cars, it is important that the public be forewarned about a disturbing aspect that’s been arising lately, namely that despite those human drivers that keep posting videos of themselves falling asleep at the wheel of a Level 2 or Level 3 car, we all need to avoid being misled into believing that the driver can take away their attention from the driving task while driving a semi-autonomous car.
You are the responsible party for the driving actions of the vehicle, regardless of how much automation might be tossed into a Level 2 or Level 3.
Self-Driving Cars And The AI Confinement Problem
For Level 4 and Level 5 true self-driving vehicles, there won’t be a human driver involved in the driving task.
All occupants will be passengers.
The AI is doing the driving.
One aspect to immediately discuss entails the fact that the AI involved in today’s AI driving systems is not sentient. In other words, the AI is altogether a collective of computer-based programming and algorithms, and most assuredly not able to reason in the same manner that humans can.
Why is this added emphasis about the AI not being sentient?
Because I want to underscore that when discussing the role of the AI driving system, I am not ascribing human qualities to the AI. Please be aware that there is an ongoing and dangerous tendency these days to anthropomorphize AI. In essence, people are assigning human-like sentience to today’s AI, despite the undeniable and inarguable fact that no such AI exists as yet.
With that clarification, you can envision that the AI driving system won’t natively somehow “know” about the facets of driving. Driving and all that it entails will need to be programmed as part of the hardware and software of the self-driving car.
Let’s dive into the myriad of aspects that come to play on this topic.
First, it is important to realize that not all AI self-driving cars are the same. Each automaker and self-driving tech firm is taking its own approach to devising self-driving cars. As such, it is difficult to make sweeping statements about what AI driving systems will do or not do.
Furthermore, whenever stating that an AI driving system doesn’t do some particular thing, this can, later on, be overtaken by developers that in fact program the computer to do that very thing. Step by step, AI driving systems are being gradually improved and extended. An existing limitation today might no longer exist in a future iteration or version of the system.
I trust that provides a sufficient litany of caveats to underlie what I am about to relate.
We are primed now to do a deep dive into self-driving cars and the AI Confinement Problem.
You might be aware that there have been reported instances of Level 2 semi-autonomous cars that had a human driver at the wheel and the human fell asleep while actively underway on a freeway or highway. The scary aspect of Level 2 and Level 3 is that the human driver is still in charge of the driving, and yet they can be lulled into falsely believing that the AI or automation is fully capable of driving the car on its own. The push to ensure that an onboard monitoring system keeps track of the human driver and their driving status is a means to try and mitigate the being-lulled proclivity.
The news stories have showcased instances whereby a police officer in their police car has maneuvered in front of the Level 2 vehicle, then gradually opted to slow down their police car, which in turn has indirectly led to the Level 2 car slowing down correspondingly. This nifty trick is predicated on the idea that the Level 2 car has some form of sensor devices such as video cameras, radar, LIDAR, or the like that are used to detect vehicles that are ahead of the Level 2 car. Upon detecting the vehicle in front of the Level 2 car, the automation will automatically adjust its speed as per the speed of the vehicle ahead.
You could say that AI is being persuaded to slow down.
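For a sense of why the trick works, here is a bare-bones sketch of follow-the-vehicle-ahead speed adjustment, in the spirit of adaptive cruise control. Real production systems are vastly more elaborate; the gains, thresholds, and units here are made up for illustration:

```python
# Simplified follow-the-leader speed logic: keep a minimum time gap to the
# detected lead vehicle, easing toward its speed (values are illustrative only).
def desired_speed(own_speed_mps: float,
                  lead_speed_mps: float,
                  gap_m: float,
                  min_time_gap_s: float = 2.0,
                  gain: float = 0.5) -> float:
    """Return a target speed that preserves at least min_time_gap_s of headway."""
    safe_gap_m = own_speed_mps * min_time_gap_s
    if gap_m < safe_gap_m:
        # Too close: slow down below the lead vehicle's speed until the gap recovers.
        return max(0.0, lead_speed_mps - gain * (safe_gap_m - gap_m))
    # Enough headway: simply drift toward the lead vehicle's speed.
    return lead_speed_mps

# A police car ahead easing off to 20 m/s while we travel 27 m/s with a 35 m gap:
print(desired_speed(own_speed_mps=27.0, lead_speed_mps=20.0, gap_m=35.0))
```

As long as the lead vehicle keeps decelerating and the boxed-in car has nowhere to change lanes, this kind of logic keeps shedding speed right along with it.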
Suppose that the AI or automation in the case of a seeming runaway semi-autonomous (or even fully autonomous self-driving) car is programmed to switch lanes and avoid getting bogged down by a vehicle in front of the car.
What else can we do to contend with this?
You could try surrounding the errant vehicle with an entire posse of police cars. Position one in front of the targeted vehicle, position another on the left, another on the right, and one directly behind the runaway car. The vehicle is now boxed in. Unless it can sprout wings, it cannot presumably escape the confinement.
Notice that this is a form of physical confinement. Almost like putting an animal in a cage or forcing a robot into a prison cell. For AI-based systems that are principally robots, the confinement might indeed often be a physical form of confinement whereby the AI needs to be strictly controlled. Keep in mind that a self-driving car is essentially a type of robot. You probably do not think of self-driving cars in that manner, but overall they are in fact robots that have wheels and drive on our roadways.
A significant problem with this form of confinement is that we don’t necessarily know how the AI driving system will react. You can potentially have all the police cars in unison gradually slow down and the runaway car will hopefully correspondingly do so too (it won’t be able to switch lanes or get out of the blocking confinement). That is the happy face scenario.
We don’t know for sure that this is what will happen.
It could be that the AI is not well devised, or has errors, and it ends up ramming one or more of the police cars. Assuming that the officers are not killed, this might save lives all told, though the officers could potentially get injured and all of the vehicles might get severely damaged.
A more failsafe form of confinement for a self-driving car would be to place the vehicle in a secured garage that will entrap the driverless vehicle. The preceding example about an underway vehicle was more complicated since the vehicle was essentially free to roam. Placing the self-driving car into a locked garage might confine the essentially AI robotic system, though if someone opens the garage or somehow the AI is able to electronically get the garage doors to open, an escape or kind of jailbreak could potentially ensue.
That exemplifies the physical nature of AI confinement. Next, consider the software aspects of AI confinement.
Assume for sake of discussion that the AI driving system is calling the shots as to how the self-driving car is going to operate. This presents another avenue for confinement, namely consisting of trying to confine the AI driving system per se that is on-board the vehicle.
One means might be to have pre-built or programmed virtual confinement around the AI driving system that is always existent in the onboard computer that is running the AI of the vehicle. We might be able to send an electronic signal to the confinement or imprisonment code that indicates to go ahead and trap the AI driving system, preventing the AI from operating the self-driving car. Some prearranged special signal could activate the confinement and block the AI driving system. Thus, this would prevent the AI from being able to utilize the driving controls, effectively overruling the self-driving car from proceeding via any of the AI commands.
We would need to be mindful of how this works. A self-driving car that is underway at highway speeds could become a dangerous unguided missile of sorts if the confinement was suddenly enacted and the AI abruptly no longer was able to drive the autonomous vehicle (we are assuming too that there aren’t any human accessible driving controls that a human could use to undertake the driving).
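As a hedged illustration of that virtual confinement notion, here is a minimal Python sketch of a gate sitting between the AI driving system and the vehicle actuators. Everything here (the class names, the command format, and the gentle-stop behavior meant to avoid the unguided-missile problem) is hypothetical:

```python
# Hypothetical confinement gate: once the prearranged signal arrives, AI driving
# commands are discarded and a controlled, gradual stop is commanded instead.
from dataclasses import dataclass

@dataclass
class DriveCommand:
    steering_rad: float
    throttle: float  # 0.0 to 1.0
    brake: float     # 0.0 to 1.0

class ConfinementGate:
    def __init__(self):
        self.confined = False

    def activate_confinement(self) -> None:
        """Triggered by the prearranged special signal."""
        self.confined = True

    def filter(self, ai_command: DriveCommand) -> DriveCommand:
        if self.confined:
            # Ignore the AI entirely; bring the vehicle to a gradual stop so it
            # does not become an unguided missile at highway speed.
            return DriveCommand(steering_rad=0.0, throttle=0.0, brake=0.3)
        return ai_command

gate = ConfinementGate()
gate.activate_confinement()
print(gate.filter(DriveCommand(steering_rad=0.1, throttle=0.8, brake=0.0)))
```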
How else might such an AI confinement be useful in the use case of self-driving cars?
For those of you that are worried about the possibility of AI self-driving cars all going amok at the same time, either by the AI doing so itself or due to an evildoer that corrupts the AI, I have discussed this intriguing and worrisome notion at the link here. A built-in AI confinement configuration is one of several ways to try and overcome such a malicious takeover of autonomous vehicles.
I am sure that you realize that just because we possibly have some concocted or devised virtual confinement surrounding the AI of a self-driving car, this is not a guarantee of successfully preventing an escape by the AI. A non-sentient AI might have been programmed to do some clever jailbreak. A sentient AI might be clever enough to figure out a means around the AI confinement or convince humans to let it free, as noted earlier.
Conclusion
In the famous adventure novel, The Count of Monte Cristo, we are treated to a captivating story revolving around a man who is wrongfully imprisoned. He manages to escape. He acquires a great fortune. He then seeks revenge upon those that confined him. When you have time to do so, you really must read this quite wonderfully written story or at least watch one of the numerous movie versions.
A memorable line is this: “How did I escape? With difficulty. How did I plan this moment? With pleasure.”
If we are aiming to confine AI, we will need to do a lot of careful planning and try to anticipate how to establish ironclad imprisonment (if that is even possible). The question arises as to whether the AI will also be doing careful planning about how to break free of the confinement. For non-sentient AI, this might be a computational subroutine built into the AI by the human developers of the AI. For sentient AI, if we ever see this, the AI might astutely do its own jailbreak planning.
Are we going to be able to solve the AI Confinement Problem and come up with a surefire means of confining any and all AI systems in an utterly escape-proof or absolutely leak-proof manner?
As stated eloquently in the prison movie The Shawshank Redemption, some birds aren’t meant to be caged.
AI might very well be that kind of bird.
Source: https://www.forbes.com/sites/lanceeliot/2022/05/05/ai-ethics-is-especially-vexed-by-that-ai-confinement-problem-including-the-knotty-particulars-for-confining-autonomous-self-driving-cars/