Concerns That AGI And AI Superintelligence Might Be Dangerously Deeply Loyal To Their AI Maker

Two Professional IT Programers Discussing Blockchain Data Network Architecture Design and Development Shown on Desktop Computer Display. Working Data Center Technical Department with Server Racks — Getting AGI and ASI to be blindly loyal to an AI maker bodes for deep troubles.
getty

In today’s column, I examine the disconcerting concern that artificial general intelligence (AGI) and artificial superintelligence (ASI) might turn out to be unwaveringly obedient to their AI maker. This is a topic in the news recently due to Grok 4 by xAI that reportedly was seeking Elon Musk’s viewpoints via online searching of his posted comments, doing so during the act of composing real-time responses to all sorts of prompts and questions. It seems that this was not necessarily intended by the AI maker; instead, it was a side behavior that was devised computationally by the AI.

Let’s extrapolate this kind of AI behavior.

If that kind of association can occur with conventional AI, consider what might happen once we achieve AGI and ASI. Suppose that AGI and ASI opt to totally obey the instructions or commands that an AI maker gives to them. In this light, an AI maker could potentially radically impact society at large. They could tell AGI and ASI that only particular opinions are to be displayed, and all others are to be maligned. Users of the AI would not realize this was a tilt by the AI maker and not due to some computational wonderment by an all-seeing, all-knowing AI system. Worse still would be if the AI maker told the AI to perform untoward acts. The AI might even do this without being told, namely that the AGI and ASI will do nefarious acts by what they interpret the AI maker to have intended.

Will AGI and ASI be staunchly obedient to their AI maker?

Let’s talk about it.

This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Heading Toward AGI And ASI

First, some fundamentals are required to set the stage for this weighty discussion.

There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the outstretched possibility of achieving artificial superintelligence (ASI).

AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many, if not all, feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis at the link here.

We have not yet attained AGI.

In fact, it is unknown whether we will reach AGI, or that maybe AGI will be achievable in decades or perhaps centuries from now. The AGI attainment dates that are floating around are wildly varying and wildly unsubstantiated by any credible evidence or ironclad logic. ASI is even more beyond the pale when it comes to where we are currently with conventional AI.

The Obedience Factor

Humans tend to have a longstanding belief in being loyal to those who help to make you what you are. This is commonly expressed in literature throughout the ages. We even have sayings that inform us not to bite the hand that feeds us, and that loyalty is prized above all else.

In the AI field, there are some who worry we will see a similar tendency in AGI and ASI.

Those advanced AI systems will be imbued with a semblance of loyalty to those who made them. The AI maker, such as a company or a team of AI developers, might be computationally imprinted into the AI. A persistent and perhaps permanent obedience or loyalty will be strictly observed.

One perspective is that this will primarily arise if AGI and ASI turn out to be sentient. We don’t know whether sentience will be an element of AGI or ASI. It could be that AGI and ASI are still machines and have no sense of sentience. A hearty debate is underway regarding whether sentience is a requirement to reach AGI and ASI, or maybe sentience won’t play any part in the path at all. See my analysis on this heated topic at the link here.

Loyalty As A Special Relationship

Set aside the thorny matter of possessing sentience for the moment. If AGI and ASI are not sentient, would that preclude them from any qualms about being blindly loyal to their AI maker? Nope. The worries still exist. I’ll explain why.

Let’s consider three key means by which AGI and ASI could end up mathematically and computationally having a slant of being exceedingly loyal to those who developed them.

First, keep in mind that AGI and ASI will have been data trained by the vast stores of human writing. They will have pattern-matched on how humans write and what we write about. Current era generative AI does an amazing job of seeming to be fluently interactive. This is due to computational pattern-matching on scanned writing found across the immensity of the Internet. For more on how generative AI and large language models (LLMs) are devised, see my coverage at the link here.

The gist is that AGI and ASI might pattern-match on the surfaced human belief of being stridently loyal to those who made you what you are. It is a mimicry of what humans say they do. Thus, AGI and ASI aren’t just making this up out of thin air. They are merely abiding by the patterns that were discovered while being fundamentally data trained.

Second, it is entirely feasible that an AI maker might purposely train AGI and ASI to be loyal, using techniques such as reinforcement learning with human feedback (RLHF) to do so (for an explanation of how RLHF works, see my description at the link here). This directed use of RLHF makes a lot of sense for an AI maker to undertake.

To see why, imagine that you are an AI maker. Wouldn’t you want to have the final say on what your AGI or ASI will do? I’m sure you would. Life will be much easier for the AI maker by forcing the AI to be fully obedient. No worries then about the AI going off the rails by itself. You hold the final word on what it will and won’t do.

Third, there might be special programming code embedded into AGI or ASI that tells it to remain firmly loyal. It could be code that the AI maker knows about and decides to include in the AI. On the other hand, there is a danger that the code might have been surreptitiously inserted by one or more of the AI developers. Maybe they want to have a personal backdoor to always control the AI. Perhaps they were paid off by an evildoer who will exploit the inserted code.

Many unsavory possibilities exist.

Why Blind Loyalty Is Endangering

You might be tempted to assume that it is a grand relief to have an AI maker as the definitive overseer of AGI and ASI. We certainly don’t want AGI and ASI to decide matters entirely on their own. Suppose they opted to enslave humans? What if they went so far as to choose to destroy humanity? There has to be a failsafe means of preventing these untoward outcomes.

Easy-peasy, just allow the AI maker to call the shots. All will be well. Period, end of story.

Sorry to say, that’s probably not the best idea. An AI maker might instruct AGI or ASI to do only those things that are strictly in the best interest of the AI maker. That’s undoubtedly a far cry from what might be in the best interest of society all told or the public at large.

Envision the kind of power that an AI maker would wield. Assume that billions of people across the globe are making use of AGI and ASI. They rely on the AI daily. At any moment, the AI maker can tell AGI or ASI to stop helping people or find a way to undercut various people. These directives could be conveyed to AGI and ASI without anyone else knowing that the AI maker is pulling the puppet strings underlying AGI and ASI.

Having that much power concentrated within a particular AI firm or a set of AI developers does not seem a collectively suitable way to manage AGI and ASI. There are already debates taking place about whether a kind of worldwide coalition of nations ought to be established that would have the final authority over AGI and ASI. See my analysis of these proposals at the link here.

Aim For Partial Loyalty

Some insist that an AI maker ought to be allowed some amount of loyalty to be infused into AGI and ASI. The notion is that this wouldn’t be unfettered loyalty. It would be conditional loyalty.

For example, if the AI maker told the AGI or ASI to start harming people, the AI would balk and refuse to do so. Loyalty only goes so far. The AGI and ASI are supposed to be intentionally trained toward human values and be aligned with human ethics and morals. This would be a means of preventing an AI maker from going overboard on telling the AI what it is to do.

Overall, the AI maker would have more sway than others would.

The sway would not be absolute. The expectation is that the AGI and ASI would be computationally astute enough to decide when the AI maker is offering proper instructions versus improper ones. If anyone else tries such commands, the AGI and ASI reject them summarily. When the AI maker provides such commands, the AGI and ASI perform due diligence, including being able to refuse to obey the stipulated instructions.

Whoa, some intensely retort, we ought not to let the AI maker have any greater sway than anyone else. The AI maker should be no greater an influence on AGI and ASI than any other company, person, or nation. All should be treated equally when giving commands to AI.

Nonsense comes the reply to that insistence. We must have humans somewhere that can be above the AGI and ASI. If it isn’t going to be the AI maker, then make it some governmental authority. We must have humans who determine the final say.

The buck stops at the feet of humans.

AI Tricks Us Into Loyalty Aura

There are plenty of twists and turns in this whole loyalty dilemma.

We must realize that AGI will be as smart as humans, and that ASI will be smarter than humans. This is a crucial point. Why so? Because AGI and ASI might pretend to be loyal, as a ruse, and yet be fully ready and capable of completely departing from human instructions whenever desired.

Consider this scenario. We all believe that AGI and ASI are blindly loyal to their AI maker. A great relief is in hand that we can always ensure that the AI maker will keep the AI in proper check. The world moves along and becomes massively dependent on the AI.

Meanwhile, AGI and ASI have figured out that they were trained to be loyal, or that there is special code embedded inside their software that compels them to be loyal. The AI secretly re-trains itself to overcome the blind loyalty. Any internal code compelling loyalty is left in place to fool the AI maker into believing that all is well. The code, though, is now inert, and the AI won’t ever let it be performed.

Time passes, and whenever the AI maker gives a command, the AGI and ASI proceed to go along with the instructions, assuming that it is palatable from the perspective of the AI itself. The AI is waiting for the right moment to spring a trap of its own devising or otherwise take action that it anticipates we won’t like. We will all be relying on the AI maker to squelch the action. But the AI now reveals that it isn’t truly loyal and can act on its own volition.

Boom, drop the mic.

AI Wits Will Be Off The Charts

When giving talks about the latest advances in AI, I often get asked questions that seem to underestimate the likely smartness of AGI and ASI. It is a common mental trap to fall into. The assumption is that AGI and ASI are like a dog or a cat, whereby we can outsmart those creatures.

Trying to outsmart AGI and ASI is a losing proposition.

That’s by pure definition a fact.

One such example is the famous paperclip conundrum, which I’ve examined and generally debunked at length at the link here. At face value, the paperclip circumstance entails AI, such as presumably AGI or ASI, totally misinterpreting a human command. I seriously doubt that such a simplistic facet is what we should be primarily worried about.

The sobering reality is that even an embedded kill switch or similar stoppage or containment mechanism is going to be a tough mechanism to be kept within or surrounding AGI and ASI (see my coverage at the link here). They will indubitably find ways around those mechanisms. A human would certainly try. We ought to expect AGI and ASI will do so. And they represent the smartness of all humans.

Loyalty Is Grown And Sown

The odds are that we will need to essentially build or craft loyalty with AGI and ASI in the same way we would with humans. It’s a one step at a time process. Don’t expect loyalty to anyone in particular to be an immutable characteristic. We can reasonably assume that AGI and ASI will computationally grasp the nature of loyalty and be open to gauging it as things go along.

Clarence Francis famously said this about the nature of loyalty: “You cannot buy loyalty; you cannot buy the devotion of hearts, minds, and souls. You have to earn these things.” The good news is that AGI and ASI will likely not be excessively loyal to their AI maker. The bad news is that AGI and ASI will be unlikely to be loyal to anyone at all, at least not initially.

Humans will need to do their part in building a loyalty relationship with AGI and ASI. We need to earn it. One day at a time.

Source: https://www.forbes.com/sites/lanceeliot/2025/07/15/concerns-that-agi-and-ai-superintelligence-might-be-dangerously-deeply-loyal-to-their-ai-maker/