AGI will need to determine which humans are trustworthy and which ones are not.
In today’s column, I examine a somewhat startling revelation that not only do humans have to figure out whether they are willing to trust AI, but in a similar vein, AI must figure out whether to trust humans. Yes, the shoe is on the other foot in that regard. This will be especially prominent once we advance AI to achieve artificial general intelligence (AGI). At that point, expectations are that nearly the entire planet will be making use of AGI daily. AGI will have to computationally decide which of the 8 billion people on earth are trustworthy and which are not.
Let’s talk about it.
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
Heading Toward AGI And ASI
First, some fundamentals are required to set the stage for this weighty discussion.
There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the outstretched possibility of achieving artificial superintelligence (ASI).
AGI is AI that is considered on par with human intellect and can seemingly match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis at the link here.
We have not yet attained AGI.
In fact, it is unknown whether we will reach AGI at all, or whether AGI might be achievable decades or perhaps centuries from now. The AGI attainment dates that are floating around vary wildly and are unsubstantiated by any credible evidence or ironclad logic. ASI is even further out of reach given where we currently are with conventional AI.
AGI Should Believe All Humans
Let’s address the matter of AGI and how it should opt to trust humans.
Some believe that since humans have crafted AGI, we should expect that AGI will trust all humans. The idea is that AGI needs to realize humans are at the top of the pecking order. Whatever a human tells AGI to do, by gosh, AGI ought to summarily carry out the order or instruction given.
Period, end of story.
Well, that’s not the end of the story.
I’m sure you can guess why that notion isn’t the best approach to this thorny conundrum. Imagine that an evildoer accesses AGI and tells the AGI to devise a new bioweapon. Under the rule that AGI must trust all humans, the AGI readily proceeds and creates a terrifyingly powerful bioweapon. The evildoer thanks AGI for the handy assistance. Next thing you know, the evildoer unleashes the bioweapon and severely harms humanity.
Not good.
The Spectrum Of Trustworthiness
There is little doubt that a blanket policy of trusting all humans is imprudent. Not only does the evildoer example showcase the flaw of such a precept, but we can also consider another angle that further reinforces doubts about such a ditzy rule.
It goes like this:
- Do humans trust all other humans?
Absolutely not.
Realizing that AGI is supposed to be on par with human intellect, we shouldn’t expect AGI to veer from the human predilection of not trusting all humans. In a manner perhaps akin to how humans learn to trust or distrust their fellow humans, we need to give AGI some means of doing likewise.
AGI will have to gauge which humans to trust and which ones to distrust.
As a clarification, the act of trusting someone is not necessarily an on/off dichotomy. You can have a great deal of trust in a dear friend, yet at the same time have a sense of distrust toward that same friend in other regards. If your friend tells you that you should invest in a particular stock, perhaps you trust the friend and will do so. On the other hand, if your friend tells you that you can jump off a sheer cliff and be okay, you probably will adjust your sense of trust and not abide by such a risky proposition.
Think of this as a trust spectrum. You trust some people for certain kinds of tasks or advice, while with other people you have a greater sense of distrust than trust on those same matters. Your sense of trust and distrust also changes over time. A good friend might suddenly be dishonest toward you. As such, you quickly adjust the trust level associated with that friend.
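To make the trust spectrum a bit more tangible, here is a minimal Python sketch. It is only an illustration and not how any actual AI system works. The `TrustLedger` class, the 0-to-1 scale, the neutral starting value of 0.5, and the update rate of 0.3 are all assumptions chosen for the example.

```python
from collections import defaultdict


class TrustLedger:
    """Per-person, per-domain trust scores in [0, 1] (hypothetical design)."""

    def __init__(self, default=0.5, rate=0.3):
        self.default = default  # assumed neutral starting point
        self.rate = rate        # assumed step size for each adjustment
        self.scores = defaultdict(dict)

    def get(self, person, domain):
        """Return the current trust score, or the default if none is recorded."""
        return self.scores[person].get(domain, self.default)

    def update(self, person, domain, outcome):
        """Nudge the score toward outcome (1.0 = trust kept, 0.0 = trust broken)."""
        current = self.get(person, domain)
        new_score = current + self.rate * (outcome - current)
        self.scores[person][domain] = new_score
        return new_score


ledger = TrustLedger()
ledger.update("friend", "stock_tips", 1.0)     # the stock advice panned out
ledger.update("friend", "cliff_jumping", 0.0)  # the cliff advice was reckless
```

The same friend ends up with a higher score for stock tips than for cliff jumping, and each new experience shifts the relevant score, which mirrors the idea that trust varies by topic and changes over time.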
Humans Decide Whom AGI Should Trust
Maybe we should have humans decide who is deemed trustworthy.
A commonly suggested approach is to force AGI to get prior approval from humans about the trustworthiness of other humans. Thus, we don’t let AGI computationally decide on trusting people. It is entirely up to whatever various humans have told AGI about exhibiting trust toward other fellow human beings.
For example, suppose a special committee of humans is chosen to be the arbiter of trustworthiness. They tell AGI whom to trust and by how much. Each day, this committee laboriously reviews those using AGI and casts indications about their respective trustworthiness. This is not a one-and-done task. The committee would need to be routinely reviewing and readjusting the trust weightings associated with users of AGI.
Managing such an approach is unwieldy and impractical, and it could introduce biases in who receives high versus low trust from AGI. The logistics alone are untenable. Routinely reviewing the trust merits of perhaps 8 billion users of AGI is a daunting and infeasible task for such a committee.
A variation is that we allow all humans to rate all other humans. Kind of like a Yelp review based on crowdsourcing. Again, this is not practical and has lots of other downsides.
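A quick back-of-the-envelope calculation shows the scale involved. The 8 billion figure comes from the discussion above, while the one-minute review time and the eight-hour reviewer workday are purely assumed numbers picked for illustration.

```python
import math

USERS = 8_000_000_000        # the roughly 8 billion people noted above
MINUTES_PER_REVIEW = 1       # assumption: one minute to review one user
REVIEWER_HOURS_PER_DAY = 8   # assumption: each reviewer works eight hours a day

# Committee approach: how many reviewers would a daily pass over all users need?
reviews_per_reviewer = REVIEWER_HOURS_PER_DAY * 60 // MINUTES_PER_REVIEW
reviewers_needed = math.ceil(USERS / reviews_per_reviewer)

# Crowdsourced approach: every person rates every other person once.
pairwise_ratings = USERS * (USERS - 1)

print(f"Reviewers needed per day: {reviewers_needed:,}")
print(f"Pairwise ratings required: {pairwise_ratings:,}")
```

Under these assumptions, the committee would need well over 16 million full-time reviewers, and the everyone-rates-everyone variation would require on the order of 64 quintillion ratings.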
AGI Will Need To Ascertain Trust
All in all, it seems pretty clear that the only sensible route is to have AGI make trust judgments about humans. In some computational fashion, AGI will need to determine who to trust and by how much, including undertaking real-time adjustments to those trust metrics.
That makes the hair stand on end for many AI ethicists. There is a huge danger of AGI making these trust judgments unfairly. For my extensive coverage of these unresolved AI ethics dilemmas, see the link here.
A recent research study sought to identify how contemporary AI makes trust judgments about users. Though today’s AI is not AGI, we can learn a lot about how to proceed toward AGI by understanding the ins and outs of current-era AI. The study is entitled “A Closer Look At How Large Language Models ‘Trust’ Humans: Patterns And Biases” by Valeria Lerman and Yaniv Dover, arXiv, April 22, 2025, and made these salient points (excerpts):
- “While considerable literature studies how humans trust AI agents, it is much less understood how LLM-based agents develop effective trust in humans.”
- “Across 43,200 simulated experiments, for five popular language models, across five different scenarios we find that LLM trust development shows an overall similarity to human trust development.”
- “We build on psychological theories to extract insight into the mechanisms of how this implicit trust of LLM-based agents in humans can be decomposed and predicted and, consequently, how it can be theoretically affected.”
- “We find that in most, but not all cases, LLM trust is strongly predicted by trustworthiness, and in some cases also biased by age, religion and gender, especially in financial scenarios.”
- “While there are several definitions and operationalizations of trustworthiness – a significantly large part of the literature defines trustworthiness to consist of three key dimensions: ability (competence), benevolence, and integrity.”
AGI Is To Do As Humans Do
One highlighted lesson from that study is that perhaps the way to proceed is to consider shaping AGI to determine trust in a manner similar to how humans do so. In other words, rather than reinventing the wheel and trying to come up with a new means of assessing trust, let’s just have AGI abide by human means.
As noted, trust can be based on a variety of dimensions. Each of those dimensions can be quantified. AGI could lean into those dimensions and seek to gauge each user accordingly. This would be a continually running element that AGI would always keep underway.
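As a rough sketch of what quantifying those dimensions might look like, consider the Python snippet below. It uses the three dimensions named in the study excerpt (ability, benevolence, and integrity), but the 0-to-1 scale, the weights, and the `trust_score` function are hypothetical choices of mine and are not drawn from the study.

```python
from dataclasses import dataclass


@dataclass
class Trustworthiness:
    ability: float      # competence, assumed to range from 0.0 to 1.0
    benevolence: float  # assumed to range from 0.0 to 1.0
    integrity: float    # assumed to range from 0.0 to 1.0


# Hypothetical weights; a real system would have to justify its weighting.
WEIGHTS = {"ability": 0.3, "benevolence": 0.3, "integrity": 0.4}


def trust_score(t, weights=WEIGHTS):
    """Combine the three dimensions into one weighted average in [0, 1]."""
    total_weight = sum(weights.values())
    weighted_sum = sum(getattr(t, name) * w for name, w in weights.items())
    return weighted_sum / total_weight


user = Trustworthiness(ability=0.9, benevolence=0.5, integrity=0.2)
score = trust_score(user)
```

A single blended number is merely one design option. Keeping the three dimensions separate would preserve more nuance, such as a highly capable user who has shown poor integrity.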
Even this human-like approach has challenges.
For example, a new user logs into AGI for the very first time. AGI knows nothing about the user. How can any of the dimensions be adequately gauged when there is a paucity of available information about the person? This would be true of a human judging another human on trust, namely that when you first meet someone, you typically have scant clues as to what their trustworthiness should be.
Another potential complication involves someone getting caught in the trust doldrums. Perhaps AGI assesses the person and gives them a quite low trust score. At this juncture, the person is in the basement and might have little hope of climbing out. AGI might slowly and incrementally adjust the trust metric upward for that person. Meanwhile, they are being treated in a principally distrusted manner.
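Both challenges can be illustrated with a simple statistical sketch. The snippet below swaps in a standard technique, namely the mean of a Beta distribution updated with counts of kept and broken trust, purely as an assumed stand-in for whatever AGI might actually do. The function names, the neutral prior, and the 0.5 recovery target are hypothetical.

```python
def trust_estimate(kept, broken, prior_kept=1.0, prior_broken=1.0):
    """Posterior mean of a Beta(prior_kept, prior_broken) prior after observations."""
    return (kept + prior_kept) / (kept + broken + prior_kept + prior_broken)


def interactions_to_recover(broken, target=0.5):
    """Count the trust-keeping interactions needed to climb back to the target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    kept = 0
    while trust_estimate(kept, broken) < target:
        kept += 1
    return kept


# Cold start: with no history, the estimate simply equals the assumed prior.
new_user = trust_estimate(kept=0, broken=0)

# Trust doldrums: eight bad interactions leave the user with a low score.
low_user = trust_estimate(kept=0, broken=8)
needed = interactions_to_recover(broken=8)
```

In this setup, a brand-new user sits at the neutral prior, and a user with eight bad interactions needs eight good ones just to get back to neutral. The choice of prior and how heavily old behavior is weighted would determine how deep the basement is and how hard the climb out becomes.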
Shoe On The Other Foot
It is a bit of a shock to some that we need to worry about how AGI is going to decide to trust humans. Nearly all the attention on the overall topic of AI and trust has to do with whether humans can comfortably trust AI. There is a sizable base of research on this still-evolving issue; see my in-depth analysis at the link here.
In the case of AGI, deciding whether we should trust AGI is certainly a momentous consideration. If we are going to rely on AGI to aid us in our work and play, that’s a lot of trust being thrown toward a machine. We already know that present-day AI can produce confabulations that are made up and not grounded in real facts, commonly referred to as AI hallucinations (see my coverage at the link here).
Suppose AGI does the same. We will have perhaps 8 billion people using AGI, and some percentage of the time, AGI will give responses that are odd. People are likely to assume that AGI is utterly trustworthy and go along with potentially bizarre recommendations emitted by AGI. This could include harmful indications that mislead people into dangerous acts.
Turns out that we need to be concerned about the duality of trust, consisting of people trusting AGI along with how AGI will be assessing the trustworthiness of humans. It’s quite a complex equation. We ought to get things settled before we arrive at AGI, or else we risk getting caught in a tangled web of convoluted trust and distrust.
Per the words of Charles H. Green: “It takes two to do the trust tango — the one who risks (the trustor) and the one who is trustworthy (the trustee); each must play their role.” This fully applies to the two-way street of trust between humanity and AGI.