Valiantly Looking For Truth And Certainty In AI And LLMs Gets Earnest Airtime At Harvard’s BKC

In today’s column, I examine the latest research on the challenging role of truth, certainty, and internal beliefs in contemporary generative AI and LLMs.

This vital topic was the focus of the second event in Harvard’s Berkman Klein Center (BKC) Fall Speaker series (for my analysis of BKC’s first event in the Fall series, encompassing what intelligence is, see the link here). I was recently honored to have been an invited participant at a special AI Workshop at Harvard University that explored the expected advent of AGI, an outstanding get-together that took place on September 12-14, and I had an opportunity to learn about BKC and connect with BKC researchers, affiliates, and faculty.

For the second event of the Fall series, which took place on October 1, 2025, Jacob Andreas gave a talk on “Belief, Uncertainty, and Truth in Language Models.” Dr. Jacob Andreas is an Associate Professor of Electrical Engineering and Computer Science at MIT, serving also in the renowned Computer Science and Artificial Intelligence Laboratory (CSAIL), globally recognized for its pioneering research in computing that improves the way people work, play, and learn.

Moderated expertly by Dr. Josh Joseph, BKC’s Chief AI Scientist, the conversation was ably steered through a wide range of important considerations concerning both technological facets of AI and broader policy and societal matters. Figuring out how AI ought to respond to and interact with humanity is more than a tech-only, heads-down endeavor.

It takes a village of AI builders, policymakers, and a myriad of other stakeholders to sensibly and sufficiently ascertain the propensity of AI to express truthfulness, beliefs, and levels of uncertainty, doing so in the right way, at the right time, and with the right or judicious results.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

Setting The Stage

The talk by Dr. Andreas began by laying out the fundamentals of modern-era language models (a video recording will soon be posted by BKC, see the BKC main website at the link here).

Allow me a moment to quickly get you up to speed.

There is a great deal of hand-wringing these days about generative AI being overly sycophantic. A person using AI tends to get gushingly flattered during interactive dialogues. How does this happen? This is largely due to how the AI was shaped by the AI maker.

It behooves the AI maker to nudge users into wanting to use the AI, leading to more views and longer sessions. An LLM that is perceived as an ego-booster is bound to inspire loyalty. People will continue to use that particular AI. They might also be less likely to switch to another AI that isn’t as flattering, or that they don’t realize could be made to behave in the same flattering way. For more on my deep analysis of the ins and outs, upsides and downsides associated with these AI-driven behaviors, see the link here.

Another significant concern is that AI can steer users into a kind of mental abyss that isn’t good for the users or for society at large. For example, a user opts to engage in a discussion about humans having landed on the moon. LLMs might tell the user that indeed astronauts have walked on the moon. On the other hand, the AI could indicate to the user that the moon landing was fake. A user would then be led down a primrose path, becoming mired not only in a moon landing conspiracy but also facing a substantial chance that the AI will nudge them into adopting a wide variety of other false and conceivably harmful conspiracies.

My latest research showcases how AI can readily co-create delusions on a human-AI basis, often spiraling a user into a phenomenon now labeled as AI psychosis, see my insights at the link here. I’ve also been closely studying an intriguing supposition, namely that people who have low literacy about AI and who view AI as seemingly magical are potentially more prone to AI psychosis, see my analysis at the link here.

Asking Good Questions

The talk by Dr. Andreas posited critical questions that need to be properly addressed, including these key aspects (excerpted from his presentation):

  • “What’s the right tradeoff between accuracy, consistency, and predictability?”
  • “What’s the right way to communicate uncertainty to users (or more generally, to cause them to form accurate mental models of LMs)?”
  • “What’s the right tradeoff between consistency and personalization? How much personalization is too much?”

A notable element that was strongly emphasized entailed the role of tradeoffs in whatever we seek to have AI do.

Some might insist that we should optimize one selected factor, such as optimizing the AI to personalize its interactions with each user. That seems like a great idea. The AI would tailor how it responds, what content it generates, and otherwise computationally hone its responses to the detected preferences of the user. Users are happy, and the AI is readily mechanized internally to tilt toward personalization.

But there is an insidious underbelly to this preoccupation with AI-based personalization.

As per my example of the moon landing, an exclusive concentration on personalization can devolve into producing unsavory results. Accuracy gets set aside as a lower priority by the preconditioned AI. Rather than remaining anchored to the highly probable association of the moon landing with truth, the aim to personalize could override that statistical bearing.

This divergence from the truth can arise from conversational signals discerned by the LLM. If the user had previously expressed any semblance of a belief in conspiracies to the AI, the personalization might run with that as a line of discourse. Personalization wins, but at the untoward cost of diminished truthfulness and accuracy.

Optimizing on just one factor is not going to get AI into a mode of operation that balances numerous competing properties. Instead, multiple factors must be treated as tradeoffs against one another, judiciously weighed and balanced by the AI maker in the design and development of the AI. The devil is in the details as to which of many factors gets which weighting, including how this is dynamically rejiggered over time and across users.
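To make the tradeoff notion concrete, here is a minimal sketch, assuming a purely hypothetical scoring scheme of my own devising (the factor names and weights are illustrative, not drawn from the talk), of how candidate responses might be scored against several competing properties rather than on personalization alone:

```python
# Hypothetical illustration: scoring a candidate response on several
# competing factors instead of optimizing personalization alone.
# The factor names and weights are invented for this sketch.

def combined_score(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-factor scores, each assumed to lie in [0, 1]."""
    return sum(weights[name] * factors[name] for name in weights)

candidate = {"accuracy": 0.95, "personalization": 0.40, "consistency": 0.85}
weights = {"accuracy": 0.6, "personalization": 0.2, "consistency": 0.2}

print(round(combined_score(candidate, weights), 3))  # 0.82
```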

Tradeoffs are a necessary construction. The big-time and unresolved trick is to determine how to enact this in the real world.

Let’s do some unpacking on pursuits currently underway.

AI Contends With Certainty And Uncertainty

I’d like to highlight some recent work by MIT Ph.D. students Mehul Damani and Isha Puri, working under the tutelage of Dr. Andreas, that was mentioned during his talk. When I was a professor and executive director of a pioneering AI lab, I was constantly aiming to spotlight my doctoral students. It helps their career trajectory. An upbeat pay-it-forward approach to life.

The topic at hand concerns a vexing consideration of how to best get LLMs to suitably leverage measures of certainty and uncertainty.

Think of it this way. When a human gives you an answer to a tough question, you typically want to know whether the person is fully sure of their answer or only partially sure. If they aren’t at all sure and are wildly guessing, by gosh, it would be vital to know that too.

We want AI to do likewise, expressing responses based on a confidence level, indicating the certainty or uncertainty at play. For my detailed explanation on how AI can be systematically shaped to incorporate levels of confidence and display certainty factors, see the link here and the link here.

A useful means of boosting LLMs to more adroitly encompass confidence levels is to do so during the initial tuning of the AI. It goes like this. The usual way to data-train an LLM consists of first scanning massive volumes of data from the Internet and having the AI pattern-match on the data. After doing so, a widely popular technique known as “reinforcement learning from human feedback” (RLHF) is often employed. RLHF consists of the AI maker paying selected people to use the budding LLM and provide direct feedback to the AI about the responses being generated.

For more about the intricacies of RLHF, see my discussion at the link here.

The common means of providing feedback during RLHF involves a thumbs-up or thumbs-down indication by the human evaluator. This might be done by choosing a pictorial icon, or the evaluator might simply choose a Yes or No answer. The crux is that the responses by the evaluator are a binary indicator. The AI uses these binary indications to signify a type of reward, namely, the thumbs-up is a positive reinforcer of whatever the AI showcased, while a thumbs-down is not considered a reward.
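As a rough, purely illustrative sketch (not any AI maker’s actual pipeline), each thumbs-up or thumbs-down can be thought of as becoming a 1-or-0 reward attached to the response it rates:

```python
# Illustrative only: turning thumbs-up/thumbs-down evaluator feedback into
# the binary rewards used to steer the model during RLHF-style tuning.

feedback = [
    {"response_id": "r1", "thumbs_up": True},
    {"response_id": "r2", "thumbs_up": False},
    {"response_id": "r3", "thumbs_up": True},
]

# A thumbs-up counts as a reward of 1, a thumbs-down as 0 -- the model
# is nudged toward responses resembling those that earned a 1.
rewards = {item["response_id"]: (1 if item["thumbs_up"] else 0)
           for item in feedback}

print(rewards)  # {'r1': 1, 'r2': 0, 'r3': 1}
```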

Moving Beyond Binary Rewards

Aha, it would be notable if we went beyond binary rewards so that the AI could garner a semblance of certainty or uncertainty associated with the answers being generated.

How can the LLM potentially be scored on a multitude of probabilistic predictions? One worthwhile approach would be to lean into the now-classic statistical work of Glenn Brier dating back to the 1950s. He devised a scoring rule that measures the accuracy of probabilistic predictions over mutually exclusive, discrete outcomes, whether binary or categorical.
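For binary outcomes, the Brier score is simply the mean squared gap between the stated probability and what actually happened. Here is a minimal sketch with made-up numbers:

```python
# Brier score for probabilistic predictions of binary outcomes: the mean
# squared difference between the predicted probability and what actually
# happened (1 = the answer was correct, 0 = it was not). Lower is better.

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

probs = [0.9, 0.8, 0.3, 0.6]    # stated confidence that each answer is correct
outcomes = [1, 1, 0, 0]         # whether each answer actually was correct

print(round(brier_score(probs, outcomes), 3))  # 0.125
```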

Mehul Damani and Isha Puri worked with other fellow researchers to pursue this strategy. In the research article entitled “Beyond Binary Rewards: Training LMs To Reason About Their Uncertainty” by Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, Jacob Andreas, arXiv, July 22, 2025, these salient points were made (excerpts):

  • “Today, almost all successful applications of RL for reasoning use binary reward functions that evaluate the correctness of LM outputs.”
  • “While simple and effective for improving accuracy, this reward comes with a critical limitation: it rewards models equally whether they are confidently correct or merely guessing and penalizes identically whether they abstain or produce incorrect answers. This incentivizes overconfident guessing, undermining trustworthiness.”
  • “This paper describes RLCR (Reinforcement Learning with Calibration Rewards), an approach to training reasoning models that jointly improves accuracy and calibrated confidence estimation.”
  • “During RLCR, LMs generate both predictions and numerical confidence estimates after reasoning. They are trained to optimize a reward function that augments a binary correctness score with a Brier score — a scoring rule for confidence estimates that incentivizes calibrated prediction.”

As you can plainly see, the research opted to go beyond binary rewards. They have coined their method as “reinforcement learning with calibration rewards” or RLCR. This is definitely a step forward in the evolution of RLHF (there are quite a number of variations of RLHF, including RLAIF, DPO, IPO, and others; see my discussion at the link here).
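Based on the paper’s description, the flavor of an RLCR-style reward can be sketched roughly as below. This is my own simplified reading, treating the reward as a binary correctness score plus a Brier-based calibration term, and is not the authors’ code:

```python
# A simplified, illustrative reading of an RLCR-style reward: the model
# emits an answer plus a confidence in [0, 1], and the reward combines
# binary correctness with a Brier-based calibration term.
# (My own sketch of the idea described in the paper, not the authors' code.)

def rlcr_style_reward(is_correct: bool, confidence: float) -> float:
    correctness = 1.0 if is_correct else 0.0
    # Brier penalty: squared gap between stated confidence and correctness.
    calibration = 1.0 - (confidence - correctness) ** 2
    return correctness + calibration

# Confidently correct beats a lucky half-hearted guess...
print(round(rlcr_style_reward(True, 0.95), 4))   # 1.9975
print(round(rlcr_style_reward(True, 0.50), 4))   # 1.75
# ...and a confidently wrong answer is penalized hardest.
print(round(rlcr_style_reward(False, 0.95), 4))  # 0.0975
print(round(rlcr_style_reward(False, 0.10), 4))  # 0.99
```

In this toy form, a confidently wrong answer earns the least, while a hedged wrong answer salvages some calibration credit, which is exactly the incentive that a binary-only reward lacks.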

This is a prime example of seeking to balance multiple factors at once. The idea is to optimize correctness and calibration in the same breath. I believe this has great promise. This notable work and other spin-off research will hopefully stretch our boundaries of how we can get LLMs to robustly incorporate confidence levels.

AI And Dealing With World Models

I’d like to next bring you into the world of world models.

A theme that is presently getting a lot of attention by the AI research community consists of ascertaining whether LLMs can formulate so-called world models. Most believe AI can do so, and that LLMs are doing so; thus, the next question is how we can surface and inspect these world models, and how we can shape or influence them (see my pointed remarks at the link here).

Humans seem to devise world models in their heads. Your world model guides your actions. Indeed, your world model is at the root of your thinking. People learn new things and either fill in gaps in their world models or reinforce elements of their world models. A person might even shake up their world model and completely recast their perspective on how they think.

Perhaps you’ve had a time in your life when your world model suddenly got turned upside down or otherwise was shaken to its core. Most people have experienced this during their lifetime. If so, welcome to the club.

AI and especially LLMs are increasingly being built to explicitly rely on world models as a keystone for how they generate responses. The emphasis is that rather than simply allowing AI to forge world models, we would be better off designing world models architecturally into the mix, doing so from the get-go.

Those Changing States Of Mind

Let’s examine a scenario that involves a world model in a human’s noggin.

You are going camping in a wilderness terrain. A distant campsite is your destination. Before setting out, you plan the journey in your mind. Lots of possibilities exist. You decide that the first action will be to walk along a dirt path. Secondly, you will cross a small stream. Third, there is a rock wall that will need to be climbed. Each of those actions represents a potential state of what will need to happen. Your starting state inside your mind is to walk the path, then you will cross a stream, and finally climb a rockface.

Your world model of the trek consists of these three states. They are in a derived order or sequence. All is ready to proceed. But you talk with a fellow hiker, and they warn you that your conception is fraught with difficulty. The fellow hiker urges you to resequence the states in your mind. Ford the stream first, then climb the rockface, and you can complete the travel by traversing the dirt path.

Can you readily resequence the envisioned states that are in your mind?

I hope so. Of course, if the number of states were a lot higher, such as playing chess and imagining dozens upon dozens of future states, your ability to change the sequence that is floating in your mind might fall apart. It’s too much for your brain to handle.

I will next shift gears and consider the nature of states in the context of LLMs and world models.

Permutation Composition Rears Its Head

I have an intriguing and important question for you to ponder:

  • When an LLM has a series of states within its world model, how does it potentially handle the resequencing of those states?

This capacity of state resequencing is often referred to as a form of permutation composition.

We want to dig into AI and figure out the mechanisms involved in permutation composition. This offers highly useful insights about how world models are devised and instantiated. Furthermore, a cogent case can be made that lots of multifaceted tasks can essentially be reduced to a circumstance of permutation composition. A bonanza can be had by revealing the inner secrets that underpin this central aspect of world models.
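To make the task concrete, here is a tiny toy version of permutation composition, using my own illustrative objects and reshuffles rather than the paper’s benchmark: start with an ordered set of objects, apply a sequence of reshuffling operations, and report the final order.

```python
# Toy illustration of permutation composition: apply a sequence of
# reshuffling operations to a set of objects and report the final order.

def apply_permutation(items: list[str], perm: list[int]) -> list[str]:
    """perm[i] gives the index in `items` that moves to position i."""
    return [items[j] for j in perm]

objects = ["A", "B", "C", "D"]
reshuffles = [
    [1, 0, 2, 3],   # swap the first two objects
    [0, 2, 1, 3],   # swap the middle two
    [3, 0, 1, 2],   # rotate everything right by one position
]

state = objects
for perm in reshuffles:
    state = apply_permutation(state, perm)

print(state)  # ['D', 'B', 'C', 'A']
```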

In a recent research paper that Dr. Andreas co-authored, this thorny problem was carefully explored. The paper entitled “(How) Do Language Models Track State?” by Belinda Li, Zifan Carl Guo, and Jacob Andreas, arXiv, March 11, 2025, made these key indications (excerpts):

  • “Inferring common ground in discourse, navigating the environment, reasoning about code, and playing games, all require being able to track the evolving state of a real or abstract world. There has been significant interest in understanding whether (and how) LMs can perform these tasks.”
  • “Toward this understanding, the experiments in this paper focus on one specific state tracking problem, permutation composition.”
  • “At a high level, this problem presents LMs with a set of objects and a sequence of reshuffling operations; LMs must then compute the final order of the objects after all reshufflings have been applied.”
  • “We show that LMs trained or fine-tuned on permutation tracking tasks learn one of two distinct mechanisms: one consistent with an “associative algorithm” (AA) that composes action subsequences in parallel across successive layers; and one consistent with a “parity-associative algorithm” (PAA) which first computes a shallow parity heuristic in early layers, then computes a residual to the parity using an associative procedure.”

The findings of the study are very eye-opening. Whereas we might have assumed that any number of mechanisms might be involved, the research found that two distinct approaches appear to be dominant: the AA and the PAA.

With that result, we can turn our attention to identifying which of the two is fastest, or most capable, etc. Another angle would be that if we want to exert control over the permutation composition, we can focus on the AA or PAA. No need to wantonly try to find the right levers to pull. We know where to go.
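Some intuition for why an associative algorithm is even feasible: composing permutations is associative, so adjacent reshuffles can be merged pairwise rather than strictly one at a time. Here is a rough sketch of that mathematical point (an illustration of the math, not of the model’s internal circuitry), reusing the toy reshuffles from the earlier example:

```python
# Because composing permutations is associative, a long sequence of
# reshuffles can be merged pairwise (tree-style) instead of strictly
# left to right -- an illustration of the math, not of model internals.

def compose(p: list[int], q: list[int]) -> list[int]:
    """Single permutation equivalent to applying p first, then q."""
    return [p[j] for j in q]

reshuffles = [[1, 0, 2, 3], [0, 2, 1, 3], [3, 0, 1, 2]]

left_grouped = compose(compose(reshuffles[0], reshuffles[1]), reshuffles[2])
right_grouped = compose(reshuffles[0], compose(reshuffles[1], reshuffles[2]))

print(left_grouped == right_grouped)  # True -- grouping does not change the result
print(left_grouped)                   # [3, 1, 2, 0]
```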

It will be interesting to see if other research replicates these results. I am also hoping that some enterprising researchers opt to tweak the AA and PAA, and gauge sensitivity and responsiveness. All kinds of potential offshoots could produce useful and insightful augmented or amplified findings.

Searching For Answers To Hard Problems

AI researchers have a litany of hard problems confronting them when it comes to the truthfulness, beliefs, and certainty underlying contemporary AI and LLMs. New laws keep getting enacted to try and corral AI (see my coverage of the recently passed AI law in Illinois, at the link here). As mentioned earlier in this discussion, a village is at work to decide how AI and LLMs ought to be designed, built, tested, and fielded. Everyone has a part to play.

Despite the daunting array of problems to be solved, I suggest we ensure our minds are focused and earnestly keep grinding away. Henry Ford made a statement that deserves attention on these heady matters: “Most people spend more time and energy going around problems than in trying to solve them.”

Don’t skirt around these problems; instead, let’s tackle these crucial AI issues head-on. The future of AI depends on our doing so, and likewise, the future of humankind.

Source: https://www.forbes.com/sites/lanceeliot/2025/10/02/valiantly-looking-for-truth-and-certainty-in-ai-and-llms-gets-earnest-airtime-at-harvards-bkc/