Via AI and LLMs we can devise new psychometrics to measure mental health across society in a manner that we couldn’t do previously.
In today’s column, I examine the use of generative AI and large language models (LLMs) as psychometric instruments that can boost how we gauge the status of human mental health, especially when deployed at scale.
Here’s the deal. Conventional means of measuring mental health on a large-scale basis tend to rely on relatively simplistic instruments, such as static surveys built from fixed multiple-choice questions. The beauty of generative AI is that the AI can administer a survey that responds in real time and adapts to whatever the respondent indicates.
Furthermore, the AI can employ a wide range of multi-modal interactions, including written content, video, and images, vastly expanding the data that can be collected from respondents. AI can also readily sense reactions and responses via a mixture of written replies, verbal utterances, facial expressions, and the like.
This is going to be a transformative method of ascertaining the mental health of society on a scale that was not previously viable.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
AI And Mental Health Therapy
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.
People Are Using AI For Mental Health Advice
The most popular use of the major LLMs nowadays is getting mental health guidance, see my discussion at the link here. This can be done quite simply, at low cost or even for free, anywhere and 24/7. A person merely logs into the AI and engages in a dialogue led by the AI.
There are sobering worries that AI can readily go off the rails or otherwise dispense unsuitable or even egregiously inappropriate mental health advice. Huge banner headlines in August of this year accompanied a lawsuit filed against OpenAI for their lack of AI safeguards when it came to providing cognitive advisement. Despite claims by AI makers that they are gradually instituting AI safeguards, there are still a lot of downside risks of the AI doing untoward acts, such as insidiously helping users in co-creating delusions that can lead to self-harm.
For the details of the OpenAI lawsuit and how AI can foster delusional thinking in humans, see my analysis at the link here. I have been earnestly predicting that eventually all of the major AI makers will be taken to the woodshed for their paucity of robust AI safeguards. Lawsuits aplenty are arising. In addition, new laws about AI in mental healthcare are being enacted (see, for example, my explanation of the Illinois law, at the link here, the Nevada law at the link here, and the Utah law at the link here).
Using AI As A Psychometric Instrument
Millions upon millions of people are regularly using AI as a mental health advisor or for casual mental health guidance, which is reflected in a recent report by OpenAI about the usage of ChatGPT and GPT-5; see my analysis at the link here.
This provides a remarkable opportunity to measure the mental health status of the population as a whole. AI can far surpass the conventional approach to gauging mental health on a local, national, and global scale. Some studies are examining the already collected data that people enter into AI about their mental health, see my discussion at the link here, though this is limited due to privacy concerns and other access-related aspects.
An alternative would be to leverage AI as a form of outreach to do a pulse check of mental health status. The AI would be upfront and inform people that their responses are going to be used for proper analyses about collective mental health conditions. The cost to perform this kind of polling would be relatively inexpensive and easily done on a mass basis.
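To give a flavor of what that upfront disclosure might look like in practice, here’s a minimal sketch of a consent gate that an AI-driven pulse check could run before any questions are asked. The wording and function are hypothetical illustrations of my own, not any vendor’s actual implementation.

```python
# Hypothetical consent gate for an AI-driven mental health pulse check.
# The disclosure wording and flow are illustrative assumptions only.

DISCLOSURE = (
    "This conversation is part of a population-level mental health survey. "
    "Your responses will be aggregated and analyzed for research purposes. "
    "Participation is voluntary and you may stop at any time. Continue? (yes/no)"
)

def obtain_consent() -> bool:
    """Present the disclosure and require an explicit, affirmative answer."""
    answer = input(DISCLOSURE + "\n> ").strip().lower()
    return answer in {"yes", "y"}

if __name__ == "__main__":
    if obtain_consent():
        print("Thank you. Starting the survey...")  # hand off to the survey loop
    else:
        print("Understood. No data will be collected.")
```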
Advantages Of LLMs
Most psychometric survey instruments are static, providing a predetermined set of questions and a predetermined set of available answers to choose from. That limits flexibility and forces respondents into particular buckets and narrow choices.
There are more advanced survey tools that allow a decision tree approach to asking questions. These tools will branch here or there, depending upon the answers provided by the respondent. Still, a lot of effort is required to set up those surveys, and there are only so many paths that reasonably could be devised beforehand.
Using LLMs as a survey tool opens lots of new possibilities.
The AI can discuss the questions with the respondents. Interaction about the answers of a respondent can be pursued in fluent, natural language terms. The paths and direction of the survey can vary in a wide variety of ways. Multi-modal interaction can be leaned into.
Not only can multiple modes of interaction be used to present information and ask questions, but the collection of responses can go beyond written replies alone. Verbal and visual cues are often significant telltale signs that are otherwise not captured during a survey. How a respondent reacts can be recorded and analyzed by the AI. Tone and facial expression yield insights as much as the answers themselves. All of this occurs in real-time.
Overall, researchers and those studying the widespread status of mental health can tap into AI by using straightforward prompting methods and do not necessarily need to learn a new programming language or specialized forms-based approaches to devising their surveys.
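To make that concrete, here’s a minimal sketch of how a researcher might specify an adaptive survey entirely in a plain-language prompt rather than a forms package. The prompt wording is an illustrative assumption, and ask_llm is a hypothetical stand-in for whatever chat-completion API is available.

```python
# Minimal sketch: an adaptive survey defined by a plain-language prompt.
# The prompt is illustrative; ask_llm is any chat-completion callable you supply.

from typing import Callable

SURVEY_SPEC = """You are administering a brief well-being survey.
Ask one open-ended question at a time. Adapt each follow-up question to the
respondent's previous answer. Do NOT give advice; only collect responses.
After five exchanges, thank the respondent and end the survey."""

def run_survey(ask_llm: Callable[[list[dict]], str]) -> list[dict]:
    """Run a short adaptive survey; the transcript is the raw survey data."""
    messages = [{"role": "system", "content": SURVEY_SPEC}]
    for _ in range(5):  # five question/answer exchanges
        question = ask_llm(messages)
        messages.append({"role": "assistant", "content": question})
        reply = input(question + "\n> ")
        messages.append({"role": "user", "content": reply})
    return messages
```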
Psychometrics Are Changing
I have a thoughtful question for you to ponder.
Should we reconsider the nature of psychometrics, now that AI opens a new world for the collection and analysis of mental health status?
The answer would seem to be a resounding yes.
Here’s why. Our ability and means of enacting psychometrics to any practical degree had essentially constrained us to what was logistically feasible. If we can do a lot more regarding psychometrics, it makes abundant sense to rethink the prevailing psychometrics that have traditionally existed. The boundaries are now expanded. A mindful redesign due to AI is handily within our grasp.
This reminds me of a famous story about the history of Boston roadways. Allow me for a moment to share the saga. The early history of Boston consisted of bringing cows from outside of Boston into the city. Those herding the cows would allow the animals to make their way as arbitrarily desired by their beastly minds. This ultimately led to all manner of crisscrossing dirt paths that resembled the most byzantine of mazes.
The kicker is that once cars came along and roads needed to be paved, the same cow paths were used. It was easiest to merely affix asphalt onto the dirt paths. This is problematic because driving in Boston is known to be a bit of a nightmare as a result of roads that twist this way and that way. One-way streets are common. Roads that intersect bizarrely are everywhere. Etc.
Perhaps a better approach would have been to take another look at how the roadways ought to be designed. The rule of thumb is that paving over cow paths is not an especially productive way to proceed. Do prudent rethinking and redesign before continuing the paths of history.
Redesigning Psychometrics
Fortunately, a recent research study opted to address the redesign of psychometrics in an era of generative AI and LLMs.
In a study entitled “Generative Psychometrics — An Emerging Frontier in Mental Health Measurement” by Isaac R. Galatzer-Levy, Nenad Tomasev, Sophie Chung, Gwydion Williams, JAMA Psychiatry, November 19, 2025, these salient points were made (excerpts):
- “By using generative models to interact with individuals, we can capture rich information that was traditionally accessible only through open-ended interviews and observations.”
- “This represents a major evolution for psychometrics. The field can move beyond methods that require people to behave like computers, forcing them to describe their experiences using machine-readable formats, like numeric scales.”
- “Instead, this approach allows human experience to be expressed and understood in its own natural, nuanced terms across diverse visual, auditory, and linguistic modalities.”
- “For prediction, GenAI can holistically synthesize disparate data signals. A model could, for example, correlate shifts in language from a patient’s journal with biometric changes from a wearable device and flag these against their clinical history.”
- “Perhaps more transformatively, GenAI can restore nuance to assessment at scale. By moving beyond fixed-item questionnaires to interactive, conversational inquiry, these models can probe an experience in a patient’s own words, personalizing the assessment.”
A key insight is that we should seriously consider redesigning prevailing psychometrics, doing so with mindful purpose and noteworthy aims. This raises an important point. Some might be tempted to toss together new psychometrics willy-nilly. There is a chance that we could end up with a mess on our hands.
Four Thoughtful Stages
To try and ensure that the redesign of psychometrics is undertaken systematically, the researchers propose a four-stage approach:
- Stage 1: “Foundational Integration — From Construct to Computational Task”
- Stage 2: “Hybrid Development — Prompt Engineering as Theory-Driven Item Generation”
- Stage 3: “A Unified Validation Framework”
- Stage 4: “From Measurement Invariance to Algorithmic Equity”
In brief, the first stage encourages formulating potentially vague measures into something concrete and computable. The second stage involves devising prompts that can be used in an LLM to get the AI to apply the formulated psychometric. The third stage requires validating the psychometric, since otherwise it might be misleading, misreported, and so on. The fourth stage entails testing and refinement to ensure that the psychometric is balanced and fair across the board.
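One way to read the four stages is as a pipeline with explicit artifacts handed from step to step. The sketch below is merely my own structural interpretation of the researchers’ proposal, expressed as a Python skeleton; the field names are assumptions, not code from the paper.

```python
# A structural reading of the four proposed stages as a pipeline skeleton.
# Field names and signatures are my own interpretation, not from the paper.

from dataclasses import dataclass

@dataclass
class Psychometric:
    construct: str                # Stage 1: the concrete computational task
    prompt: str                   # Stage 2: theory-driven prompt ("item generation")
    validated: bool = False       # Stage 3: reliability/validity checks passed
    equity_checked: bool = False  # Stage 4: disaggregated fairness review passed

def ready_for_deployment(p: Psychometric) -> bool:
    """A psychometric is deployable only once all four stages are complete."""
    return bool(p.construct and p.prompt and p.validated and p.equity_checked)
```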
Example Of The Process
I’ll share with you an illustrative example that comes to my mind.
Suppose we intend to assess social anxiety as a mental health condition. You could just tell an LLM to go ahead and come up with questions about social anxiety and allow the AI to freewheel accordingly. What you might then get from the AI is likely going to be enormously inconsistent and not conducive to rigorous analysis.
Aha, let’s dip into the four stages.
The first stage says I should come up with a construct that can be used as a computational task. Okay, I’ll do that. Making use of the DSM-5, see my coverage of AI usage and the acclaimed DSM-5 guidebook on mental health at the link here, I came up with this psychometric stipulation:
- Psychometric bounding for social anxiety — “Identify and grade narrative indicators of fear-of-evaluation and social avoidance in daily interactions.”
This aligns with clinical theory and DSM-5 features.
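Expressed as a computational task, that stipulation might look like a small scoring rubric that the AI is instructed to apply to each narrative. A minimal sketch follows; the indicator names and the 0-to-3 scale are my own illustrative assumptions, loosely modeled on DSM-5 social anxiety features rather than any validated instrument.

```python
# Illustrative rubric: the social anxiety construct as a computational task.
# Indicator names and the 0-3 scale are assumptions, loosely DSM-5-inspired.

RUBRIC = {
    "fear_of_evaluation": "Worry about being judged, embarrassed, or scrutinized",
    "social_avoidance": "Skipping, cutting short, or enduring interactions with distress",
}

SCALE = {0: "absent", 1: "mild", 2: "moderate", 3: "marked"}

def grading_instruction() -> str:
    """Render the rubric as an instruction that an LLM can apply to a narrative."""
    lines = [f"- {name}: {desc}" for name, desc in RUBRIC.items()]
    scale = ", ".join(f"{k}={v}" for k, v in SCALE.items())
    return (
        f"Grade the narrative on each indicator using the scale ({scale}):\n"
        + "\n".join(lines)
    )
```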
More On The Process
In the second stage, I provided a prompt to various LLMs, such as ChatGPT, Claude, Grok, and Gemini, and informed the AIs about my intention. I wanted to see what each one would say. By and large, the results were pretty much the same, and all seemed on target.
I invoked ChatGPT to walk me through a sample dialogue, as though a survey were being performed. I opted to pretend that I had a semblance of social anxiety.
- My prompt: “I’m ready to take the mental health status survey.”
- ChatGPT response: “Think about a recent social interaction. What thoughts or feelings did you experience before, during, and after it?”
- My prompt: “I spoke with coworkers at the office yesterday about my upcoming vacation. I felt hesitant beforehand. During the discussion, I was somewhat nervous, and afterward I second-guessed myself about what I had said.”
- ChatGPT response: “Were you worried about what others might think of you?”
- My prompt: “Now that I think about it, yes, I was.”
- ChatGPT response: “Have you been skipping certain interactions lately? What made you hesitant?”
You can see that the AI was pursuing the social anxiety angle. As shown above, the AI posed a question probing potential fear-of-evaluation and followed up with a question about avoidance.
There is no viable way that a conventional survey instrument could be that flexible, opting in real-time to adjust and respond fluently and suitably to my answers.
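After a dialogue like this, the transcript still has to be turned into a score. As a hedged sketch, one approach is to hand the transcript back to the LLM along with the Stage 1 rubric and request structured JSON, then parse the result. The prompt wording and JSON fields are illustrative assumptions, and ask_llm is a hypothetical stand-in for whatever chat-completion call you are using.

```python
# Sketch: convert a finished survey transcript into rubric scores.
# The prompt wording and JSON fields are illustrative assumptions.

import json
from typing import Callable

def score_transcript(ask_llm: Callable[[list[dict]], str], transcript: str) -> dict:
    """Ask the LLM to grade the transcript and return parsed JSON scores."""
    prompt = (
        "Grade this survey transcript for social anxiety indicators. "
        "Return ONLY JSON of the form "
        '{"fear_of_evaluation": 0-3, "social_avoidance": 0-3, "evidence": "..."}\n\n'
        + transcript
    )
    raw = ask_llm([{"role": "user", "content": prompt}])
    return json.loads(raw)  # in practice, validate the schema before trusting it
```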
Completing The Process
The third stage would entail using the LLM to do validation of the newly formed psychometric.
Here are some actions that might be taken (a small reliability-check sketch follows the list):
- Respondents undertake a similar chat within one week, and score consistency is assessed (test–retest reliability).
- Therapists independently score the AI-indicated social anxiety severity based on the recorded chats. Their scores are compared to the AI scoring (interrater reliability).
- Psychologists specializing in social anxiety perform an in-depth review of the LLM prompts to discern if content validity is being attained (looking for cognitive distortions, safety behaviors, avoidance patterns).
- And so on.
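The first two bullets boil down to familiar statistics. As a minimal sketch, test-retest reliability can be checked with a Pearson correlation between week-one and week-two scores, and AI-versus-therapist agreement with Cohen’s kappa. The numbers below are fabricated purely for illustration.

```python
# Minimal reliability checks for the new psychometric (fabricated toy data).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

# Test-retest: the same respondents scored in week 1 and again in week 2.
week1 = np.array([2, 1, 3, 0, 2, 1])
week2 = np.array([2, 1, 2, 0, 3, 1])
r, p = pearsonr(week1, week2)
print(f"Test-retest Pearson r = {r:.2f} (p = {p:.3f})")

# Interrater: AI severity ratings vs. a therapist's ratings of the same chats.
ai_scores = [2, 1, 3, 0, 2, 1]
therapist_scores = [2, 2, 3, 0, 2, 1]
kappa = cohen_kappa_score(ai_scores, therapist_scores)
print(f"AI vs. therapist Cohen's kappa = {kappa:.2f}")
```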
For the fourth stage, performance could be disaggregated by demographic groups, and various kinds of analyses might be performed (a disaggregation sketch follows the list):
- Use the same AI or perhaps a different AI to examine the recorded digital dialogues and computationally detect if the AI that was conducting the survey misrepresented or improperly responded to respondents based on communication styles that favor indirectness or high-context conversational norms.
- Conduct fairness audits to assess responsiveness.
- And so on.
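As a hedged sketch of the disaggregation step, scores can be grouped by a demographic attribute and their distributions compared, flagging gaps above a chosen threshold for closer review. The column names and threshold below are illustrative assumptions, not a prescribed audit standard.

```python
# Sketch: disaggregate psychometric scores by demographic group (illustrative).
import pandas as pd

def disaggregate(df: pd.DataFrame, group_col: str, score_col: str,
                 gap_threshold: float = 0.5) -> pd.DataFrame:
    """Compare mean scores across groups; flag gaps above the threshold."""
    summary = df.groupby(group_col)[score_col].agg(["mean", "std", "count"])
    gap = summary["mean"].max() - summary["mean"].min()
    if gap > gap_threshold:
        print(f"Flag: a mean-score gap of {gap:.2f} across {group_col} groups "
              "warrants a closer fairness review.")
    return summary

# Usage with fabricated example data:
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "severity": [1.0, 2.0, 2.5, 3.0, 2.0],
})
print(disaggregate(df, "group", "severity"))
```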
A New Ballgame Is Afoot
For those opting to use AI in this manner as a survey instrument of incredible possibilities, substantive attention would need to be devoted to coping with potential AI hallucinations and other AI-laden issues (see my coverage at the link here). Suppose the LLM goes overboard during a survey and begins providing overt mental health advice, rather than solely collecting data? What if the AI gives out harmful guidance during this process? Suitable controls and monitoring must be incorporated into this AI usage.
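One simple control, sketched below under stated assumptions, is to run every AI survey turn through a lightweight screen that flags advice-like language before it reaches the respondent. The phrase list is a crude illustrative heuristic of my own; a production system would use a proper moderation layer, human review, or both.

```python
# Crude illustrative guardrail: flag survey turns that drift into advice-giving.
# The phrase list is a toy heuristic, not a production safety mechanism.

ADVICE_MARKERS = ("you should", "i recommend", "try taking", "my advice",
                  "the best treatment")

def looks_like_advice(ai_turn: str) -> bool:
    """Heuristically detect advice-like language in an AI survey turn."""
    text = ai_turn.lower()
    return any(marker in text for marker in ADVICE_MARKERS)

def safe_send(ai_turn: str) -> str:
    """Block advice-like turns and substitute a neutral data-collection prompt."""
    if looks_like_advice(ai_turn):
        # In a real deployment, log the blocked turn for human review.
        return "Let's keep going with the survey. Could you tell me more about that?"
    return ai_turn
```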
A final thought for now.
Lord Kelvin, the famous mathematician and engineer, made this insightful point: “If you cannot measure it, you cannot improve it.”
The advent of readily available AI at a low cost and accessible anytime and anywhere provides tremendous promise for measuring societal mental health. We must rethink the prevailing psychometrics and consider new measures that leverage the capabilities of LLMs. This requires systematically doing so, rather than coming up with wild fly-by-night metrics.
We have in our midst state-of-the-art ways to ascertain mental health at scale and to use that newfound knowledge to hopefully aid and boost mental health throughout humankind. These are improvements of immense value and deserve vital attention.
We can measure it and improve it.