Topline
The most recent releases of cutting-edge AI tools from OpenAI and DeepSeek have produced even higher rates of hallucinations (fabricated information presented as fact) than earlier models, confounding the companies and presenting challenges as the industry evolves.
Hallucinations are when AI bots produce fabricated information and present it as fact.
Key Facts
AI bots have always produced at least some hallucinations, which occur when a model generates incorrect information based on the data it has access to, but OpenAI’s newest o3 and o4-mini models have hallucinated anywhere from 33% to 79% of the time in company tests, depending on the benchmark, for reasons that aren’t entirely clear.
OpenAI bills o3 as its most powerful model because it is a “reasoning” model, which takes more time to “think” by working out answers step by step; the company also claims the model can think visually and process images.
But it’s not just an OpenAI problem: Another recent tool, Chinese company DeepSeek’s R1 reasoning model, hallucinates much more than DeepSeek’s traditional AI models, according to independent tests by the AI research firm Vectara.
Though companies are not exactly sure why reasoning models hallucinate so much, the New York Times reported these models can hallucinate at each step throughout their advanced “thinking” processes, meaning there are even more chances for incorrect responses (a compounding effect sketched below).
Researchers at Vectara acknowledged reasoning models seem to hallucinate more, but suggested the training behind these reasoning models, like R1, is to blame rather than the models’ advanced “thinking” process.
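To make that compounding effect concrete, here is a minimal back-of-the-envelope sketch in Python, using a made-up per-step error rate rather than any measured figure: if each reasoning step has even a small chance of introducing a false claim, the odds that at least one step goes wrong climb quickly as the chain gets longer.

```python
# Illustrative arithmetic only (made-up numbers, not measured error rates):
# if each reasoning step independently introduces a false claim with
# probability p, the chance of at least one error in an n-step chain is
# 1 - (1 - p)**n, which grows quickly as chains get longer.
p = 0.05  # hypothetical per-step error rate

for n in (1, 5, 10, 20):
    at_least_one_error = 1 - (1 - p) ** n
    print(f"{n:>2} steps -> {at_least_one_error:.0%} chance of at least one error")
# 1 step -> 5%, 5 steps -> 23%, 10 steps -> 40%, 20 steps -> 64%
```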
How Often Do AI Models Hallucinate?
In OpenAI’s tests of its newest o3 and o4-mini reasoning models, the company found the o3 model hallucinated 33% of the time during its PersonQA tests, in which the bot is asked questions about public figures. When asked short fact-based questions in the company’s SimpleQA tests, OpenAI said o3 hallucinated 51% of the time. The o4-mini model fared even worse: It hallucinated 41% of the time during the PersonQA test and 79% of the time in the SimpleQA test, though OpenAI said its worse performance was expected because it is a smaller model designed to be faster.
OpenAI’s latest update to ChatGPT, GPT-4.5, hallucinates less than the o3 and o4-mini models: When GPT-4.5 was released in February, the company said the model had a hallucination rate of 37.1% on its SimpleQA test.
Vectara’s independent tests, which ask chatbots to summarize news articles, found some newer reasoning models performed markedly worse than other models. OpenAI’s o3 scored a 6.8% hallucination rate on Vectara’s test, while DeepSeek’s R1 scored 14.3%, far worse than other DeepSeek chatbots such as the DeepSeek-V2.5 model, which hallucinated 2.4% of the time.
IBM’s Granite 3.2 model, which the company says comes with advanced reasoning capabilities and is offered in a larger and a smaller edition, also scored worse than earlier IBM models on Vectara’s test: the larger 8B version had a hallucination rate of 8.7%, while the smaller 2B version hallucinated 16.5% of the time.
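The headline percentages above come from grading a model’s answers one by one. As a rough illustration only (toy data, not OpenAI’s or Vectara’s actual grading pipeline), a hallucination rate is simply the share of graded responses found to contain fabricated claims:

```python
# Illustrative data only, not any company's actual benchmark results:
# each response is graded as containing a hallucination or not, and the
# reported rate is the share of responses flagged as hallucinations.
graded_responses = [
    {"prompt": "Summarize article 1", "hallucinated": False},
    {"prompt": "Summarize article 2", "hallucinated": True},
    {"prompt": "Summarize article 3", "hallucinated": False},
]

def hallucination_rate(responses):
    """Fraction of graded responses flagged as hallucinations."""
    flagged = sum(1 for r in responses if r["hallucinated"])
    return flagged / len(responses)

print(f"{hallucination_rate(graded_responses):.1%}")  # 33.3%
```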
Why Do AI Chatbots Hallucinate?
AI models hallucinate because they are trained on a finite set of data and are prompted to respond to queries with the most statistically likely answer. Questions that fall outside the data an AI model knows can lead the bot to respond with incorrect information, and its probability-based approach sometimes leads it to find faulty patterns and create fabricated information. AI hallucinations can be grammatically fluent and presented as fact despite being incorrect. Incomplete or biased data sets, or flaws in an AI model’s training, can also contribute to hallucinations. Transluce, a nonprofit AI research firm, analyzed OpenAI’s o3 model and said another contributing factor may be that these models are designed to maximize the chance of giving an answer, meaning the bot is more likely to give an incorrect response than admit it doesn’t know something.
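A stripped-down sketch of that “most statistically likely answer” behavior, using invented candidate words and scores rather than any real model’s output, shows why a bot keeps answering even when none of its options is well supported:

```python
# Minimal sketch (hypothetical candidates and scores, not any real model):
# a language model scores possible next words and emits the most likely one,
# even when no candidate is actually well supported by its training data.
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical completions for the prompt "The capital of Atlantis is ..."
candidates = ["Poseidonia", "Atlantica", "unknown"]
logits = [1.2, 1.0, 0.9]  # invented scores; all weak, none clearly correct

probs = softmax(logits)
best = max(zip(candidates, probs), key=lambda pair: pair[1])

# Standard decoding still emits the top guess as if it were fact (a
# hallucination) rather than abstaining, because the model is built to answer.
print(best)  # ('Poseidonia', 0.39...)
```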
What Have AI Companies Said About Hallucinations?
OpenAI acknowledged o3’s hallucination rate in a research paper recapping internal tests on its models, stating that o3’s tendency to make more definitive claims, rather than acknowledging it doesn’t know an answer, means it gives both more correct answers and more incorrect ones. The company admitted more research is needed to understand and fix the model’s hallucination issues. OpenAI CEO Sam Altman previously said hallucinating is more a feature of AI than a bug, adding “a lot of value from these systems is heavily related to the fact that they do hallucinate.” Companies that develop AI products, including Google, Microsoft and Anthropic, have all said they are working on fixes for hallucination issues. Microsoft and Google have both released products (Microsoft’s Correction and Google’s Vertex) that they say can flag information that may be incorrect in AI bot responses, though TechCrunch reported experts expressed doubt these will fully solve AI hallucinations.
How Are Researchers Trying To Stop Hallucinations?
Researchers largely say stopping AI bots from hallucinating is impossible, but many are working on various ways to reduce the rates of hallucinations. Some researchers have proposed teaching AI models uncertainty, or the ability to say, “I don’t know,” to avoid producing falsehoods, the Wall Street Journal reported. Other researchers are relying on “retrieval augmented generation,” a technique in which the AI bot retrieves documents relevant to the question to use as a reference, rather than answering immediately based on data stored in its memory.
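As a minimal sketch of the retrieval-augmented generation idea (a toy keyword-overlap retriever and a hand-built prompt, not any specific company’s implementation), the bot is pointed at a relevant document first and told to answer only from it, or to say it doesn’t know:

```python
# Toy retrieval-augmented generation sketch: keyword-overlap retrieval plus a
# grounding prompt. Documents and wording are illustrative assumptions, not a
# real product's pipeline.
DOCUMENTS = [
    "OpenAI's o3 model hallucinated 33% of the time on the PersonQA test.",
    "DeepSeek's R1 reasoning model scored a 14.3% hallucination rate in Vectara's test.",
]

def retrieve(question: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(documents, key=lambda doc: len(q_words & set(doc.lower().split())))

def build_prompt(question: str) -> str:
    """Ground the answer in a retrieved document instead of the model's memory."""
    context = retrieve(question, DOCUMENTS)
    return (
        "Answer using only the context below. If the context does not "
        "contain the answer, say 'I don't know.'\n"
        f"Context: {context}\n"
        f"Question: {question}"
    )

print(build_prompt("How often did o3 hallucinate on the PersonQA test?"))
```

Production systems typically swap the keyword matching for vector-similarity search over large document collections, but the core idea of grounding the answer in retrieved text rather than the model’s memory is the same.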
Chief Critics
Some researchers have criticized the term “hallucination” because it may erroneously humanize AI models, as a human hallucination—in which a person perceives something that is not real—is not the same as an AI bot making up false information. Usama Fayyad, executive director of Northeastern University’s Institute for Experiential Artificial Intelligence, told Northeastern Global News the term “hallucination” attributes “too much to the model,” including intent and consciousness, which AI bots do not have.
Further Reading
A.I. Is Getting More Powerful, but Its Hallucinations Are Getting Worse (New York Times)
Why Do AI Chatbots Have Such a Hard Time Admitting ‘I Don’t Know’? (Wall Street Journal)
What are AI chatbots actually doing when they ‘hallucinate’? Here’s why experts don’t like the term (Northeastern Global News)
Source: https://www.forbes.com/sites/conormurray/2025/05/06/why-ai-hallucinations-are-worse-than-ever/