May 10, 2025, 5:50 p.m.

AI Chatbots Face Persistent Hallucination Issues Impacting Reliability

AI chatbots from leading tech firms such as OpenAI and Google have received a stream of reasoning upgrades in recent months intended to make their answers more reliable. Recent tests, however, reveal that some newer models perform worse than earlier versions, exhibiting a phenomenon known as "hallucination": errors in which a chatbot generates false information, or produces answers that are factually correct but irrelevant or that fail to follow instructions. The problem has dogged large language models (LLMs) such as OpenAI’s ChatGPT and Google’s Gemini since their inception, and it now appears unlikely ever to be fully resolved.

An OpenAI technical report showed that its o3 and o4-mini models, released in April, hallucinate significantly more often than the older o1 model from late 2024: when summarizing publicly available facts, o3 hallucinated 33% of the time and o4-mini 48%, compared with 16% for o1. Vectara’s leaderboard, which tracks hallucination rates, likewise found that some reasoning models, including DeepSeek-R1, hallucinate notably more than their predecessors despite working through multiple reasoning steps before answering. OpenAI maintains that reasoning processes are not inherently responsible for the rise and says it is actively researching ways to reduce hallucinations across all of its models.

The persistence of hallucinations threatens a range of applications. A model that frequently produces falsehoods is a poor research assistant; a paralegal bot that cites nonexistent cases invites legal trouble; a customer-service bot armed with outdated information creates operational headaches. AI companies initially expected hallucinations to fade over time, and early model updates did show improvement, but the recent rise in hallucination rates challenges that outlook regardless of whether reasoning is involved. Vectara’s leaderboard suggests hallucination rates are roughly equal in reasoning and non-reasoning models from OpenAI and Google, though the exact numbers matter less than the relative rankings.
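To make the leaderboard figures concrete, here is a minimal sketch of how a hallucination rate of this kind could be computed, assuming each model summary has already been labeled as faithful to its source or not. The records, labels, and class names below are illustrative only; this is not Vectara's actual data or grading method.

    # A hypothetical sketch: hallucination rate as the fraction of graded
    # summaries flagged as containing unsupported claims.
    from dataclasses import dataclass

    @dataclass
    class GradedSummary:
        source: str         # the document the model was asked to summarize
        summary: str        # the model's output
        hallucinated: bool  # True if the summary contains unsupported claims

    def hallucination_rate(graded: list[GradedSummary]) -> float:
        """Fraction of summaries flagged as hallucinated."""
        if not graded:
            return 0.0
        return sum(g.hallucinated for g in graded) / len(graded)

    samples = [
        GradedSummary("source A", "faithful summary", hallucinated=False),
        GradedSummary("source B", "summary with invented detail", hallucinated=True),
        GradedSummary("source C", "faithful summary", hallucinated=False),
    ]
    print(f"hallucination rate: {hallucination_rate(samples):.1%}")  # 33.3%

In practice, a leaderboard like Vectara's derives the per-summary label from an automated factual-consistency model rather than human annotation, but the reported rate is the same simple ratio.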

Google declined to comment. Such rankings have limitations in any case. They blend different types of hallucination: DeepSeek-R1’s 14.3% hallucination rate, for instance, consisted mainly of “benign” cases, answers that were logically sound and supported by general knowledge but absent from the source text. And because the tests rest solely on text summarization, they may not reflect how often models hallucinate on other tasks; LLMs are not designed specifically to summarize.

Emily Bender of the University of Washington points out that these models predict likely next words rather than processing information to genuinely understand a text. On that view, the term "hallucination" is doubly problematic: it implies the errors are aberrations in otherwise reliable systems, and it anthropomorphizes machines that do not "perceive" in any sense. Princeton’s Arvind Narayanan adds that models also err by relying on unreliable or outdated data, and that simply adding training data or compute power has not solved these problems. Error-prone AI may therefore be an enduring reality. Narayanan suggests using such models only when checking their output is faster than doing the research yourself, while Bender recommends not relying on AI chatbots for factual information at all.
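Bender's point about next-word prediction is easy to demonstrate. The sketch below uses the open-source Hugging Face transformers library with the small GPT-2 model, chosen purely for illustration (the models discussed in this article are far larger and closed): it prints the probability distribution the model assigns to the next token, which is all such a model fundamentally computes.

    # Requires: pip install torch transformers
    # GPT-2 is an illustrative stand-in for the much larger models above.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The first person to walk on the Moon was"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Probability distribution over the token that would follow the prompt.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)

    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")

Whether the highest-probability continuation happens to be true is incidental to this computation, which is why critics say "hallucination" misleadingly suggests a system that otherwise perceives the world.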

Brief news summary

Recent advancements in AI chatbots by companies like OpenAI and Google, aimed at improving reasoning and accuracy, have paradoxically coincided with rising hallucination rates: instances where models generate false or misleading information or fail to follow instructions. OpenAI’s newer o3 and o4-mini models, for example, exhibit hallucination rates of 33% and 48%, compared with 16% for the older o1 model, and similar trends appear in models such as DeepSeek-R1. OpenAI asserts that reasoning components are not to blame and continues to work on reducing hallucinations. The issue is particularly critical in fields such as research, legal advice, and customer service, where inaccuracies can have serious consequences. Vectara’s evaluations reveal minimal differences in hallucination frequency between reasoning and non-reasoning models, though the data remain limited. Experts caution that “hallucination” oversimplifies a set of problems that also includes dependence on outdated or unreliable data. Given the persistent inaccuracies, some suggest limiting AI chatbot use to scenarios where verifying the model’s answer is faster than researching the question from scratch. Overall, hallucinations remain a major unresolved problem in AI language models.