AI Chatbots Face Persistent Hallucination Issues Impacting Reliability

AI chatbots from leading tech firms such as OpenAI and Google have received reasoning upgrades in recent months intended to make their answers more reliable. Recent tests, however, reveal that some newer models perform worse than earlier versions, exhibiting a phenomenon known as "hallucination": errors in which a chatbot generates false information, or produces answers that are factually correct but irrelevant or noncompliant with instructions. The problem has dogged large language models (LLMs) such as OpenAI's ChatGPT and Google's Gemini since their inception, and it now appears unlikely ever to be fully resolved.

An OpenAI technical report found that its o3 and o4-mini models, released in April, hallucinate at markedly higher rates than the older o1 model from late 2024. When summarizing publicly available facts, o3 hallucinated 33 percent of the time and o4-mini 48 percent, compared with 16 percent for o1. Vectara's leaderboard, which tracks hallucination rates, similarly found that some reasoning models, including DeepSeek-R1, hallucinate substantially more than their predecessors despite reasoning through multiple steps before answering. OpenAI maintains that reasoning processes are not inherently responsible for the rise and says it is actively researching ways to reduce hallucinations in all of its models.

The persistence of hallucinations threatens a range of applications. A model that routinely produces falsehoods is of limited use as a research assistant; a paralegal bot that cites nonexistent cases invites legal errors; a customer service bot running on outdated information causes operational problems. AI companies initially expected hallucinations to diminish over time, and early model updates did show improvement, but the recent rise challenges that outlook whether or not reasoning is involved. Vectara's leaderboard suggests hallucination rates are roughly equal in reasoning and non-reasoning models from OpenAI and Google, though the exact numbers matter less than the relative rankings. Google declined to comment.

Such rankings have limitations, however. They blend different types of hallucination: DeepSeek-R1's 14.3 percent rate, for instance, consisted mainly of "benign" cases, in which answers were logically sound and supported by general knowledge but absent from the source text. And because the tests rely solely on text summarization, they may not reflect how often models hallucinate on other tasks; LLMs are not designed specifically for summarizing.

Emily Bender of the University of Washington argues that the term "hallucination" is itself misleading and anthropomorphizing: these models predict likely next words rather than processing information to truly understand a text. The word implies that errors are aberrations in otherwise reliable systems, and it attributes human-like perception to software that does not "perceive" in any sense. Princeton's Arvind Narayanan adds that models also err by relying on unreliable or outdated data, and that simply adding training data or compute power has not solved these problems. Error-prone AI may therefore be an enduring reality. Narayanan suggests using such models only when fact-checking an answer is faster than doing the research yourself; Bender recommends not relying on AI chatbots for factual information at all.
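To illustrate the kind of metric the leaderboard figures above describe, the sketch below computes a leaderboard-style hallucination rate from per-summary judgements, with benign and harmful cases tallied separately. The labels, summary IDs, and the three-way classification are hypothetical examples; Vectara's actual evaluation pipeline is not reproduced here.

```python
from collections import Counter

def hallucination_stats(judgements):
    """Compute an overall hallucination rate plus a per-type breakdown.

    `judgements` maps each summary ID to one of:
      "supported" - every claim is grounded in the source text
      "benign"    - claims are plausible and consistent with general
                    knowledge, but absent from the source text
      "harmful"   - claims are false or contradict the source text
    """
    counts = Counter(judgements.values())
    total = sum(counts.values())
    hallucinated = counts["benign"] + counts["harmful"]
    return {
        "rate": hallucinated / total if total else 0.0,
        "benign": counts["benign"],
        "harmful": counts["harmful"],
    }

# Illustrative labels for seven model-generated summaries.
labels = {
    "s1": "supported", "s2": "benign", "s3": "supported",
    "s4": "harmful", "s5": "supported", "s6": "benign",
    "s7": "supported",
}
stats = hallucination_stats(labels)
print(f"hallucination rate: {stats['rate']:.1%}")  # 3/7 -> 42.9%
print(f"benign: {stats['benign']}, harmful: {stats['harmful']}")
```

The separate tally matters because an aggregate rate can be dominated by relatively innocuous errors, as with DeepSeek-R1's mostly benign 14.3 percent.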