Nov. 4, 2024, 8 p.m.

Limitations of Large Language Models: A Study on AI's Incomplete World Understanding

Large language models (LLMs) can do impressive things, such as writing poetry and generating working computer programs, even though they are trained primarily to predict the next word in a text. Those capabilities might suggest that LLMs learn general truths about the world, but a new study challenges that assumption. The researchers found that a popular type of generative AI could provide highly accurate driving directions in New York City without having formed an accurate internal map of the area: when they altered the environment by closing streets and adding detours, the model's performance declined sharply. Its internal representation of the city turned out to be badly flawed, full of nonexistent streets and impossible connections.

The result raises a broader concern: an AI model can appear to perform well in one context and still fail when conditions change slightly. Senior author Ashesh Rambachan emphasized that researchers who want to apply LLMs in scientific fields first need to understand whether the models have formed coherent models of the world. The team, which includes several collaborators, will present its findings at the Conference on Neural Information Processing Systems. The work concentrates on transformer models, the architecture behind LLMs such as GPT-4, which are trained on vast amounts of text to predict the next token.
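Since the study hinges on this next-token framing, a minimal sketch may help make it concrete. Everything below is our illustration rather than the authors' code: the turn vocabulary, the hard-coded lookup standing in for a trained transformer, and the names toy_next_token and generate_route are all assumptions for the example.

```python
# Minimal sketch (not the study's code): turn-by-turn directions as
# next-token prediction. A real transformer would output a probability
# distribution over VOCAB; a hard-coded lookup stands in for it here.

VOCAB = ["N", "S", "E", "W", "<end>"]  # turn tokens plus an end-of-route token

def toy_next_token(prefix):
    """Stand-in for a trained model: the most likely next turn after `prefix`."""
    policy = {
        (): "N",
        ("N",): "E",
        ("N", "E"): "<end>",
    }
    return policy.get(tuple(prefix), "<end>")

def generate_route():
    """Autoregressive rollout: feed the model its own output until <end>."""
    route = []
    while True:
        token = toy_next_token(route)
        if token == "<end>":
            return route
        route.append(token)

print(generate_route())  # ['N', 'E']: accurate directions, no map required
```

The study's point is that a rollout like this can produce correct directions without the model holding anything that resembles a street map internally.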

To evaluate whether a transformer has formed a coherent world model, the team developed two new metrics, sequence distinction and sequence compression. They tested the metrics on two problems governed by deterministic rules: navigating the streets of New York City and playing the board game Othello. Contrary to expectations, transformers trained on randomly generated sequences formed more coherent world models, likely because training exposed them to a broader range of possible moves. Despite generating accurate outputs, only one model formed a coherent world model for Othello, and neither navigation model recovered an accurate map of New York.

When the researchers simulated detours by closing a small fraction of streets, the model's accuracy at giving directions plummeted from nearly 100 percent to 67 percent. The street maps recovered from the model depicted an unrealistic, overly tangled version of New York City, underscoring that a transformer can excel at a task without understanding its underlying rules. Rambachan cautions against assuming that these models understand the world simply because they produce impressive results. Future research will look at problems whose rules are only partially known and will apply the new evaluation metrics to real-world scientific questions. The study was supported by several grants, including funding from Harvard and the National Science Foundation.
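One way to read the two metrics is as a Myhill-Nerode-style test: in a deterministic domain, two move sequences that lead to the same state should be treated identically by the model (compression), and sequences that lead to different states should be told apart (distinction). Below is a hedged reconstruction of that idea on a toy street grid. The partial state machine, the deliberately flawed stand-in model, and every name here are our assumptions, not the authors' implementation.

```python
# Hedged sketch of sequence distinction / compression on a toy street grid.
# Ground truth is a partial finite-state machine: states are intersections,
# tokens are turns, and missing entries are illegal moves.
from itertools import product

DELTA = {
    (0, "E"): 1, (0, "N"): 2,
    (1, "N"): 3,
    (2, "E"): 3,
    (3, "W"): 2,
}
START = 0

def true_state(seq):
    """Run the state machine; None means the turn sequence is illegal."""
    state = START
    for sym in seq:
        if (state, sym) not in DELTA:
            return None
        state = DELTA[(state, sym)]
    return state

def true_valid_next(state):
    """The turns that are actually legal at a given intersection."""
    return {sym for (s, sym) in DELTA if s == state}

def model_valid_next(seq):
    """Stand-in for the transformer's predicted legal turns. We deliberately
    make its internal map conflate intersections 1 and 3, the kind of
    incoherence the study surfaced in its recovered city maps."""
    state = true_state(seq)
    if state == 3:
        state = 1  # flawed world model merges two distinct intersections
    return true_valid_next(state)

def legal_prefixes(max_len=3):
    seqs = [()]
    for n in range(1, max_len + 1):
        seqs += list(product("NEWS", repeat=n))
    return [s for s in seqs if true_state(s) is not None]

# Compression: same true state must yield the same predictions.
# Distinction: different true states must yield different predictions.
for p, q in product(legal_prefixes(), repeat=2):
    same_state = true_state(p) == true_state(q)
    same_preds = model_valid_next(p) == model_valid_next(q)
    if same_state and not same_preds:
        print("compression failure:", p, "vs", q)
    if not same_state and same_preds:
        print("distinction failure:", p, "vs", q)
```

Run as written, the loop reports distinction failures: prefixes ending at intersection 3 receive the same predictions as prefixes ending at intersection 1, so the model's implied map has quietly merged two different corners of the grid, a toy version of the nonexistent streets and impossible connections the study found.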



Brief news summary

A recent study highlights the limitations of transformer-based large language models (LLMs) despite their success at tasks such as navigation. These models can produce accurate directions through a complex urban environment like New York City, yet they falter under unexpected changes such as road closures, revealing flaws in their internal maps and raising concerns about the reliability of generative AI in dynamic situations. The researchers introduced two metrics, "sequence distinction" and "sequence compression," to assess whether a model has formed a coherent world model, testing them on urban navigation and the game Othello. Intriguingly, models trained on random move sequences mapped their environments more accurately than those trained on optimal moves. The findings suggest that transformers can excel at isolated tasks while lacking a coherent grasp of the underlying domain; the authors call for a reassessment of expectations about LLM capabilities and for further research into world-model coherence, particularly in scientific contexts.
