
Aug. 2, 2023, 2:46 a.m.

In March, OpenAI released GPT-4, a large language model that was remarkably skilled at identifying prime numbers: it accurately labeled 97.6 percent of a series of 500 prime numbers provided to it. By June, however, the model's performance had taken a drastic turn. On the same test, GPT-4 correctly labeled only 2.4 percent of the primes. The change underscores the complexity of large AI models, which do not improve uniformly on every task; their progress looks more like a winding road with obstacles and detours.

The shift in GPT-4's performance was detailed in a preprint study by three computer scientists, two from Stanford University and one from the University of California, Berkeley. The researchers compared GPT-4 to its predecessor, GPT-3.5, in tests conducted in March and June, revealing numerous differences between the two models as well as variations in each model's output over time. Notably, GPT-4's June responses were less verbose than its March responses, and the model appeared less inclined to provide explanations. It also picked up new quirks, such as appending accurate but potentially disruptive descriptions to snippets of computer code. On the positive side, GPT-4 became more cautious, filtering out more offensive responses and showing less inclination to offer illegal or discriminatory suggestions. It also improved slightly at solving visual reasoning problems.

The study, which has not yet been peer-reviewed, led some AI enthusiasts to conclude that GPT-4 had become less capable than its predecessor, prompting headlines asking whether GPT-4 was "getting dumber." But that framing oversimplifies how generative AI models behave, according to James Zou, one of the study's co-authors and an assistant professor of data science at Stanford University. Zou explains that it is hard to say whether GPT-4 or GPT-3.5 is getting better or worse overall, because the notion of improvement is subjective. OpenAI says that, by its internal metrics, GPT-4 performs better than previous versions across a range of tests, but the company does not release benchmark data for every update, and it declined to comment on the preprint study.
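To make the evaluation described above concrete, here is a minimal sketch of how such a prime-labeling test could be scored. It is an illustration under stated assumptions, not the study's actual harness: the ask_model callable is a hypothetical hook for whatever call returns the model's reply, and sympy is used only to generate the ground-truth primes.

from typing import Callable
from sympy import prime

def prime_labeling_accuracy(
    ask_model: Callable[[str], str],  # hypothetical hook: prompt in, model reply out
    n_questions: int = 500,
) -> float:
    """Score a model on the kind of test described above: every question
    asks about a number that really is prime, so "yes" is always correct."""
    primes = [prime(i) for i in range(1, n_questions + 1)]  # the first 500 primes
    correct = 0
    for p in primes:
        reply = ask_model(f"Is {p} a prime number? Answer with one word: yes or no.")
        if reply.strip().lower().startswith("yes"):
            correct += 1
    return correct / n_questions  # 0.976 would match the March result, 0.024 the June one

Because every number in such a test is prime, the score mostly measures how often the model answers "yes" to actual primes, which is relevant to the interpretation of the results discussed later in the article.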

OpenAI's reluctance to discuss how its large language models are developed and trained, together with the opacity of the algorithms themselves, makes it difficult to pin down the causes of the changes in GPT-4's performance; researchers outside the company can only speculate and extrapolate. What is clear is that GPT-4's behavior has changed since its initial release, as OpenAI acknowledged in a blog post update. This kind of behavioral shift, known as "model drift," has been observed before in other models, and it poses a challenge for developers and researchers who rely on these systems: their expectations and workflows can be disrupted when a model's behavior changes unexpectedly.

Fine-tuning, a common process for adjusting AI models after initial training, can have unintended consequences. A model's capabilities and behavior are shaped by its parameters and its training data, and modifying the parameters can alter behavior in unexpected ways; fine-tuning, akin to gene editing, introduces mutations that can ripple outward. Researchers like Zou are exploring ways to make the adjustment of big AI models more precise so as to avoid such undesirable side effects. In GPT-4's case, changes OpenAI made to reduce offensive or dangerous outputs may have inadvertently affected other aspects of the model's performance. New limits on what the model can say might have curtailed its ability to answer questions about prime numbers in detail, or the fine-tuning process might have introduced lower-quality training data that degraded the detail of its responses on certain mathematical topics.

Whatever the specific cause, it seems likely that GPT-4's underlying ability to identify prime numbers did not change much between March and June. The model may simply have leaned more heavily on patterns in the data it was exposed to, shifting its default answer based on incidental regularities rather than actual reasoning. It is worth remembering, though, that AI models do not develop habits the way humans do, because they lack independent understanding and context; they rely solely on data to mimic reasoning rather than possessing true reasoning abilities.
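For teams that depend on a hosted model, one practical response to drift of this kind is to re-run a fixed prompt set against each new snapshot and track a few simple statistics over time. The sketch below is a minimal illustration of that idea, not a description of any vendor's tooling; ask_old and ask_new are hypothetical hooks that send a prompt to an earlier and a later snapshot of the same model.

from statistics import mean
from typing import Callable

def drift_report(
    ask_old: Callable[[str], str],   # hypothetical hook for the earlier snapshot
    ask_new: Callable[[str], str],   # hypothetical hook for the later snapshot
    prompts: list[str],
) -> dict:
    """Re-run a fixed prompt set against two snapshots of the same model and
    report how often their answers diverge and how response length shifts."""
    old_replies = [ask_old(p) for p in prompts]
    new_replies = [ask_new(p) for p in prompts]
    disagreements = sum(
        o.strip().lower() != n.strip().lower()
        for o, n in zip(old_replies, new_replies)
    )
    return {
        "disagreement_rate": disagreements / len(prompts),
        "old_mean_length": mean(len(r) for r in old_replies),
        "new_mean_length": mean(len(r) for r in new_replies),
    }

A rising disagreement rate or a sharp drop in mean response length on an unchanged prompt set is the kind of signal the Stanford and Berkeley researchers observed between the March and June versions of GPT-4.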


