In March, OpenAI released GPT-4, a large language model that proved remarkably skilled at identifying prime numbers. When given a series of 500 primes, it correctly labeled 97.6 percent of them. But by June the model's performance had collapsed: on the same test, GPT-4 correctly labeled just 2.4 percent of the primes.

The reversal underscores how complex large AI models are. They do not improve uniformly on every task; their progress is more like a winding road full of obstacles and detours.

GPT-4's dramatic shift was documented in a preprint study by three computer scientists, two from Stanford University and one from the University of California, Berkeley. The researchers tested GPT-4 and its predecessor, GPT-3.5, in March and again in June, and found numerous differences between the two models, as well as variation in each model's output over time. Notably, GPT-4's June responses were less verbose than its March ones, and the model seemed less willing to explain itself. It also picked up new quirks, such as appending accurate but potentially disruptive commentary to sections of computer code. On the plus side, GPT-4 became more cautious: it filtered out more offensive responses, showed less inclination to offer illegal or discriminatory suggestions, and improved slightly at visual reasoning problems.

The study, which has not yet been peer-reviewed, led some AI enthusiasts to conclude that GPT-4 had become worse than its predecessor, prompting headlines asking whether GPT-4 was "getting dumber." But that framing oversimplifies how generative AI models work, according to James Zou, one of the study's co-authors and an assistant professor of data science at Stanford University. Whether GPT-4 or GPT-3.5 is getting better or worse overall is hard to judge, Zou explains, because the notion of improvement is subjective. OpenAI says that by its internal metrics, GPT-4 outperforms earlier versions across a range of tests. But the company does not release benchmark data for every update, and it declined to comment on the preprint study.
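The study's full harness is not public, but the test it describes is straightforward to sketch. The minimal version below is an assumption, not the authors' code: the `ask_model` function, the prompt wording, and the choice of test primes are all placeholders, with `sympy` supplying ground-truth primality.

```python
# Minimal sketch of a prime-labeling evaluation like the one described
# above; the prompt, model wrapper, and test set are assumptions.
from sympy import isprime, nextprime

def ask_model(number: int) -> bool:
    """Placeholder for the model under test. A real harness would send
    'Is {number} prime? Answer yes or no.' to a chat-model API and parse
    the reply; this stand-in simply always answers 'no'."""
    return False

def build_test_set(n: int = 500, start: int = 1_000) -> list[int]:
    """n consecutive primes; the study's set reportedly contained
    only prime numbers."""
    primes, p = [], start
    for _ in range(n):
        p = nextprime(p)
        primes.append(p)
    return primes

def accuracy(numbers: list[int]) -> float:
    """Fraction of numbers the model labels correctly, judged against
    sympy's exact primality test."""
    return sum(ask_model(x) == isprime(x) for x in numbers) / len(numbers)

if __name__ == "__main__":
    # Running the same script against a March snapshot and a June
    # snapshot yields the two accuracies the study compares.
    print(f"accuracy: {accuracy(build_test_set()):.1%}")
```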
OpenAI's reluctance to discuss how it develops and trains its large language models, combined with the opacity of the algorithms themselves, makes it hard to pinpoint what caused GPT-4's performance to change. Researchers outside the company can only speculate and extrapolate. What is clear is that GPT-4's behavior has shifted since its initial release, something OpenAI itself acknowledged in a blog post update. This kind of behavioral shift, known as "model drift," has been observed in other models before, and it poses a real problem for developers and researchers who build on these systems: workflows tuned to one version can break when the model's behavior changes unexpectedly.

Fine-tuning, a common process for adjusting AI models after their initial training, can have unintended consequences. An AI's capability and behavior are shaped by its parameters and its training data, and modifying parameters can alter behavior in unexpected ways. Fine-tuning, somewhat like gene editing, introduces changes that can ripple outward. Researchers such as Zou are exploring ways to make the adjustment of big AI models more precise so as to avoid such side effects.

In GPT-4's case, changes OpenAI made to curb offensive or dangerous outputs may have inadvertently degraded other aspects of the model's performance. New limits on what the model can say, for example, may have unintentionally blunted its ability to give detailed answers about prime numbers. Alternatively, the fine-tuning process might have introduced lower-quality training data that eroded the detail of GPT-4's responses on certain mathematical topics.

Whatever the specific cause, GPT-4's underlying ability to identify prime numbers probably did not change much between March and June. More likely, the model was leaning on patterns in the data it had seen, and its default answer shifted with those incidental patterns rather than with any actual reasoning. It is worth remembering, though, that AI models do not form habits the way humans do: they lack independent understanding and context, relying on data to mimic reasoning rather than possessing true reasoning ability.
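Because the reported test set consisted entirely of primes, a model's accuracy on it equals the fraction of items it calls "prime," whatever its reasons. The short illustration below (not code from the study; only the two reported percentages are taken from it) shows that a drift in the default answer alone can reproduce the swing:

```python
# Illustration (not from the study): on a test set made up entirely of
# primes, measured accuracy reduces to how often the model says "prime".
n_items = 500                 # size of the all-prime test set
rates = {
    "March": 0.976,           # model answers "prime" 97.6% of the time
    "June": 0.024,            # model answers "prime" 2.4% of the time
}

# Because every item is prime, accuracy == fraction of "prime" answers.
for label, rate in rates.items():
    print(f"{label}: {rate * n_items:.0f}/{n_items} correct = {rate:.1%}")

# A shift in the model's default answer on such a one-sided set can thus
# swing measured accuracy wildly, consistent with the idea that GPT-4's
# underlying ability changed less than the headline numbers suggest.
```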