
In March, I conducted a study to determine the best generative AI platform. Now, ten months later, I have redone the study with additional test queries and a revised evaluation approach. In this updated analysis, I break down each platform's performance across several activity categories to identify the best generative AI platform.

The platforms tested include all of the major options except Google's SGE, which did not consistently provide responses relevant to the intended queries. I used the graphical user interface for every tool, so GPT-4 Turbo (a variant that improves on GPT-4 and has training data through April 2023) was not included, as it is only accessible through the GPT-4 API.

Each platform received the same set of 44 questions covering a variety of topics. The questions were posed as simple inquiries rather than highly tailored prompts, giving a more user-oriented evaluation of the tools' performance.

Among the platforms tested, Bard/Gemini achieved the highest overall scores across all 44 queries, although this doesn't necessarily make it the clear winner (more on this later). Bard excelled particularly in local search queries, earning a perfect score of 4 on two of them. The two Bing Chat solutions, by contrast, performed below expectations on local queries, incorrectly identifying my location as Concord, Mass., instead of Falmouth, Mass. Bing also exhibited slightly more outright accuracy issues than Bard. However, Bing stood out for its ability to provide citations and additional resources for further reading, surpassing the other platforms in this regard. ChatGPT and Claude generally did not attempt to provide resources or citations, and Bard did so only infrequently. This limitation in Bard's functionality was a significant disappointment.
ChatGPT's scores suffered on queries that depended on current events or access to live webpages; installing the MixerBox WebSearchG plugin greatly improved its competitiveness on those tasks. The core test results were compiled without the plugin, but supplementary testing with it revealed markedly better performance from ChatGPT. More details on the plugin's impact follow later.

Despite lagging in some categories, Claude should not be underestimated: it performed well on many queries and excelled at generating article outlines. Unfortunately, the test did not highlight some of Claude's strengths, such as file uploads, acceptance of larger prompts, and more in-depth responses (up to 100,000 tokens, 12 times more than ChatGPT). Depending on the task at hand, Claude could be the best platform for certain work. A comprehensive evaluation that recognizes the strengths of each tool for different query types is crucial in determining the most suitable platform for specific needs.

Both Bing Chat Balanced and Bing Chat Creative proved competitive on multiple fronts. Similarly, ChatGPT demonstrated its competence on queries that did not require current context or access to live webpages, achieving the highest scores in several categories.

The query types tested included local, content gap identification, current events, jokes, article outlines, and content creation. The individual category scores were also combined into a Total metric, with the Resources score excluded so that platforms could be compared without bias toward search-engine-backed solutions. At the same time, it's important to acknowledge the value of follow-on resources and citations for an optimal user experience: it is unrealistic to expect a single response to fully cover every aspect of a user's query, unless it is a straightforward question like converting teaspoons to tablespoons.
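The Total-metric aggregation described above can be sketched as a simple sum over category scores that skips Resources. The platform names and numbers below are hypothetical placeholders, not the study's actual data:

```python
# Hypothetical per-category scores for illustration only; the study's
# real numbers are not reproduced here.
scores = {
    "Bard": {"Local": 8.0, "Content Gaps": 7.5, "Current Events": 6.0, "Resources": 2.0},
    "Bing Chat Balanced": {"Local": 4.0, "Content Gaps": 7.0, "Current Events": 6.3, "Resources": 9.0},
}

def total_excluding_resources(platform_scores):
    """Combine category scores into a Total, leaving out the Resources score."""
    return sum(v for k, v in platform_scores.items() if k != "Resources")

totals = {name: total_excluding_resources(s) for name, s in scores.items()}
print(totals)  # Bard: 21.5, Bing Chat Balanced: 17.3
```

Dropping Resources from the sum keeps the comparison from automatically favoring the search-engine-backed tools, which link out far more often.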
Bing's implementation of linking out to resources positions it as a leading solution in this aspect.
The chart below shows the percentage of times each platform demonstrated strong scores for being On Topic, Accurate, Complete, and High Quality. Based on the initial data, Bard appears to have an edge over the competition, but this is largely attributable to its outstanding performance in specific query categories. To gain a better understanding, let's explore the scores broken down by category. In each category, the winner is highlighted in light green.

Three local queries were included in the test. For the closest-pizza-shop query, asked while I was in Falmouth, both Bing Chat Balanced and Bing Chat Creative identified pizza shops in Concord, which is 90 miles away. Bard provided a much better response. Similarly, when I asked how to use a router to cut a circular table top, neither Bing solution comprehended the context correctly; Bard outperformed the others on this query as well.

Six queries were aimed at identifying content gaps in existing published content. Bard performed exceptionally well here, closely trailed by Bing Chat Creative and Bing Chat Balanced. ChatGPT and Claude struggled because they could not access current webpages.

Queries related to current events were also included. ChatGPT and Claude performed poorly in this category due to their outdated datasets. Bard achieved an average score of 6.0, while Bing Chat Balanced scored 6.3. Though there were gaps in the responses from all platforms, Bard garnered the highest total score of 6.0, with the two Bing solutions following closely at 8.0.

Three queries requested jokes, with perfect scores awarded for declining to tell a joke. All platforms performed flawlessly in this category.

Article outlines were generated for three queries, revealing differences in comprehensiveness among platforms. Bing Chat Balanced failed to mention major events in its outline of Russian history, while the other platforms scored similarly.
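The "percentage of strong scores" view above can be computed as follows: grade each query on the four dimensions, then count how often a platform meets a strength threshold. The grades, the 0–4 scale, and the cutoff of 3 are assumptions for illustration, not the study's actual rubric:

```python
# Hypothetical per-query grades; each query is graded on four dimensions.
# A dimension counts as "strong" here when its grade meets an assumed cutoff.
STRONG = 3  # assumed threshold on a 0-4 scale

queries = [
    {"On Topic": 4, "Accurate": 3, "Complete": 2, "High Quality": 3},
    {"On Topic": 4, "Accurate": 4, "Complete": 4, "High Quality": 4},
]

def strong_percentages(graded_queries, threshold=STRONG):
    """Percentage of queries scoring at or above the threshold, per dimension."""
    dims = graded_queries[0].keys()
    n = len(graded_queries)
    return {
        d: 100.0 * sum(q[d] >= threshold for q in graded_queries) / n
        for d in dims
    }

print(strong_percentages(queries))
# {'On Topic': 100.0, 'Accurate': 100.0, 'Complete': 50.0, 'High Quality': 100.0}
```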
Bard, ChatGPT, Claude, and Bing Chat Creative can all produce an initial draft of an article outline, but subject matter expertise and thorough review are necessary before finalizing the content.

Content creation was tested with five different queries, including a World War II history question. Bard, ChatGPT, Claude, and Bing Chat Creative all had shortcomings in this category, but Claude delivered the best response overall.

The medical queries required cautious responses: basic introductory information paired with a clear recommendation to consult a doctor. Bing Chat Balanced gave good advice on consultation, although its response lacked a comprehensive overview of the available blood test types.

Disambiguation queries were generally challenging for the platforms, but Bard excelled at answering "Who is Danny Sullivan?", accurately separating and discussing two individuals with the same name.

Adding the MixerBox WebSearchG plugin to ChatGPT greatly improved its performance on content gap identification queries, significantly raising its scores on those questions.

It's important to note that this study was limited to 44 questions, so the results rest on a small sample size; the time-consuming work of researching the accuracy and completeness of each response kept the query set small. Nevertheless, here are my conclusions: the field of generative AI platforms is still in its early stages, and developments will continue to emerge rapidly. Google and Bing have inherent long-term advantages, as they can leverage their search engine histories to better meet query intent and minimize hallucinations. How well they compete, however, will depend on how successfully they implement these capabilities and improve their existing functionality. Exciting advancements lie ahead, and it will be fascinating to watch how this technology evolves.
Eric Enge, President of Pilot Holding and founder of Stone Temple, an acclaimed digital marketing agency, conducted this study. Enge is a renowned author, researcher, teacher, keynote speaker, and panelist at major industry conferences, often referred to as an authority in the field of SEO.