If you're searching for a new reason to feel uneasy about artificial intelligence, consider this: some of the brightest minds in the world are having difficulty creating tests that A.I. systems cannot pass.
For years, A.I. systems have been evaluated using a variety of standardized benchmark tests. Many of these tests featured challenging, SAT-level questions in subjects like math, science, and logic. Tracking the scores of these models over time provided a rough indication of advancements in A.I.
However, A.I. systems eventually excelled at these assessments, prompting the development of more difficult tests, often featuring questions that graduate students might face on their examinations. Those tests are not holding up either. New models from companies such as OpenAI, Google, and Anthropic have been achieving high scores on many Ph.D.-level challenges, diminishing the tests' effectiveness and raising a concerning question: are A.I. systems becoming too intelligent for us to evaluate?
This week, researchers at the Center for AI Safety and Scale AI are set to offer a potential answer: a new evaluation called “Humanity’s Last Exam,” which they assert is the most challenging test ever given to A.I. systems.
Humanity’s Last Exam was conceived by Dan Hendrycks, a notable A.I. safety researcher and the director of the Center for AI Safety. (The test’s initial title, “Humanity’s Last Stand,” was abandoned for being overly theatrical.)
Humanity's Last Exam: A New Challenge for Advanced AI Systems