Jan. 26, 2025, 8:30 a.m.

Humanity's Last Exam: A New Challenge for Advanced AI Systems

Brief news summary

As artificial intelligence (A.I.) capabilities raise concerns, experts face the challenge of creating tests that A.I. cannot easily pass. Historically, advancements in A.I. were measured through standardized assessments akin to the S.A.T., evaluating skills in math, science, and logic. However, as A.I. models have improved, they have excelled in these tests, leading to the development of more difficult assessments reminiscent of graduate-level exams. Recent models from companies like OpenAI, Google, and Anthropic have performed exceptionally well on these advanced tests, highlighting the inadequacy of current evaluation methods. In light of this issue, researchers from the Center for AI Safety and Scale AI are launching "Humanity's Last Exam," described as the most challenging test for A.I. systems yet. This initiative, spearheaded by A.I. safety expert Dan Hendrycks, seeks to tackle the urgent question: Are A.I. systems now too intelligent for us to accurately assess?

If you're searching for a new reason to feel uneasy about artificial intelligence, consider this: some of the brightest minds in the world are having difficulty creating tests that A.I. systems cannot pass. For years, A.I. systems have been evaluated using a variety of standardized benchmark tests. Many of these tests featured challenging, SAT-level questions in subjects like math, science, and logic. Tracking the scores of these models over time provided a rough indication of advancements in A.I. However, A.I. systems eventually excelled at these assessments, prompting the development of more difficult tests, often featuring questions that graduate students might face on their examinations. Unfortunately, those tests aren’t performing well either. New models from companies such as OpenAI, Google, and Anthropic have been achieving high scores on many Ph.D.-level challenges, diminishing the tests' effectiveness and raising a concerning question: are A.I. systems becoming too intelligent for us to evaluate?

This week, researchers at the Center for AI Safety and Scale AI are set to offer a potential answer: a new evaluation called “Humanity’s Last Exam,” which they assert is the most challenging test ever given to A.I. systems. Humanity’s Last Exam was conceived by Dan Hendrycks, a notable A.I. safety researcher and the director of the Center for AI Safety. (The test’s initial title, “Humanity’s Last Stand,” was abandoned due to being overly theatrical.)

