Dec. 27, 2024, 10:02 p.m.

OpenAI's o3 Model Achieves Milestone in AI Testing

Brief news summary

OpenAI's latest language model, "o3," scored 76% on the "Abstraction and Reasoning Corpus for Artificial General Intelligence" (ARC-AGI) test, edging past the human average of just above 75%. It is the first time an AI model has scored at this level, a notable advance in problem-solving and adaptability. François Chollet, creator of the ARC-AGI test and Google AI scientist, called the result a "genuine breakthrough" that reflects AI's growing ability to perform human-like tasks. Nevertheless, Chollet noted that the score does not mean o3 has reached Artificial General Intelligence (AGI), because the model still struggles with some simpler tasks. He suggested that architectural innovations, possibly similar to Monte Carlo tree search, might have contributed to o3's performance. Although o3 is a major step forward, it has not yet reached the level of general human intelligence, and future ARC-AGI iterations may pose new challenges for AI models.

OpenAI's latest large language model, known as "o3," has yet to be widely released, but preliminary tests hint at its abilities. The model was briefly introduced via a promotional video, with few details about its capabilities. Notably, o3 was tested using the "Abstraction and Reasoning Corpus for Artificial General Intelligence" (ARC-AGI), a benchmark designed to measure a model's ability to adapt to novel situations. On this test, o3 achieved 76% accuracy, surpassing the scores recorded by human Mechanical Turk workers and marking what some consider a breakthrough in AI's ability to tackle new tasks. François Chollet, who developed ARC-AGI, says o3's score demonstrates a significant leap in AI capability and suggests that o3 could soon compete with human work. Although Chollet has previously been skeptical about AI reaching human-level intelligence, he acknowledges this development as a shift.

The ARC-AGI test involves solving visual puzzles that do not rely on language, challenging models like o3 in new ways. Because o3 is closed-source, how it achieves its results remains opaque. Chollet speculates that the model's architecture differs significantly from that of its predecessors and that it employs a test-time search approach akin to the one used by Google DeepMind's AlphaZero. However, Chollet notes that OpenAI has not disclosed the resources spent to achieve its ARC-AGI scores, which could affect how efficient the model appears. Questions about o3's general adaptability also remain, as it was specifically trained for the ARC-AGI test. Chollet emphasizes that while o3 shows promise, it still fails on some simple tasks, suggesting it is not yet at the level of AGI (artificial general intelligence). He plans to release an updated version of ARC-AGI to further challenge models like o3, indicating that true AGI is still out of reach for now.
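To make the idea of a language-free puzzle concrete, here is a minimal sketch in Python of an ARC-style task and a toy search over candidate rules. Everything in it is an assumption made for illustration: the grids, the four candidate transformations, and the names `CANDIDATES` and `find_rule` are hypothetical, and the brute-force search is only a stand-in for the general notion of searching at test time. It says nothing about how o3 actually works, which OpenAI has not disclosed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An ARC-style grid: each cell holds a small integer standing for a color.
Grid = List[List[int]]


def identity(g: Grid) -> Grid:
    return [list(row) for row in g]


def flip_horizontal(g: Grid) -> Grid:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in g]


def flip_vertical(g: Grid) -> Grid:
    # Reverse the order of the rows.
    return [list(row) for row in reversed(g)]


def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]


# Hypothetical, deliberately tiny rule space (an assumption for this sketch).
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def find_rule(examples: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every example pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in examples):
            return name
    return None


# Demonstration pairs: the solver sees inputs and outputs, never a description.
examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]

rule = find_rule(examples)
test_input = [[5, 0, 0], [0, 6, 0]]
prediction = CANDIDATES[rule](test_input) if rule else None
print(rule, prediction)
```

Real ARC-AGI tasks draw on a far larger and more open-ended space of transformations than four fixed functions, which is why simple enumeration like this does not scale to the benchmark.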

