GPT-5 Scores 97% on ARC-AGI, Setting New SOTA Across All Major Benchmarks

April 11, 2026 | 1 min read | Source: OpenAI Blog

OpenAI today released GPT-5, its most capable model to date. The model scored 97.2% on the ARC-AGI benchmark, a test designed to measure abstract reasoning and generalization using tasks that are difficult to solve by pattern memorization alone.

What is ARC-AGI?

The Abstraction and Reasoning Corpus (ARC-AGI), created by François Chollet, challenges AI systems with visual reasoning puzzles that require genuine problem-solving rather than statistical retrieval. The previous best score was 85.4%, set by GPT-4o.

GPT-5 Performance Highlights

Benchmark	Score	Previous SOTA
ARC-AGI	97.2%	85.4% (GPT-4o)
MMLU	94.8%	92.1%
HumanEval	98.1%	94.6%
MATH	96.3%	91.2%

The jump in ARC-AGI performance is particularly striking: an 11.8 percentage point improvement over the previous state of the art, suggesting qualitative changes in reasoning ability rather than incremental scaling.
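The size of each jump can be checked directly from the table above. The following is a minimal sketch, not part of OpenAI's announcement: the helper names `point_gain` and `error_reduction` are hypothetical, and the "share of remaining error removed" figure is just one additional way to read the same numbers.

```python
# Scores copied from the table above: (GPT-5 score, previous SOTA), in percent.
scores = {
    "ARC-AGI": (97.2, 85.4),
    "MMLU": (94.8, 92.1),
    "HumanEval": (98.1, 94.6),
    "MATH": (96.3, 91.2),
}


def point_gain(new, old):
    """Absolute improvement in percentage points (hypothetical helper)."""
    return round(new - old, 1)


def error_reduction(new, old):
    """Percent of the previous error (100 - old) that was eliminated (hypothetical helper)."""
    return round(100 * (new - old) / (100 - old), 1)


gains = {name: point_gain(new, old) for name, (new, old) in scores.items()}
reductions = {name: error_reduction(new, old) for name, (new, old) in scores.items()}

for name in scores:
    print(f"{name}: +{gains[name]} points, {reductions[name]}% of remaining error removed")
```

On these figures, ARC-AGI shows the largest gain of the four benchmarks, both in percentage points and as a share of the error left by the previous best result.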

Implications for the Field

Researchers at Anthropic, Google DeepMind, and Meta have already begun analyzing GPT-5’s outputs to understand how it approaches novel problems. Early findings suggest the model uses a form of internal chain-of-thought that resembles structured planning rather than next-token prediction alone.

GPT-5 is available today via the OpenAI API and ChatGPT. Pricing starts at $10 per million input tokens.
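To get a rough sense of what the quoted rate means for a single request, the sketch below estimates input-side cost only. It assumes the $10 per million input tokens figure stated above applies uniformly. Output-token pricing is not given in the article and is not modeled, and `estimate_input_cost` is a hypothetical helper, not part of any OpenAI SDK.

```python
from decimal import Decimal

# From the article: pricing starts at $10 per million input tokens.
# Assumption: this rate applies to every input token of the request.
INPUT_PRICE_PER_MILLION_USD = Decimal("10")


def estimate_input_cost(input_tokens: int) -> Decimal:
    """Estimated USD cost for the input side of a request (hypothetical helper)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return INPUT_PRICE_PER_MILLION_USD * input_tokens / Decimal(1_000_000)


print(estimate_input_cost(2_500))      # a 2,500-token prompt
print(estimate_input_cost(1_000_000))  # one million input tokens
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when summed over many calls.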