April 2026 LLM Leaderboard: Overall Performance
Benchmark: MMLU + HumanEval + MATH + ARC-AGI composite
Updated: 2026-04-11
| Rank | Model | Composite Score |
|---|---|---|
| 1 | GPT-5 | 96.1 |
| 2 | Claude Opus 4.6 | 94.3 |
| 3 | Gemini Ultra 2 | 92.7 |
| 4 | Claude Sonnet 4.6 | 91.2 |
| 5 | Llama 4 Maverick | 90.8 |
| 6 | GPT-4o | 88.9 |
| 7 | Gemini 2.5 Pro | 87.4 |
| 8 | Llama 4 Scout | 85.6 |
| 9 | Mistral Large 3 | 83.2 |
| 10 | Qwen 3 72B | 81.9 |
This leaderboard combines four major benchmarks into a single composite score, weighted equally. It is updated monthly as new models are released.
Methodology
Scores are averaged across four benchmarks:
| Benchmark | Weight | Measures |
|---|---|---|
| MMLU | 25% | General knowledge (57 subjects) |
| HumanEval | 25% | Code generation |
| MATH | 25% | Mathematical reasoning |
| ARC-AGI | 25% | Abstract reasoning / generalization |
All models are evaluated in zero-shot mode unless the benchmark specifies otherwise. We use the official benchmark implementations where available.
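For readers who want to reproduce a composite score, here is a minimal sketch of the equally weighted average. The weights match the methodology table above; the function name and the example per-benchmark scores are hypothetical placeholders, not our published results.

```python
# Equal weights, as specified in the methodology table above.
WEIGHTS = {"MMLU": 0.25, "HumanEval": 0.25, "MATH": 0.25, "ARC-AGI": 0.25}

def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-benchmark scores (0-100 scale)."""
    assert set(scores) == set(WEIGHTS), "need exactly one score per benchmark"
    return sum(WEIGHTS[b] * s for b, s in scores.items())

# Example with made-up numbers:
example = {"MMLU": 92.0, "HumanEval": 95.0, "MATH": 90.0, "ARC-AGI": 87.0}
print(f"composite: {composite(example):.1f}")  # -> composite: 91.0
```

With equal weights this reduces to a plain mean of the four scores, which is why a large gain on any single benchmark (such as GPT-5's ARC-AGI result) moves the composite by only a quarter of that amount.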
Notable Movements This Month
- GPT-5 debuts at #1 with a record 97.2% on ARC-AGI
- Llama 4 Maverick enters at #5, the highest any open-weight model has ever placed on our composite
- GPT-4o drops two positions as newer models surpass it
- Qwen 3 72B joins the list this month as China’s strongest open-source entry
Open-Source Highlight
Llama 4 Maverick at #5 is the most significant open-source milestone yet. It outscores GPT-4o (#6) and comes within 3.5 points of Claude Opus 4.6 (#2), all with freely available weights.
Pricing per Million Tokens (Input)
| Model | $ / 1M input tokens |
|---|---|
| GPT-5 | $10 |
| Claude Opus 4.6 | $15 |
| Gemini Ultra 2 | $7 |
| Claude Sonnet 4.6 | $3 |
| Llama 4 Maverick | Free (self-hosted) |
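Since input cost scales linearly with token volume, the table translates directly into a monthly budget. A quick sketch, using the prices above; the helper name and the 50M-token example volume are illustrative assumptions, and self-hosting compute for Llama 4 Maverick is not included.

```python
# $/1M input-token prices from the table above.
PRICE_PER_M = {
    "GPT-5": 10.0,
    "Claude Opus 4.6": 15.0,
    "Gemini Ultra 2": 7.0,
    "Claude Sonnet 4.6": 3.0,
    "Llama 4 Maverick": 0.0,  # free weights; self-hosting compute not included
}

def input_cost_usd(model: str, tokens: int) -> float:
    """Input cost in USD for a given number of input tokens."""
    return PRICE_PER_M[model] * tokens / 1_000_000

# Example: 50M input tokens per month.
for model in PRICE_PER_M:
    print(f"{model}: ${input_cost_usd(model, 50_000_000):,.2f}")
# GPT-5: $500.00, Claude Opus 4.6: $750.00, Gemini Ultra 2: $350.00, ...
```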