
April 2026 LLM Leaderboard: Overall Performance

April 11, 2026
Benchmark: MMLU + HumanEval + MATH + ARC-AGI composite
Updated: 2026-04-11
| # | Model | Score |
|---|-------|-------|
| 1 | GPT-5 | 96.1 |
| 2 | Claude Opus 4.6 | 94.3 |
| 3 | Gemini Ultra 2 | 92.7 |
| 4 | Claude Sonnet 4.6 | 91.2 |
| 5 | Llama 4 Maverick | 90.8 |
| 6 | GPT-4o | 88.9 |
| 7 | Gemini 2.5 Pro | 87.4 |
| 8 | Llama 4 Scout | 85.6 |
| 9 | Mistral Large 3 | 83.2 |
| 10 | Qwen 3 72B | 81.9 |

This leaderboard combines four major benchmarks into a single composite score, weighted equally. Updated monthly as new models are released.

Methodology

Scores are averaged across four benchmarks:

| Benchmark | Weight | Measures |
|-----------|--------|----------|
| MMLU | 25% | General knowledge (57 subjects) |
| HumanEval | 25% | Code generation |
| MATH | 25% | Mathematical reasoning |
| ARC-AGI | 25% | Abstract reasoning / generalization |

All models are evaluated in zero-shot mode unless the benchmark specifies otherwise. We use the official benchmark implementations where available.
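
For concreteness, here is a minimal sketch of the composite calculation. The equal 25% weights come from the table above; the per-benchmark scores in the example are placeholders, not official results (the 97.2 ARC-AGI figure is from this article, and the other three are made-up values chosen so the average lands on GPT-5's listed 96.1).

```python
# Equal-weight composite as described in the methodology table.
WEIGHTS = {"MMLU": 0.25, "HumanEval": 0.25, "MATH": 0.25, "ARC-AGI": 0.25}

def composite(scores: dict[str, float]) -> float:
    """Return the equally weighted average of the four benchmark scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical per-benchmark scores (not official results): the 97.2
# ARC-AGI value is cited in this article; the other three are
# placeholders picked so the composite matches GPT-5's listed 96.1.
example = {"MMLU": 96.0, "HumanEval": 95.5, "MATH": 95.7, "ARC-AGI": 97.2}
print(f"Composite: {composite(example):.1f}")  # Composite: 96.1
```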

Notable Movements This Month

  • GPT-5 debuts at #1 with a record 97.2% on ARC-AGI
  • Llama 4 Maverick enters at #5, the highest any open-weight model has ever placed on our composite
  • GPT-4o drops two positions as newer models surpass it
  • Qwen 3 72B is new this month: China’s strongest open-source entry

Open-Source Highlight

Llama 4 Maverick at #5 is the most significant open-source milestone yet. It outscores GPT-4o (#6) and trails Claude Opus 4.6 (#2) by just 3.5 points, all with freely available weights.

Pricing per Million Tokens (Input)

| Model | $/1M input tokens |
|-------|-------------------|
| GPT-5 | $10 |
| Claude Opus 4.6 | $15 |
| Gemini Ultra 2 | $7 |
| Claude Sonnet 4.6 | $3 |
| Llama 4 Maverick | Free (self-hosted) |
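
As a back-of-the-envelope sketch, the rates above translate to per-request input costs like this. The prices come from the table; the 8,000-token prompt is a made-up example, not a measured workload, and Llama 4 Maverick is omitted because self-hosting cost depends entirely on your infrastructure.

```python
# Input prices ($ per 1M tokens) from the table above.
PRICE_PER_M_INPUT = {
    "GPT-5": 10.00,
    "Claude Opus 4.6": 15.00,
    "Gemini Ultra 2": 7.00,
    "Claude Sonnet 4.6": 3.00,
}

def input_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of the input side of a single request."""
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000

# Hypothetical 8,000-token prompt:
for model in PRICE_PER_M_INPUT:
    print(f"{model}: ${input_cost(model, 8_000):.4f}")
# GPT-5: $0.0800, Claude Opus 4.6: $0.1200,
# Gemini Ultra 2: $0.0560, Claude Sonnet 4.6: $0.0240
```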