News
Meta Releases Llama 4 Scout and Maverick — Open Weights, SOTA Performance
Meta has released Llama 4 Scout and Llama 4 Maverick, two open-weight models that use a Mixture-of-Experts (MoE) architecture to deliver competitive performance at a fraction of the compute cost of dense models.
Architecture Overview
Both models use an MoE setup in which each token activates a shared expert plus one routed expert, so only a small fraction of the total parameters runs per token:
- Llama 4 Scout: 109B total parameters across 16 routed experts, 17B active — fits on a single H100 GPU with 4-bit quantization
- Llama 4 Maverick: 400B total parameters across 128 routed experts, 17B active — requires a multi-GPU H100 host for efficient inference
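The routing above can be sketched in a few lines. This is a toy illustration (random weights, simple linear "experts", hypothetical dimensions), not Meta's implementation: a gating network scores the routed experts, each token runs through the shared expert plus its single top-scoring routed expert — two experts per token in total.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS = 8, 16           # toy hidden size; 16 routed experts (Scout-like)

# One shared expert plus N_EXPERTS routed experts, each a simple linear map.
shared_w = rng.normal(size=(D, D))
routed_w = rng.normal(size=(N_EXPERTS, D, D))
router_w = rng.normal(size=(D, N_EXPERTS))   # gating network

def moe_layer(x):
    """x: (tokens, D) -> (tokens, D) via shared expert + top-1 routed expert."""
    logits = x @ router_w                     # (tokens, N_EXPERTS) router scores
    top1 = np.argmax(logits, axis=-1)         # chosen routed expert per token
    # Softmax gate weight for the chosen expert.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = x @ shared_w                        # shared expert sees every token
    for t, e in enumerate(top1):              # only ONE routed expert per token
        out[t] += probs[t, e] * (x[t] @ routed_w[e])
    return out, top1

tokens = rng.normal(size=(4, D))
y, chosen = moe_layer(tokens)
print(y.shape, chosen)
```

Because the router picks one expert per token, compute scales with the active parameter count (17B) rather than the total — which is the entire appeal of the architecture.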
The models were trained on 30 trillion tokens of multilingual data, including code, math, and scientific literature.
Benchmark Results
| Task | Scout | Maverick | GPT-4o |
|---|---|---|---|
| MMLU | 88.3% | 93.7% | 88.7% |
| HumanEval | 79.1% | 91.2% | 90.2% |
| MATH | 74.6% | 88.9% | 76.6% |
| Multilingual | 81.2% | 89.5% | 85.1% |
Maverick outperforms GPT-4o on the coding and math benchmarks above — a notable result for an open-weight model.
How to Access
Both models are available on Hugging Face under the Llama 4 Community License, which permits commercial use (with restrictions for very large-scale services). Quantized versions (4-bit, 8-bit) are already available through Ollama, LM Studio, and llama.cpp.
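Quantization matters here because, in an MoE model, every expert must be resident in memory even though only a fraction is active per token. A back-of-envelope sketch for Scout's weight footprint (weights only; KV cache and activations add more in practice):

```python
# Rough weight-memory estimate for Llama 4 Scout at different precisions.
# All 109B parameters must be stored, even though only 17B are active per token.

TOTAL_PARAMS = 109e9  # Scout's total parameter count

def weight_gb(params, bits_per_param):
    """Weight storage in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

At fp16 the weights alone are ~218 GB, far beyond one GPU; at 4-bit they drop to ~55 GB, which is why quantized Scout fits on a single 80 GB-class card.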