
Meta Releases Llama 4 Scout and Maverick — Open Weights, SOTA Performance

April 9, 2026 · 1 min read · Source: Meta AI Blog

Meta has released Llama 4 Scout and Llama 4 Maverick, two open-weight models that use a Mixture-of-Experts (MoE) architecture to deliver competitive performance at a fraction of the compute cost of dense models.

Architecture Overview

Both models use a Mixture-of-Experts setup in which each token is routed to a shared expert plus one routed expert, so only a small fraction of the total parameters is active per token:

  • Llama 4 Scout: 109B total parameters across 16 experts, 17B active; runs on a single A100 80GB GPU
  • Llama 4 Maverick: 400B total parameters across 128 experts, 17B active; requires 4×H100 for efficient inference

The models were trained on 30 trillion tokens of multilingual data, including code, math, and scientific literature.
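To make the compute savings concrete, here is a minimal top-k MoE routing sketch in NumPy. The gating function, expert count, and k are illustrative only, not Llama 4's exact routing scheme; the point is that only the selected experts' weights are ever touched for a given token.

```python
import numpy as np

def moe_layer(x, experts_w, gate_w, k=2):
    """Minimal top-k MoE feed-forward layer (illustrative, not Llama 4's exact routing).

    x:         (d,) one token's hidden state
    experts_w: (E, d, d) one weight matrix per expert
    gate_w:    (d, E) router weights
    k:         number of experts evaluated per token
    """
    logits = x @ gate_w                       # (E,) router scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    # Only k of the E expert FFNs run; the rest are skipped entirely.
    return sum(p * (x @ experts_w[i]) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, E = 8, 16
x = rng.standard_normal(d)
out = moe_layer(x, rng.standard_normal((E, d, d)), rng.standard_normal((d, E)))
print(out.shape)  # (8,)
```

With k experts active out of E, the per-token FFN cost scales with k/E of the total expert parameters, which is why a 400B-parameter model can run with far less compute than a dense model of the same size.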

Benchmark Results

Task          Scout   Maverick   GPT-4o
MMLU          88.3%   93.7%      88.7%
HumanEval     79.1%   91.2%      90.2%
MATH          74.6%   88.9%      76.6%
Multilingual  81.2%   89.5%      85.1%

Maverick notably outperforms GPT-4o on coding and math tasks — a significant achievement for an open-weight model.

How to Access

Both models are available on Hugging Face under a permissive license that allows commercial use. Quantized versions (4-bit, 8-bit) are already available through Ollama, LM Studio, and llama.cpp.
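Quantization is what makes these parameter counts tractable on local hardware. A back-of-envelope estimate of weight-only memory (KV cache, activations, and runtime overhead excluded) shows why:

```python
def weights_gb(total_params_b, bits_per_weight):
    """Approximate weight-only memory in GB (ignores KV cache and runtime overhead)."""
    return total_params_b * 1e9 * bits_per_weight / 8 / 1e9

for name, params_b in [("Scout", 109), ("Maverick", 400)]:
    for bits in (16, 8, 4):
        print(f"{name}: {weights_gb(params_b, bits):.1f} GB at {bits}-bit")
# Scout:    218.0 GB at 16-bit, 109.0 GB at 8-bit, 54.5 GB at 4-bit
# Maverick: 800.0 GB at 16-bit, 400.0 GB at 8-bit, 200.0 GB at 4-bit
```

By this rough estimate, only the 4-bit Scout checkpoint (~55 GB of weights) leaves headroom within a single 80 GB card, which is why the quantized builds are the practical route for single-GPU use.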