News
Meta Releases Llama 4 Scout and Maverick — Open Weights, SOTA Performance
Meta has released Llama 4 Scout and Llama 4 Maverick, two open-weight models that use a Mixture-of-Experts (MoE) architecture to deliver competitive performance at a fraction of the compute cost of dense models.
Architecture Overview
Both models use an MoE setup in which each token activates a shared expert plus one routed expert, so only a small fraction of the total parameters runs per token:
- Llama 4 Scout: 109B total parameters across 16 routed experts, 17B active — fits on a single H100 GPU with 4-bit quantization
- Llama 4 Maverick: 400B total parameters across 128 routed experts, 17B active — requires a multi-GPU H100 host for efficient inference
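The routing above can be sketched in a few lines. This is a toy illustration (random weights, simple linear "experts", hypothetical dimensions), not Meta's implementation: a gating network scores the routed experts, each token runs through the shared expert plus its single top-scoring routed expert — two experts per token in total.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS = 8, 16           # toy hidden size; 16 routed experts (Scout-like)

# One shared expert plus N_EXPERTS routed experts, each a simple linear map.
shared_w = rng.normal(size=(D, D))
routed_w = rng.normal(size=(N_EXPERTS, D, D))
router_w = rng.normal(size=(D, N_EXPERTS))   # gating network

def moe_layer(x):
    """x: (tokens, D) -> (tokens, D) via shared expert + top-1 routed expert."""
    logits = x @ router_w                     # (tokens, N_EXPERTS) router scores
    top1 = np.argmax(logits, axis=-1)         # chosen routed expert per token
    # Softmax gate weight for the chosen expert.
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    out = x @ shared_w                        # shared expert sees every token
    for t, e in enumerate(top1):              # only ONE routed expert per token
        out[t] += probs[t, e] * (x[t] @ routed_w[e])
    return out, top1

tokens = rng.normal(size=(4, D))
y, chosen = moe_layer(tokens)
print(y.shape, chosen)
```

Because the router picks one expert per token, compute scales with the active parameter count (17B) rather than the total — which is the entire appeal of the architecture.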
The models were trained on 30 trillion tokens of multilingual data, including code, math, and scientific literature.
Benchmark Results
| Task | Scout | Maverick | GPT-4o |
|---|---|---|---|
| MMLU | 88.3% | 93.7% | 88.7% |
| HumanEval | 79.1% | 91.2% | 90.2% |
| MATH | 74.6% | 88.9% | 76.6% |
| Multilingual | 81.2% | 89.5% | 85.1% |
Maverick outperforms GPT-4o on the coding and math benchmarks above — a notable result for an open-weight model.
How to Access
Both models are available on Hugging Face under the Llama 4 Community License, which permits commercial use (with restrictions for very large-scale services). Quantized versions (4-bit, 8-bit) are already available through Ollama, LM Studio, and llama.cpp.
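Quantization matters here because, in an MoE model, every expert must be resident in memory even though only a fraction is active per token. A back-of-envelope sketch for Scout's weight footprint (weights only; KV cache and activations add more in practice):

```python
# Rough weight-memory estimate for Llama 4 Scout at different precisions.
# All 109B parameters must be stored, even though only 17B are active per token.

TOTAL_PARAMS = 109e9  # Scout's total parameter count

def weight_gb(params, bits_per_param):
    """Weight storage in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9

for name, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

At fp16 the weights alone are ~218 GB, far beyond one GPU; at 4-bit they drop to ~55 GB, which is why quantized Scout fits on a single 80 GB-class card.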