Meta's Llama herd


Meta has released its new models, collectively termed the Llama 4 herd. The Llama 4 models:

The Scout model seems particularly interesting due to its relatively small size and low active-parameter count.

Llama 4 Scout:
- 17B active parameters, 16 experts, 109B total.
- Fits on a single H100 GPU (INT4-quantized).
- 10M token context window.
- Outperforms previous Llama releases on multimodal tasks while being more resource-friendly.
- Employs the iRoPE architecture for efficient long-context attention.
- Tested with up to 8 images per prompt.

Llama 4 Maverick:
- 17B active parameters, 128 experts, 400B total.
- 1M token context window.
- Not single-GPU; runs on one H100 DGX host or can be distributed for greater efficiency.
- Outperforms GPT-4o and Gemini 2.0 Flash on coding, reasoning, and multilingual tests at a competitive cost.
- Maintains strong image understanding and grounded reasoning ability.

Llama 4 Behemoth (Preview):
- 288B active parameters, 16 experts, nearly 2T total.
- Still in training; not yet released.
- Exceeds GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks (e.g., MATH-500, GPQA Diamond).
- Serves as the "teacher" model for Scout and Maverick via co-distillation.
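The single-H100 claim for Scout can be sanity-checked with a rough back-of-the-envelope calculation: at INT4, each weight takes half a byte, so 109B total parameters come to roughly 55 GB, well under an H100's 80 GB. This sketch counts weights only, ignoring activation memory and the KV cache, so it is a lower bound rather than a deployment estimate:

```python
def model_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate weight-only memory footprint in GB."""
    bytes_per_param = bits_per_param / 8
    return total_params_billions * bytes_per_param  # 1e9 params * bytes / 1e9 = GB

H100_MEMORY_GB = 80

for name, params_b in [("Scout", 109), ("Maverick", 400)]:
    mem = model_memory_gb(params_b, bits_per_param=4)  # INT4 quantization
    verdict = "fits" if mem <= H100_MEMORY_GB else "does not fit"
    print(f"Llama 4 {name}: ~{mem:.1f} GB at INT4 -> {verdict} on one H100")
```

By the same arithmetic Maverick's 400B weights need about 200 GB even at INT4, which is consistent with Meta positioning it as a multi-GPU (H100 DGX host) model.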