Llama 4 Scout
Advertised context: 10M tokens
Llama 4 Scout's 10 million token context window is the largest of any publicly released open-weight model. Its effective context (the range over which the model retains >90% retrieval accuracy) is shorter than the advertised 10M, but the headroom is still unmatched: for workloads that need to reason over genuinely massive documents or codebases as a single unit, Scout has no peer. The mixture-of-experts architecture, with 17B active parameters per token, keeps inference economics tractable despite the scale.
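A quick way to see where effective context ends is a needle-in-a-haystack probe: plant a unique fact at varying depths of a long filler document and check whether the model retrieves it. The sketch below is illustrative only; it assumes an OpenAI-compatible endpoint (such as a local vLLM server) and a model id that you should adjust to your deployment.

```python
# Minimal needle-in-a-haystack probe for estimating effective context.
# Endpoint URL and model id are assumptions, not prescriptions.
from openai import OpenAI

MODEL = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed model id
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

NEEDLE = "The vault access code is 7319."
FILLER = "Quarterly revenue discussions continued without resolution. " * 2000

def probe(depth_fraction: float) -> bool:
    """Place the needle at a relative depth in the filler document and
    check whether the model can retrieve it."""
    cut = int(len(FILLER) * depth_fraction)
    haystack = FILLER[:cut] + NEEDLE + " " + FILLER[cut:]
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": haystack + "\n\nWhat is the vault access code?",
        }],
        max_tokens=20,
    )
    return "7319" in (resp.choices[0].message.content or "")

# Sweep depths; repeating this at growing document lengths traces where
# retrieval accuracy starts to fall off.
if __name__ == "__main__":
    print({d: probe(d) for d in (0.1, 0.25, 0.5, 0.75, 0.9)})
```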
Strengths
- 10M token context, the largest of any publicly released open-weight model
- Native multimodal capability across the long context
- 17B active parameter inference economics
- Mature deployment ecosystem (llama.cpp, vLLM, TensorRT-LLM); see the serving sketch after this list
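As a reference point for the deployment ecosystem, a minimal offline serving sketch with vLLM's Python API might look like the following. The model id, GPU count, and context length are assumptions to size against your own hardware.

```python
# Sketch: running Llama 4 Scout with vLLM's offline Python API.
# Model id, tensor_parallel_size, and max_model_len are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed HF model id
    tensor_parallel_size=8,      # shard weights and KV cache across 8 GPUs
    max_model_len=1_000_000,     # well below 10M; bounded by KV-cache memory
)

params = SamplingParams(temperature=0.2, max_tokens=512)
outputs = llm.generate(["Summarize the following document:\n..."], params)
print(outputs[0].outputs[0].text)
```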
Trade-offs
- Llama Community License restrictions: a separate license is required above the 700M monthly-active-user threshold, and attribution requirements apply
- Effective context substantially shorter than 10M advertised limit
- Multi-GPU deployment required for long-context inference at full quality (see the KV-cache estimate below)
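To make the multi-GPU point concrete, the sketch below estimates per-sequence KV-cache memory with the standard 2 x layers x kv_heads x head_dim x bytes-per-element formula. The layer, head, and precision figures are illustrative placeholders rather than confirmed Scout hyperparameters, and real deployments can shrink the cache with quantization or attention optimizations.

```python
# Back-of-envelope KV-cache sizing that shows why long contexts force
# multi-GPU deployment. Architecture numbers are illustrative placeholders,
# not confirmed Llama 4 Scout hyperparameters.
def kv_cache_gib(seq_len: int,
                 num_layers: int = 48,     # assumed
                 num_kv_heads: int = 8,    # assumed (grouped-query attention)
                 head_dim: int = 128,      # assumed
                 bytes_per_elem: int = 2   # fp16/bf16
                 ) -> float:
    """Approximate KV-cache size in GiB for one sequence:
    2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes."""
    total = 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem
    return total / 2**30

for tokens in (128_000, 1_000_000, 10_000_000):
    print(f"{tokens:>10,} tokens -> ~{kv_cache_gib(tokens):,.0f} GiB KV cache")
```

Under these placeholder assumptions the cache alone runs to hundreds of GiB at million-token scale, far beyond a single accelerator's memory, which is what pushes long-context serving onto multiple GPUs.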