The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation

Takeaways

We’re sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences.
Llama 4 Scout, a 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous generation Llama models, while fitting on a single NVIDIA H100 GPU. Additionally, Llama 4 Scout offers an industry-leading context window of 10M tokens and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely reported benchmarks.
Llama 4 Maverick, a 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding, at less than half the active parameters. Llama 4 Maverick offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an ELO of 1417 on LMArena.
These models are our best yet thanks to distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.
Download the Llama 4 Scout and Llama 4 Maverick models today on llama.com and Hugging Face. Try Meta AI built with Llama 4 in WhatsApp, Messenger, Instagram Direct, and on the web.

As more people continue to use artificial intelligence to enhance their daily lives, it’s important that the leading models and systems are openly available so everyone can build the future of personalized experiences. Today, we’re excited to announce the most advanced suite of models that support the entire Llama ecosystem. We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context length support and our first built using a mixture-of-experts (MoE) architecture. We’re also previewing Llama 4 Behemoth, one of the smartest LLMs in the world and our most powerful yet to serve as a teacher for our new models.

These Llama 4 models mark the beginning of a new era for the Llama ecosystem. We designed two efficient models in the Llama 4 series: Llama 4 Scout, a 17 billion active parameter model with 16 experts, and Llama 4 Maverick, a 17 billion active parameter model with 128 experts. The former fits on a single H100 GPU (with Int4 quantization) while the latter fits on a single H100 host. We also trained a teacher model, Llama 4 Behemoth, that outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. While we’re not yet releasing Llama 4 Behemoth as it is still training, we’re excited to share more technical details about our approach.
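To make the "fits on a single H100 GPU (with Int4 quantization)" claim concrete, here is a rough back-of-the-envelope sketch of the weight-memory arithmetic. It is illustrative only: this section does not state Scout's total parameter count, so the 109 billion figure below is an assumption, as is the 80 GB device capacity (the common H100 configuration). The function names are hypothetical, and the estimate covers weights only, not the KV cache or activations.

```python
# Rough weight-memory estimate; a sketch under stated assumptions, not a sizing tool.

ASSUMED_TOTAL_PARAMS = 109e9   # assumption: total (not active) parameter count
ASSUMED_DEVICE_GB = 80.0       # assumption: memory of a single H100 GPU


def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in gigabytes (10^9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def fits_on_device(total_params: float, bits_per_param: int, device_gb: float) -> bool:
    """True if the weights alone fit in device memory (ignores KV cache and activations)."""
    return weight_memory_gb(total_params, bits_per_param) <= device_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(ASSUMED_TOTAL_PARAMS, bits)
        ok = fits_on_device(ASSUMED_TOTAL_PARAMS, bits, ASSUMED_DEVICE_GB)
        print(f"{bits:>2}-bit weights: {gb:6.1f} GB, fits on one device: {ok}")
```

Under these assumptions, all experts must be resident in memory even though only a fraction is active per token, which is why the total parameter count, not the 17 billion active count, drives the estimate.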

We continue to believe that openness drives innovation and is good for developers, good for Meta, and good for the world. We’re making Llama 4 Scout and Llama 4 Maverick available for download today on llama.com and Hugging Face so everyone can continue to build new experiences using our latest technology. We’ll also make them available via our partners in the coming days. You can also try Meta AI with Llama 4 starting today in WhatsApp, Messenger, Instagram Direct, and on the Meta.AI website.

This is just the beginning for the Llama 4 collection. We believe that the most intelligent systems need to be capable of taking generalized actions, conversing naturally with humans, and working through challenging problems they haven’t seen before. Giving Llama superpowers in these areas will lead to better products for people on our platforms and more opportunities for developers to innovate on the next big consumer and business use cases. We’re continuing to research and prototype both models and products, and we’ll share more about our vision at LlamaCon on April 29—sign up to hear more.

Whether you’re a developer building on top of our models, an enterprise integrating them into your workflows, or simply curious about the potential uses and benefits of AI, Llama 4 Scout and Llama 4 Maverick are the best choices for adding next-generation intelligence to your products. Today, we’re excited to share more about the four major parts of their development and insights into our research and design process. We also can’t wait to see the incredible new experiences the community builds with our new Llama 4 models.

Pre-training

These models represent the best of Llama, offering multimodal intelligence at a compelling price while outperforming models of significantly larger sizes. Building the next generation of Llama models required us to take several new approaches during pre-training.

Our new Llama 4 models are our first models that use a mixture-of-experts (MoE) architecture. In MoE models, a single token activates only a fraction of the total parameters. MoE architectures are more compute-efficient for training and inference and, given a fixed training FLOPs budget, deliver higher quality compared to a dense model.
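The idea that a token activates only a fraction of the parameters can be illustrated with a toy top-k router. The sketch below is a generic MoE routing example in plain Python, not the Llama 4 implementation: the router logits, the scaling "experts", and the function names are all hypothetical and chosen only to show that unselected experts are never evaluated.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits: Sequence[float], top_k: int) -> List[Tuple[int, float]]:
    """Pick the top_k experts for one token and normalize their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token: Vector, experts: Sequence[Expert],
              router_logits: Sequence[float], top_k: int = 1) -> Vector:
    """Weighted sum of the selected experts' outputs; other experts are skipped."""
    output = [0.0] * len(token)
    for index, weight in route_token(router_logits, top_k):
        expert_out = experts[index](token)  # only selected experts run
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical stand-in for an expert feed-forward block."""
    return lambda x: [scale * v for v in x]


if __name__ == "__main__":
    experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 0.5]  # hypothetical router scores for one token
    print(route_token(logits, top_k=2))
    print(moe_layer([1.0, 2.0], experts, logits, top_k=1))
```

With four experts and `top_k=1`, only one quarter of the expert parameters participate in each token's forward pass, while the model's total capacity is the sum over all experts.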
