Meta has officially unveiled Llama 4, its new family of natively multimodal large language models (LLMs). Llama 4 accepts interleaved text and image inputs in a single unified model (its training data also includes video) and generates text output. In Meta's reported benchmarks, it outperforms models such as GPT-4o and Gemini 2.0 Flash and is competitive with DeepSeek v3 on coding and reasoning.
Innovative Model Variants
Llama 4 launches with two publicly available variants:
Llama 4 Scout
- Context Window: 10M tokens, enabling the processing of massive datasets equivalent to entire encyclopedias.
- Parameters: 17B active parameters, 16 experts, 109B total parameters
- Ideal for: Financial/legal document summarization, personalized automation based on extensive user history, and advanced multimodal image analytics (see the usage sketch below).
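As a rough illustration of what a 10M-token window enables, the sketch below sends a multi-year document archive to Scout in a single request through an OpenAI-compatible endpoint. The endpoint URL, model name, and file path are hypothetical placeholders, not details from Meta's announcement.

```python
# Sketch: querying a very large document set with Llama 4 Scout in one request
# via an OpenAI-compatible API. The base_url, API key handling, model name, and
# file path are hypothetical placeholders for whichever provider hosts Scout.
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

# With a 10M-token context window, large corpora can be sent without
# chunking or retrieval pipelines.
with open("contracts_2015_2024.txt") as f:  # hypothetical multi-year archive
    corpus = f.read()

response = client.chat.completions.create(
    model="llama-4-scout",  # hypothetical model name at the provider
    messages=[
        {
            "role": "user",
            "content": "List the indemnification clauses that changed over time:\n\n" + corpus,
        }
    ],
)
print(response.choices[0].message.content)
```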
Llama 4 Maverick
- Context Window: 1M tokens, suitable for extensive datasets such as complete code repositories and comprehensive research archives.
- Parameters: 17B active parameters, 128 experts, 400B total parameters
- Optimized for: High-speed, high-quality interactions in creative writing, chatbots, multilingual customer support, and precise image interpretation (see the multimodal sketch after this list).
- Supports 12 languages, enhancing global usability.
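To make the image-plus-text interactions concrete, here is a minimal local-inference sketch using the multimodal chat format that recent versions of Hugging Face transformers expose for Llama 4. The model ID, image URL, and prompt are assumptions for illustration, and exact class or argument names may differ across transformers releases.

```python
# Sketch: multimodal chat with Llama 4 Maverick (image + text in, text out).
# Model ID and image URL are illustrative assumptions; API details may vary
# between transformers versions.
import torch
from transformers import AutoProcessor, Llama4ForConditionalGeneration

model_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct"  # assumed Hub ID
processor = AutoProcessor.from_pretrained(model_id)
model = Llama4ForConditionalGeneration.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},  # hypothetical URL
            {"type": "text", "text": "Describe the trend shown in this chart."},
        ],
    }
]

# Build model inputs from the chat messages (text tokens + image features).
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:],
                             skip_special_tokens=True)[0])
```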
A third variant, Llama 4 Behemoth, remains in internal development because it has not yet reached Meta's target performance. With 288B active parameters, 16 experts, and nearly 2 trillion total parameters, it currently serves as a teacher model from which Maverick is distilled.
Cutting-edge Technology Highlights
Mixture of Experts (MoE): Llama 4 uses an MoE architecture in which each token activates only a small subset of expert parameters, keeping inference cost close to that of a much smaller dense model while preserving the capacity of the full parameter count. A simplified routing sketch follows.
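The sketch below is a deliberately simplified PyTorch version of this idea, not Meta's implementation: a learned router picks one expert per token and a shared expert processes every token, so only a fraction of the layer's parameters run for any given input. The dimensions and gating scheme are illustrative assumptions.

```python
# Minimal sketch of an MoE feed-forward layer: each token goes through a shared
# expert plus the single routed expert chosen by a learned router (top-1 routing).
# Structure and sizes are simplified for illustration only.
import torch
import torch.nn as nn


class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )
        self.shared_expert = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, top1 = scores.softmax(-1).max(dim=-1)   # top-1 expert per token
        out = self.shared_expert(x)                      # every token sees the shared expert
        for e, expert in enumerate(self.experts):
            mask = top1 == e                             # tokens routed to expert e
            if mask.any():                               # only the chosen expert runs per token
                out[mask] = out[mask] + weights[mask].unsqueeze(-1) * expert(x[mask])
        return out


tokens = torch.randn(8, 512)
print(MoEFeedForward()(tokens).shape)                    # torch.Size([8, 512])
```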
Early Multimodal Fusion: The model fuses text and vision tokens into a unified backbone from the earliest stages of pre-training, was pre-trained with up to 48 images per prompt (Meta reports good results with up to eight images at inference), and performs strongly in multimodal contexts. A minimal fusion sketch follows.
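Here is a minimal sketch of the early-fusion idea, with illustrative module names and shapes: image patches are projected into the same embedding space as text tokens and concatenated into one sequence before the first transformer layer, so attention mixes the modalities from the start.

```python
# Sketch of early multimodal fusion: vision patch embeddings and text token
# embeddings share one sequence and one backbone from the first layer onward.
# All module names and dimensions here are illustrative, not Llama 4's actual sizes.
import torch
import torch.nn as nn

d_model = 512
text_embed = nn.Embedding(32_000, d_model)              # toy vocabulary
patch_proj = nn.Linear(16 * 16 * 3, d_model)            # 16x16 RGB patches -> token embeddings
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)

text_ids = torch.randint(0, 32_000, (1, 32))            # 32 text tokens
patches = torch.randn(1, 64, 16 * 16 * 3)               # 64 flattened image patches

# Concatenate image and text tokens into a single fused sequence.
fused = torch.cat([patch_proj(patches), text_embed(text_ids)], dim=1)
hidden = backbone(fused)                                # one sequence, one backbone
print(hidden.shape)                                     # torch.Size([1, 96, 512])
```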
Advanced Training Methods:
- iRoPE (interleaved attention layers combined with rotary position embeddings) improves length generalization and underpins Scout's 10M-token context window.
- MetaP, a technique for reliably setting critical hyperparameters such as per-layer learning rates and initialization scales, together with FP8 precision, makes large-scale training fast and efficient.
- A post-training pipeline that moves from lightweight supervised fine-tuning (SFT) through online reinforcement learning (RL) to a lightweight Direct Preference Optimization (DPO) pass; a DPO loss sketch follows this list.
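The DPO objective used in that final pass is a published, standard loss. The sketch below computes it from per-sequence log-probabilities under the policy and a frozen reference model; the batch values are placeholders, not Llama 4 training data.

```python
# Sketch of the Direct Preference Optimization (DPO) loss: push the policy to
# prefer the chosen response over the rejected one relative to a frozen
# reference model. Inputs are placeholder per-sequence log-probabilities.
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Log-ratio of policy vs. reference for chosen and rejected responses.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * (chosen_ratio - rejected_ratio)), averaged over the batch.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()


# Placeholder log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(
    torch.tensor([-12.0, -9.5, -11.0, -10.2]),   # policy, chosen
    torch.tensor([-14.0, -9.0, -13.5, -12.8]),   # policy, rejected
    torch.tensor([-12.5, -9.8, -11.4, -10.6]),   # reference, chosen
    torch.tensor([-13.2, -9.4, -12.9, -12.1]),   # reference, rejected
    beta=0.1,
)
print(loss)
```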
A New Benchmark in AI Development
With the launch of Llama 4, Meta has redefined possibilities within multimodal AI. Its expert-driven architecture and multimodal integration enable developers and enterprises to build more sophisticated and personalized applications, marking a significant leap forward in the AI industry.