Ministral 3 8B Base 2512 is a mid-sized, dense model in the Ministral 3 series, designed as a general-purpose foundation for text and image tasks. It pairs an 8.4B-parameter language model with a 0.4B-parameter vision encoder, providing unified multimodal capability out of the box. As a "base" model (i.e., not fine-tuned for instruction following or reasoning), it offers a flexible starting point for custom downstream tasks or further fine-tuning. The model supports a 256k-token context window, so it can handle long documents and extended dialogues. Because it comes from the edge-optimized Ministral 3 family, it remains deployable on reasonably powerful hardware, balancing capability against resource use. Its multilingual, multimodal pretraining gives it broad applicability across languages and tasks, from generation to classification to vision-language understanding.
Features
- 8.4B-parameter language backbone combined with a 0.4B-parameter vision encoder for text + image support
- Dense “base” pretrained model — flexible for fine-tuning, specialization, or custom tasks
- Large 256k-token context window for long-form input, documents, or extended dialogue contexts
- Multimodal capabilities: can process images and text together, enabling tasks like captioning, vision-language understanding, and more
- Multilingual support as part of the broader model family — useful for global or multilingual applications
- Edge-optimized design from the Ministral 3 series — balances model capability and resource use for local/in-house deployment
- Open-source under the Apache-2.0 license, enabling free use and modification in both commercial and research contexts
- Serves as a strong base for custom fine-tuning — ideal starting point for building specialized models (task-specific, domain-specific, multimodal, etc.)
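As a rough illustration of what the large context window allows, the sketch below estimates whether a long document fits in context before sending it to the model. Both the 4-characters-per-token heuristic and the reading of "256k" as 262,144 tokens are assumptions for illustration, not published figures; a real deployment would count tokens with the model's own tokenizer.

```python
# Rough context-budget check for a long-context model.
# ASSUMPTIONS: the "256k" window is taken as 262,144 tokens, and token
# count is estimated at ~4 characters per token of English text; a real
# check would use the model's actual tokenizer instead.

CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens (assumed reading of "256k")
CHARS_PER_TOKEN = 4          # crude heuristic for English prose

def estimated_tokens(text: str) -> int:
    """Estimate token count from character length."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_output: int = 2048) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

doc = "word " * 100_000  # ~500k characters, ~125k estimated tokens
print(estimated_tokens(doc), fits_in_context(doc))  # → 125000 True
```

A check like this is only a pre-flight guard; the tokenizer shipped with the model gives the exact count and should be preferred whenever it is available.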