Voxtral Models
Choose the perfect Voxtral model for your speech AI needs. From production-scale deployments to local development, Voxtral offers flexible solutions with state-of-the-art performance.
Voxtral Small
Voxtral Small is our flagship 24-billion-parameter model designed for production-scale voice intelligence applications. With superior accuracy and comprehensive multilingual support, Voxtral Small delivers enterprise-grade performance for demanding use cases.
Voxtral Mini 1.0
Voxtral Mini 1.0 is our compact 3-billion-parameter model optimized for local, edge, and laptop deployments. It is well suited to developers, researchers, and applications requiring privacy and offline capabilities, while maintaining excellent performance.
Performance Benchmarks
Voxtral models achieve state-of-the-art results across multiple benchmarks, consistently outperforming leading speech recognition models in accuracy, multilingual support, and audio understanding.
English Word Error Rate (WER)
Lower WER indicates better accuracy. Voxtral Small achieves state-of-the-art results, while Voxtral Mini provides excellent performance for local deployment.
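For context, WER counts word-level substitutions, insertions, and deletions between a reference transcript and a model's hypothesis, normalized by the reference length. A minimal sketch of the standard calculation (not Voxtral's official scoring code) might look like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference
    word count, computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six: WER = 1/6 ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice, published WER numbers also depend on text normalization (casing, punctuation, number formatting), so compare models only under the same normalization scheme.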
Multilingual Performance (FLEURS Dataset)
Voxtral outperforms Whisper on all measured languages, demonstrating superior multilingual capabilities across diverse linguistic families.
Audio Understanding (40-example AU Benchmark)
- Voxtral Small: matches GPT-4o-mini and Gemini 2.5 Flash performance
- Voxtral Mini: excellent performance for local deployment scenarios
- Built-in Q&A: direct question answering without additional LLM chaining
Audio understanding capabilities enable direct Q&A over audio content, automatic summarization, and semantic analysis without external dependencies.
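As an illustration of built-in Q&A, the request below pairs an audio clip with a text question in a single chat-completion call. This is a hedged sketch: it assumes an OpenAI-style multimodal chat schema with base64 `input_audio` content parts (as exposed by servers such as vLLM); the exact field names and model identifier may differ for your deployment.

```python
import base64
import json

def build_audio_qa_request(audio_bytes: bytes, question: str,
                           model: str = "mistralai/Voxtral-Mini-3B-2507") -> str:
    """Build a chat-completion request body combining one audio clip with a
    text question. Field names follow the OpenAI-style multimodal chat
    schema; verify them against your serving stack's documentation."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # Audio is sent inline as base64; "format" names the codec.
                {"type": "input_audio",
                 "input_audio": {
                     "data": base64.b64encode(audio_bytes).decode("ascii"),
                     "format": "wav"}},
                # The question rides alongside the audio in the same turn.
                {"type": "text", "text": question},
            ],
        }],
    }
    return json.dumps(payload)

body = build_audio_qa_request(b"\x00\x01", "Summarize the key decisions in this meeting.")
```

The resulting JSON string can be POSTed to a `/v1/chat/completions` endpoint; no separate transcription step or downstream LLM is needed for the answer.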
Speed & Efficiency Comparison
- Voxtral API: 50% cheaper than the Whisper API
- Voxtral Mini (local): no API round-trip latency; processing happens on-device
- Context window: ~30 minutes of audio for transcription, or ~40 minutes for comprehension tasks
- Memory efficiency: compact footprint suitable for edge deployment
Voxtral offers superior cost efficiency and speed compared to closed-source alternatives, with the flexibility of local deployment for privacy-sensitive applications.
Choose Your Model
Select the perfect Voxtral model based on your specific requirements, deployment environment, and performance needs.
Choose Voxtral Small When:
- Building production-scale voice applications
- Requiring maximum accuracy and performance
- Processing high volumes of audio content
- Using cloud infrastructure with multiple GPUs
- Needing enterprise-grade reliability
- Working with complex multilingual content
Choose Voxtral Mini When:
- Developing prototypes and proof-of-concepts
- Requiring local/offline processing capabilities
- Working with limited computational resources
- Building privacy-sensitive applications
- Deploying on edge devices or laptops
- Learning and experimenting with speech AI
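A quick way to sanity-check which model fits your hardware is to estimate the memory needed for the weights alone: parameter count times bytes per parameter. The helper below is a rough rule of thumb of my own, not an official sizing guide; real deployments also need headroom for the KV cache, activations, and runtime overhead.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate GPU/CPU memory (GiB) for model weights only.
    Excludes KV cache, activations, and framework overhead."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

# Weights-only estimates at common precisions (illustrative, not official specs).
for name, size_b in [("Voxtral Small (24B)", 24.0), ("Voxtral Mini (3B)", 3.0)]:
    for precision, nbytes in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        print(f"{name} @ {precision}: ~{weight_memory_gb(size_b, nbytes):.1f} GiB")
```

By this estimate, the 24B model's bf16 weights alone land in the ~45 GiB range (multi-GPU or datacenter-class hardware), while the 3B model at bf16 fits in under 6 GiB, consistent with laptop and edge deployment.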
Deployment Guide
Get started with Voxtral deployment using our comprehensive guides and code examples.
Ready to Get Started with Voxtral?
Join thousands of developers building the future of speech AI with Voxtral's open-source models.