
Sarvam AI, a rising Indian startup, has released its most ambitious tool yet: a large language model designed for complex tasks and Indian languages. Called Sarvam-M, the model has 24 billion parameters and is fully open-source. Built on the Mistral Small architecture, Sarvam-M targets hard problems such as programming and mathematics, along with regional language support.
The company claims Sarvam-M can outperform well-known models like Llama-4 Scout, especially on Indian benchmarks. This matters because most global AI tools lag behind in supporting local languages and in reasoning in low-resource settings. “We wanted to build a model that thinks like us and speaks like us,” said a Sarvam team lead during a demo session.
The model’s strong performance, especially in Indian language reasoning tests, could make it valuable for education, translation, and other everyday applications.
A Hybrid Model Built to Think and Converse
Sarvam-M is not just another AI chatbot. It’s built to solve real problems across languages, code, and classroom tasks. At its core is a hybrid architecture based on Mistral Small, a well-known open-weight model. What sets it apart is its three-step development process: supervised fine-tuning (SFT), reinforcement learning with verifiable rewards (RLVR), and inference optimisations. For SFT, the team designed complex prompt sets and filtered completions using custom scoring tools.
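Sarvam’s custom scoring tools are not publicly specified, but the filtering idea itself is simple. The toy sketch below illustrates it with a hypothetical `score_quality` function (a stand-in, not Sarvam’s actual scorer) that keeps only prompt–completion pairs above a quality threshold:

```python
import re

# Hypothetical sketch of SFT data curation by completion filtering.
# `score_quality` is a stand-in for Sarvam's unpublished scoring tools;
# here it is a toy check for non-empty, on-topic completions.

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score_quality(prompt: str, completion: str) -> float:
    """Toy scorer: fraction of prompt tokens echoed in the completion."""
    if not completion.strip():
        return 0.0
    overlap = len(tokens(prompt) & tokens(completion))
    return min(1.0, overlap / max(1, len(tokens(prompt))))

def filter_completions(pairs, threshold=0.2):
    """Keep only (prompt, completion) pairs at or above the threshold."""
    return [(p, c) for p, c in pairs if score_quality(p, c) >= threshold]

pairs = [
    ("Explain gravity in Hindi", "Gravity, or gurutvakarshan, pulls objects toward Earth."),
    ("Explain gravity in Hindi", ""),
]
kept = filter_completions(pairs)  # the empty completion is dropped
```

A production pipeline would replace the toy scorer with model-based or rubric-based scoring, but the keep/drop loop has the same shape.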
This helped Sarvam-M learn two separate modes: one for reasoning, the other for everyday conversation. RLVR added a further layer of depth: developers used programming and math datasets, along with reward engineering, to guide the model’s learning. Inference was also streamlined. Using FP8 precision and techniques like lookahead decoding, Sarvam-M runs faster with minimal loss of accuracy. The team noted some issues with serving many users at once, but said workarounds are being tested.
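The “verifiable” in RLVR means the reward is computed by checking the model’s output against ground truth rather than by a learned judge. Sarvam has not published its reward functions, but for math tasks the standard pattern is a binary check like this sketch (the `math_reward` name and regex are illustrative assumptions):

```python
import re

def math_reward(model_output: str, gold_answer: str) -> float:
    """Binary verifiable reward: 1.0 if the last number in the model's
    output matches the gold answer exactly, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold_answer else 0.0

# A correct chain of thought ending in the right answer earns the reward.
assert math_reward("18 + 24 = 42, so the answer is 42", "42") == 1.0
assert math_reward("the answer is 41", "42") == 0.0
```

Because the signal is exact-match rather than a model’s opinion, it cannot be gamed by fluent-but-wrong answers, which is why RLVR is popular for math and code post-training.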
Outshining Peers, But Not Without Trade-Offs
Performance-wise, Sarvam-M delivers. On benchmarks that combine Indian languages with math reasoning, like the romanised GSM-8K test, it posted an 86% improvement. That’s a huge step forward in a space where most tools underperform on mixed-language reasoning. The model also beats Llama-4 Scout in most tests and holds up well against larger models like Llama 3.3 (70B) and Gemma 3 (27B). However, Sarvam-M lags slightly, by about 1%, on English knowledge benchmarks like MMLU.
That drop is intentional, according to Sarvam’s blog. The team focused on local relevance and reasoning tasks rather than chasing scores on Western benchmarks. Still, challenges remain. High concurrency is a known issue, and the company is monitoring potential bias and cultural sensitivity; post-training adjustments during SFT tried to address this by filtering for relevance and fairness. Sarvam plans to keep improving the model with user feedback and possible further multilingual fine-tuning, with more open releases expected in the coming months.
Could India’s LLMs Set a New Global Standard?
Sarvam-M signals a turning point in India’s AI ambitions. It’s not just about matching global models; it’s about building tools that work for India. With stronger math skills and better language support, this model could change how millions access education and automation.
Yet questions remain. How do we make sure these tools stay fair and useful as they scale? Who sets the ethical boundaries when models grow this powerful? As more countries push to build their own AI systems, Sarvam-M could become a playbook. One that shows performance doesn’t have to come at the cost of local relevance.