ollama - Add Support for Sarvam-30B & Sarvam-105B Models

Summary

I'd like to request official Ollama support for Sarvam-30B and Sarvam-105B, open-weight multilingual reasoning models released by Sarvam AI and built on sparse Mixture-of-Experts (MoE) architectures. The weights for both models are publicly available on Hugging Face.

Motivation

The Ollama ecosystem currently skews heavily toward English-centric models. Sarvam-30B and Sarvam-105B address a meaningful gap: high-quality, locally deployable models for Indic languages, Hinglish, and code-mixed workloads. These languages are spoken by over a billion people, who remain significantly underrepresented in local AI tooling.

These models are not niche regional experiments. Their benchmark performance is competitive with significantly larger open-weight models, making them a strong addition for multilingual, agentic, and reasoning-heavy use cases. Adding them to Ollama would:

- Meaningfully expand non-English local inference coverage
- Support Indian developers, multilingual enterprises, and regional AI deployments
- Provide access to state-of-the-art MoE inference at dramatically lower compute cost than equivalent dense models
- Strengthen Ollama's position as the go-to platform for sovereign, local AI deployments globally

Model Architecture

Both models use sparse MoE with 128 experts and auxiliary-loss-free load balancing. Only a small subset of parameters is active per token, so per-token inference cost is far below that of a dense model with the same total parameter count.

Sarvam-30B

Property	Value
Total parameters	30B
Active params/token	~2.4B
Context length	32K
Attention	GQA + grouped KV heads
RoPE theta	8e6 (long-context stable)
Experts	128 (sparse routing)

Designed for: high-throughput inference, practical local deployment, lower memory usage, multilingual understanding.

Sarvam-105B

Property	Value
Total parameters	105B
Active params/token	~10.3B
Context length	128K
Attention	MLA (decoupled QK dims)
Routing	Sigmoid-based sparse routing
Experts	128 (sparse routing)

Designed for: advanced reasoning, coding workflows, agentic systems, long-document processing, sovereign AI deployments.

Benchmark Highlights (Sarvam-105B)

Benchmark	Score
Math500	98.6
HMMT Feb 2025	85.8
GPQA Diamond	78.7
LiveCodeBench v6	71.7
Tau2 Bench	68.3
SWE-Bench Verified	45.0
BrowseComp	49.5

These results are competitive with significantly larger open-weight models, achieved with only ~10.3B active parameters per token due to sparse MoE routing.
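To make the sparse-activation claim concrete, the short sketch below computes what share of each model's parameters is active per token, using only the figures from the architecture tables above. Treating this share as a proxy for per-token compute is a common rule of thumb, not a measured result for these models.

```python
# Figures copied from the architecture tables above (billions of parameters).
MODELS = {
    "Sarvam-30B": {"total_b": 30.0, "active_b": 2.4},
    "Sarvam-105B": {"total_b": 105.0, "active_b": 10.3},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters that take part in a single token's forward pass."""
    return active_b / total_b


for name, params in MODELS.items():
    frac = active_fraction(params["total_b"], params["active_b"])
    print(f"{name}: {frac:.1%} of parameters active per token")

# Prints:
# Sarvam-30B: 8.0% of parameters active per token
# Sarvam-105B: 9.8% of parameters active per token
```

Note that the inactive experts still have to be stored, so this ratio says more about throughput than about memory footprint.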

Proposed Integration Path

1. Convert the Hugging Face weights to GGUF format via llama.cpp
2. Quantize the GGUF files (Q4_K_M, Q5_K_M, Q8_0) with llama.cpp, relying on its existing sparse MoE support
3. Add a Modelfile for each variant with appropriate defaults (num_ctx, temperature, etc.)
4. Publish quantized versions to the Ollama model registry

llama.cpp already runs sparse MoE models, so GGUF conversion should be a well-trodden path, provided its converter recognizes each model's architecture (Sarvam-105B's MLA attention and sigmoid-based routing in particular). Because only a fraction of the parameters is active per token, quantized variants should achieve high tokens/sec relative to their nominal parameter counts.
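The integration path above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: it only builds and prints the commands rather than running them, and it assumes llama.cpp's `convert_hf_to_gguf.py` and `llama-quantize` can handle the Sarvam architectures, which has not been confirmed here. All file names, the `sarvam-30b` tag, and the `temperature` value are hypothetical; `num_ctx` is set to 32768 on the assumption that the 32K context in the table means 32 × 1024 tokens.

```python
import shlex
from pathlib import Path

# Hypothetical names: none of these paths or tags come from the issue itself.
HF_DIR = Path("sarvam-30b-hf")            # local copy of the Hugging Face weights
F16_GGUF = Path("sarvam-30b-f16.gguf")    # unquantized intermediate file
QUANT_TYPES = ["Q4_K_M", "Q5_K_M", "Q8_0"]  # quantization levels proposed above


def build_commands() -> list[list[str]]:
    """Return the commands for conversion, quantization, and local import."""
    cmds = [[
        "python", "convert_hf_to_gguf.py", str(HF_DIR),
        "--outfile", str(F16_GGUF), "--outtype", "f16",
    ]]
    for quant in QUANT_TYPES:
        tag = quant.lower()
        cmds.append(["llama-quantize", str(F16_GGUF), f"sarvam-30b-{tag}.gguf", quant])
        # Imports the quantized file into a local Ollama install for testing.
        cmds.append(["ollama", "create", f"sarvam-30b:{tag}", "-f", f"Modelfile.{tag}"])
    return cmds


def modelfile(gguf_name: str, num_ctx: int, temperature: float) -> str:
    """Render a minimal Modelfile with the defaults mentioned in step 3."""
    return (
        f"FROM ./{gguf_name}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
        f"PARAMETER temperature {temperature}\n"
    )


if __name__ == "__main__":
    for cmd in build_commands():
        print(shlex.join(cmd))
    # temperature=0.7 is a placeholder, not a recommended default.
    print(modelfile("sarvam-30b-q4_k_m.gguf", num_ctx=32768, temperature=0.7))
```

Publishing to the official Ollama registry (step 4) is left out because that step is controlled by the maintainers; `ollama create` only registers the model locally.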

Cloud Support for Sarvam-105B

Beyond local deployment, it would be fantastic to see Sarvam-105B supported as a cloud-hosted model within Ollama's cloud offering as well.

Given its scale (105B total parameters, 128K context), Sarvam-105B is not always practical for consumer hardware — but it is precisely the kind of model that benefits enormously from managed cloud inference:

- Users who need frontier-tier multilingual reasoning without owning high-end GPU clusters
- Enterprises building Indic-language pipelines that require 128K long-context support at scale
- Developers prototyping agentic workflows who want to evaluate the model before committing to local infrastructure
- Regional AI deployments that need cloud-grade reliability with a model tuned for their language ecosystem

Cloud support for Sarvam-105B would make it accessible to the widest possible audience — not just those with the hardware to run it locally — and would meaningfully differentiate Ollama's cloud catalog with a strong multilingual, non-English-centric option.
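To put the hardware point above in rough numbers, the sketch below estimates the size of the quantized weights alone. The bits-per-weight values are approximations assumed for illustration; real GGUF files vary with the architecture and tensor mix, and the estimate ignores the KV cache and runtime overhead.

```python
# Approximate bits per weight for each quantization type (assumed values;
# actual GGUF files differ somewhat from model to model).
APPROX_BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

# Total parameter counts from the architecture tables (billions).
TOTAL_PARAMS_B = {"Sarvam-30B": 30.0, "Sarvam-105B": 105.0}


def weight_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1e9 bytes), excluding KV cache and overhead."""
    return total_params_b * bits_per_weight / 8


for model, params_b in TOTAL_PARAMS_B.items():
    for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
        print(f"{model} {quant}: ~{weight_gb(params_b, bpw):.0f} GB of weights")
```

Under these assumptions a 4-bit Sarvam-105B needs roughly 63 GB for weights alone, since all experts typically have to be resident even though only a few are active per token. That is the gap a hosted option would close.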

Community Impact

This is not a marginal improvement request — it's a request to extend Ollama's reach to underserved linguistic communities at scale. Indic languages collectively represent hundreds of millions of potential users of local AI tooling who currently have no strong locally-deployable model option within Ollama. Sarvam-30B and 105B directly address that gap.

Happy to assist with testing, Modelfile configuration, or any community-driven GGUF conversion effort if that would be helpful.

If this resonates with you, please drop a 👍 on this issue — it helps the maintainers gauge community demand.
