ollama: Add Support for Sarvam-30B & Sarvam-105B Models

Summary

I'd like to request official Ollama support for Sarvam-30B and Sarvam-105B, two open-weight multilingual reasoning models released by Sarvam AI and built on sparse Mixture-of-Experts (MoE) architectures. The weights for both models are publicly available on Hugging Face.


Motivation

The Ollama ecosystem currently skews heavily toward English-centric models. Sarvam-30B and Sarvam-105B address a meaningful gap: high-quality, locally deployable models for Indic languages, Hinglish, and code-mixed workloads. These languages are spoken by over a billion people, who remain significantly underrepresented in local AI tooling.

These models are not niche regional experiments. Their benchmark performance is competitive with significantly larger open-weight models, making them a strong addition for multilingual, agentic, and reasoning-heavy use cases. Adding them to Ollama would:

  • Meaningfully expand non-English local inference coverage
  • Support Indian developers, multilingual enterprises, and regional AI deployments
  • Provide access to state-of-the-art MoE inference at dramatically lower compute cost than equivalent dense models
  • Strengthen Ollama's position as the go-to platform for sovereign, local AI deployments globally

Model Architecture

Both models use sparse MoE with 128 experts and auxiliary-loss-free load balancing. Only a small subset of the parameters is active per token, so inference cost is far below that of a dense model with the same total parameter count.
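To make "only a small subset is active per token" concrete, here is a minimal sketch of top-k expert routing with sigmoid scores (the Sarvam-105B table lists sigmoid-based routing). The number of experts selected per token (`TOP_K = 8`), the random router logits, and the weight normalisation are assumptions for illustration only. The issue does not describe Sarvam's actual router, and the auxiliary-loss-free load balancing is not modelled here.

```python
import math
import random

NUM_EXPERTS = 128  # from the issue text
TOP_K = 8          # hypothetical: the issue does not say how many experts fire per token


def route(logits, top_k=TOP_K):
    """Select the top_k experts by sigmoid score and normalise their weights to sum to 1."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


random.seed(0)
# Stand-in for the router's per-expert logits for a single token.
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(router_logits)
print(f"{len(weights)} of {NUM_EXPERTS} experts active for this token")
```

Only the selected experts' feed-forward weights take part in the forward pass for that token, which is where the gap between total and active parameter counts comes from.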

Sarvam-30B

| Property | Value |
| --- | --- |
| Total parameters | 30B |
| Active params/token | ~2.4B |
| Context length | 32K |
| Attention | GQA + grouped KV heads |
| RoPE theta | 8e6 (long-context stable) |
| Experts | 128 (sparse routing) |

Designed for: high-throughput inference, practical local deployment, lower memory usage, multilingual understanding.

Sarvam-105B

| Property | Value |
| --- | --- |
| Total parameters | 105B |
| Active params/token | ~10.3B |
| Context length | 128K |
| Attention | MLA (decoupled QK dims) |
| Routing | Sigmoid-based sparse routing |
| Experts | 128 (sparse routing) |

Designed for: advanced reasoning, coding workflows, agentic systems, long-document processing, sovereign AI deployments.
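As a quick check on the figures above, the share of parameters active per token can be computed directly from the two tables. This is plain arithmetic on the approximate numbers quoted in this issue, not a measurement.

```python
# (total parameters, active parameters per token), both in billions, as quoted in the tables
MODELS = {
    "Sarvam-30B": (30.0, 2.4),
    "Sarvam-105B": (105.0, 10.3),
}


def active_fraction(total_b, active_b):
    """Fraction of the model's parameters that take part in one token's forward pass."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    print(f"{name}: {active_fraction(total_b, active_b):.1%} of parameters active per token")
# Sarvam-30B: 8.0% of parameters active per token
# Sarvam-105B: 9.8% of parameters active per token
```

In both cases less than a tenth of the weights participate in each token, although the full set of weights still has to be held in memory.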


Benchmark Highlights (Sarvam-105B)

| Benchmark | Score |
| --- | --- |
| Math500 | 98.6 |
| HMMT Feb 2025 | 85.8 |
| GPQA Diamond | 78.7 |
| LiveCodeBench v6 | 71.7 |
| Tau2 Bench | 68.3 |
| SWE-Bench Verified | 45.0 |
| BrowseComp | 49.5 |

These results are competitive with significantly larger open-weight models, achieved with only ~10.3B active parameters per token due to sparse MoE routing.


Proposed Integration Path

  • Convert weights to GGUF format via llama.cpp
  • Quantize the GGUF files (Q4_K_M, Q5_K_M, Q8_0) with llama.cpp's quantization tooling, relying on its existing sparse MoE support for inference
  • Add a Modelfile for each variant with appropriate defaults (num_ctx, temperature, etc.)
  • Publish quantized versions to the Ollama model registry

llama.cpp already provides sparse MoE inference for other model families, so GGUF conversion should be a well-trodden path, provided its converter and runtime recognize the Sarvam architectures (including MLA and sigmoid-based routing for Sarvam-105B). Because only a fraction of the parameters are active per token, quantized variants should achieve high tokens/sec relative to their nominal parameter counts.
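The Modelfile step could look roughly like the sketch below, which generates one Modelfile per variant using the `FROM` and `PARAMETER` instructions. The GGUF file names, the model tags, and the temperature value are hypothetical placeholders, and the context sizes assume that "32K" and "128K" mean 32 × 1024 and 128 × 1024 tokens. No official Sarvam GGUF files or recommended sampling defaults are given in this issue.

```python
def build_modelfile(gguf_path, num_ctx, temperature=0.7):
    """Return the text of a minimal Ollama Modelfile for a local GGUF file."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER num_ctx {num_ctx}",
        f"PARAMETER temperature {temperature}",  # placeholder value, not a vendor recommendation
    ]
    return "\n".join(lines) + "\n"


# Hypothetical tags and file names; context lengths taken from the tables in this issue.
VARIANTS = {
    "sarvam-30b": ("./sarvam-30b-Q4_K_M.gguf", 32 * 1024),
    "sarvam-105b": ("./sarvam-105b-Q4_K_M.gguf", 128 * 1024),
}

modelfiles = {
    tag: build_modelfile(path, num_ctx) for tag, (path, num_ctx) in VARIANTS.items()
}

print(modelfiles["sarvam-30b"])
```

Each generated text could be saved as a Modelfile and registered locally with `ollama create <tag> -f <Modelfile>`. A chat template and stop tokens would also be needed, but they depend on the models' tokenizer configuration and are left out here.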


Cloud Support for Sarvam-105B

Beyond local deployment, it would be fantastic to see Sarvam-105B supported as a cloud-hosted model within Ollama's cloud offering as well.

Given its scale (105B total parameters, 128K context), Sarvam-105B is not always practical for consumer hardware — but it is precisely the kind of model that benefits enormously from managed cloud inference:

  • Users who need frontier-tier multilingual reasoning without owning high-end GPU clusters
  • Enterprises building Indic-language pipelines that require 128K long-context support at scale
  • Developers prototyping agentic workflows who want to evaluate the model before committing to local infrastructure
  • Regional AI deployments that need cloud-grade reliability with a model tuned for their language ecosystem

Cloud support for Sarvam-105B would make it accessible to the widest possible audience — not just those with the hardware to run it locally — and would meaningfully differentiate Ollama's cloud catalog with a strong multilingual, non-English-centric option.
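For a rough sense of why the 105B model is hard to run on consumer hardware, the sketch below estimates the size of the quantized weights alone. The bits-per-weight values for Q4_K_M and Q5_K_M are approximations (these are mixed-precision formats, so the real figure varies by model), Q8_0 is 8.5 bits per weight by construction, and KV cache and runtime overhead are ignored. Treat the results as ballpark figures, not measured file sizes.

```python
# Approximate bits per weight; Q4_K_M and Q5_K_M are assumed averages, Q8_0 is exact.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7, "Q8_0": 8.5}

# Total parameter counts in billions, from the tables in this issue.
TOTAL_PARAMS_B = {"Sarvam-30B": 30.0, "Sarvam-105B": 105.0}


def weight_gib(total_params_b, bits_per_weight):
    """Estimated size of the quantized weights in GiB (weights only)."""
    total_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


for model, params_b in TOTAL_PARAMS_B.items():
    for quant, bpw in BITS_PER_WEIGHT.items():
        print(f"{model} {quant}: ~{weight_gib(params_b, bpw):.0f} GiB")
```

Under these assumptions, even the 4-bit variant of Sarvam-105B needs several tens of GiB for weights alone, while Sarvam-30B at 4 bits stays within reach of a single high-memory consumer GPU or a workstation with enough system RAM.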


Community Impact

This is not a marginal improvement request; it is a request to extend Ollama's reach to underserved linguistic communities at scale. Indic languages collectively represent hundreds of millions of potential users of local AI tooling who currently have no strong locally deployable model option within Ollama. Sarvam-30B and Sarvam-105B directly address that gap.

Happy to assist with testing, Modelfile configuration, or any community-driven GGUF conversion effort if that would be helpful.

If this resonates with you, please drop a 👍 on this issue — it helps the maintainers gauge community demand.

