ollama - ✅ (Solved) Fix feat: add XLM-R embedding support + SentencePiece Unigram tokenizer + embedding model fixes [1 pull request, 1 participant]

GitHub stats
ollama/ollama#15620 · Fetched 2026-04-17 08:27:15
Comments: 0 · Participants: 1 · Timeline events: 1 (cross-referenced ×1) · Reactions: 0

Fix action: Fixed

PR fix notes

PR #15621: fix: nil-guard optional embedding components + use exact GELU for BERT

Description (problem / solution / changelog)

Summary

Three small, backwards-compatible fixes for embedding models:

  1. Qwen3 QK-norm nil-guard (model/models/qwen3/model.go): Qwen3 attention unconditionally calls QueryNorm.Forward() and KeyNorm.Forward(), causing a nil-pointer panic for models that lack QK-norm tensors (e.g. Jina v5 Nano and other Qwen-based models without QK-norm).

  2. Gemma3 Dense nil-guard (model/models/gemma3/embed.go): the Gemma3 embed model unconditionally calls dense.Forward(), causing a nil-pointer panic for models without Dense projection layers (e.g. Harrier-270M).

  3. BERT GELU_ERF (model/models/bert/embed.go): BERT was trained with exact (erf-based) GELU, but the implementation uses GELU(), which is the tanh approximation. Switching to GELU_ERF() improves cosine similarity vs HuggingFace from 0.996 to 1.000.

Test plan

  • Verified all-MiniLM-L6-v2 Q8_0 embeddings match HuggingFace at cos=0.9998
  • Verified gte-small, arctic-embed-xs match at cos>0.999
  • Jina v5 Nano (Qwen3 without QK-norm) loads and produces embeddings
  • Harrier-270M (Gemma3 without Dense) loads and produces embeddings

Related issue: #15620

🤖 Generated with Claude Code

Changed files

  • model/models/bert/embed.go (modified, +2/-1)
  • model/models/gemma3/embed.go (modified, +3/-1)
  • model/models/qwen3/model.go (modified, +6/-2)
Issue #15620 (original report)

Problem

Ollama currently supports BERT and nomic-bert for embedding models, but several popular embedding model families are missing or have compatibility issues:

  1. XLM-R models (multilingual-e5-small, arctic-embed-l-v2, PIXIE-Rune) cannot be loaded: there is no xlmr architecture.
  2. SentencePiece Unigram tokenizer is missing: the existing SentencePiece tokenizer uses BPE-style pairwise merging, which produces incorrect tokenization for Unigram models. This affects multilingual models.
  3. BERT GELU variant: BERT uses exact (erf-based) GELU, but the current implementation uses the tanh approximation, causing cosine similarity vs HuggingFace to drop from 1.0 to about 0.996.
  4. Nil-pointer crashes: Qwen3 models without QK-norm tensors (e.g. Jina v5) crash, and Gemma3 embed models without Dense projection layers (e.g. Harrier-270M) crash.

Proposed Changes

1. Bugfixes (backwards compatible, no new dependencies)

  • model/models/qwen3/model.go: nil-guard for optional QK-norm (fixes crash with Jina v5 Nano and other Qwen-based models without QK-norm)
  • model/models/gemma3/embed.go: nil-guard for optional Dense projection layers (fixes crash with Harrier-270M)
  • model/models/bert/embed.go: use GELU_ERF instead of GELU (matches HuggingFace's exact GELU, improves cosine similarity from 0.996 to 1.000)
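Both nil-guard fixes follow the same pattern: a layer whose tensors are absent from the GGUF file is left as a nil pointer, so any method that reads its fields panics. The sketch below shows the pattern with hypothetical types (Dense, EmbedModel, Project); it is not Ollama's actual gemma3 code, whose layer types and field names may differ.

```go
package main

import "fmt"

// Dense is a hypothetical projection layer. If its weights are missing
// from the model file, the loader is assumed to leave the pointer nil.
type Dense struct {
	Weight [][]float64 // rows = output dim, cols = input dim
}

// Forward computes Weight · x. Calling it on a nil *Dense panics,
// because reading d.Weight dereferences the nil pointer.
func (d *Dense) Forward(x []float64) []float64 {
	out := make([]float64, len(d.Weight))
	for i, row := range d.Weight {
		for j, w := range row {
			out[i] += w * x[j]
		}
	}
	return out
}

// EmbedModel holds optional post-pooling projection layers (hypothetical).
type EmbedModel struct {
	Dense []*Dense
}

// Project applies each Dense layer that is present and skips missing
// (nil) ones instead of calling Forward on them.
func (m *EmbedModel) Project(x []float64) []float64 {
	for _, d := range m.Dense {
		if d == nil {
			continue // layer not present in this checkpoint
		}
		x = d.Forward(x)
	}
	return x
}

func main() {
	// First Dense layer missing (nil), second present: no panic.
	m := &EmbedModel{Dense: []*Dense{nil, {Weight: [][]float64{{1, 1}}}}}
	fmt.Println(m.Project([]float64{0.5, 0.25}))
}
```

Guarding at the call site keeps Forward itself simple; an alternative is to make Forward return its input when the receiver is nil.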

2. SentencePiece Unigram tokenizer

  • New tokenizer/sentencepiece_unigram.go: Viterbi DP tokenizer for SentencePiece Unigram models. The existing SentencePiece tokenizer uses greedy pairwise merging, which is correct for BPE but wrong for Unigram models.

3. XLM-R embedding architecture

  • New model/models/xlmr/embed.go: XLM-RoBERTa encoder (similar to BERT, but without token-type embeddings, with a position-ID offset, and using a SentencePiece tokenizer)
  • model/models/bert/embed.go: extend tokenizer support to also accept "llama" (SentencePiece) and "gpt2" (BPE) in addition to "bert" (WordPiece)
  • model/models/models.go: register xlmr
  • fs/ggml/ggml.go: add xlmr to OllamaEngineRequired

Testing

Verified against HuggingFace sentence-transformers with 13 embedding models:

| Model | Arch | Q8_0 vs HF |
|---|---|---|
| all-MiniLM-L6-v2 | BERT | cos 0.9998 |
| gte-small | BERT | cos 0.9999 |
| arctic-embed-xs | BERT | cos 0.9999 |
| multilingual-e5-small | BERT + SentencePiece | cos 0.9999 |
| arctic-embed-l-v2 | XLM-R | loads, L2-norm = 1.0 |
| PIXIE-Rune-v1 | XLM-R | cross-lingual OK |
| harrier-270m | Gemma3 | loads, L2-norm = 1.0 |
| jina-v5-nano | Qwen3 | loads, L2-norm = 1.0 |
| + 5 more Qwen3 models | Qwen3 | all pass |

All GGUFs available at huggingface.co/cstr.

Implementation

Branch: https://github.com/CrispStrobe/ollama/tree/feat/xlmr-embedding (7 files changed, 461 insertions, 4 deletions)

Happy to split into separate PRs (bugfixes / tokenizer / architecture) if preferred.

Extent analysis

TL;DR

The proposed fix has three parts: bugfixes for existing embedding models (nil-guards for optional Qwen3 QK-norm and Gemma3 Dense layers, plus exact GELU for BERT), a new SentencePiece Unigram tokenizer, and a new XLM-R embedding architecture. PR #15621 implements the first part, the bugfixes.

Guidance

  • Apply the proposed bugfixes in model/models/qwen3/model.go and model/models/gemma3/embed.go to prevent nil-pointer crashes, and switch BERT from GELU to GELU_ERF in model/models/bert/embed.go to match HuggingFace.
  • Implement the new SentencePiece Unigram tokenizer in tokenizer/sentencepiece_unigram.go to correctly handle Unigram models.
  • Introduce the XLM-R embedding architecture in model/models/xlmr/embed.go to support XLM-R models.
  • Verify the fixes by testing against HuggingFace sentence-transformers with various embedding models, checking for correct loading and cosine similarity.

Example

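// model/models/qwen3/model.go (inside the attention forward pass; sketch,
// receiver and option names such as sa, ctx, opts.eps are assumed)
// QK-norm is optional: apply it only when the tensors were loaded.
if sa.QueryNorm != nil {
    query = sa.QueryNorm.Forward(ctx, query, opts.eps)
}
if sa.KeyNorm != nil {
    key = sa.KeyNorm.Forward(ctx, key, opts.eps)
}

// tokenizer/sentencepiece_unigram.go
// New file: Viterbi DP tokenizer for SentencePiece Unigram models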

Notes

The proposed changes are backwards compatible and introduce no new dependencies. The author offered to split them into separate PRs (bugfixes / tokenizer / architecture) to ease review and testing; the bugfixes were submitted on their own as PR #15621.

Recommendation

Apply the proposed changes, starting with the PR #15621 bugfixes: they address specific compatibility issues with several embedding model families and were verified against HuggingFace sentence-transformers.
