ollama - ✅(Solved) Fix feat: add XLM-R embedding support + SentencePiece Unigram tokenizer + embedding model fixes [1 pull requests, 1 participants]

ollama · 2026-04-16 04:48:50


GitHub stats

ollama/ollama#15620 • Fetched 2026-04-17 08:27:15


Author: CrispStrobe
Participants: CrispStrobe
Timeline (top): cross-referenced ×1

Fix Action

Fixed

Fixed by PR: fix: nil-guard optional embedding components + use exact GELU for BERT (https://github.com/ollama/ollama/pull/15621)

PR fix notes

PR #15621: fix: nil-guard optional embedding components + use exact GELU for BERT

Repository: ollama/ollama
Author: CrispStrobe
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15621

Description (problem / solution / changelog)

Summary

Three small, backwards-compatible fixes for embedding models:

Qwen3 QK-norm nil-guard (model/models/qwen3/model.go): Qwen3 attention unconditionally calls QueryNorm.Forward() and KeyNorm.Forward(), which causes a nil-pointer panic for models that have no QK-norm tensors (e.g. Jina v5 Nano and other Qwen-based models without QK-norm).
Gemma3 Dense nil-guard (model/models/gemma3/embed.go): the Gemma3 embed model unconditionally calls dense.Forward(), which causes a nil-pointer panic for models without Dense projection layers (e.g. Harrier-270M).
BERT GELU_ERF (model/models/bert/embed.go): BERT was trained with exact GELU (erf-based), but the implementation uses GELU(), which is the tanh approximation. Switching to GELU_ERF() improves cosine similarity vs HuggingFace from 0.996 to 1.000.
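
To make the GELU point concrete, here is a minimal standalone sketch of the two variants using only Go's standard math package. The function names (geluExact, geluTanh, maxGap) are made up for this illustration and are not Ollama's GELU() / GELU_ERF() tensor operations; the sketch only shows that the two formulas differ slightly per activation.

```go
package main

import (
	"fmt"
	"math"
)

// geluExact is the erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
func geluExact(x float64) float64 {
	return 0.5 * x * (1 + math.Erf(x/math.Sqrt2))
}

// geluTanh is the widely used tanh approximation of GELU.
func geluTanh(x float64) float64 {
	c := math.Sqrt(2 / math.Pi)
	return 0.5 * x * (1 + math.Tanh(c*(x+0.044715*x*x*x)))
}

// maxGap returns the largest absolute difference between the two
// variants on an evenly spaced grid over [lo, hi].
func maxGap(lo, hi float64, steps int) float64 {
	gap := 0.0
	for i := 0; i <= steps; i++ {
		x := lo + (hi-lo)*float64(i)/float64(steps)
		gap = math.Max(gap, math.Abs(geluExact(x)-geluTanh(x)))
	}
	return gap
}

func main() {
	for _, x := range []float64{-2, -1, 0.5, 1, 2} {
		fmt.Printf("x=%5.2f exact=%.6f tanh=%.6f\n", x, geluExact(x), geluTanh(x))
	}
	fmt.Printf("max |exact - tanh| on [-4, 4]: %.2e\n", maxGap(-4, 4, 800))
}
```

The per-activation gap is small, but it is applied in every feed-forward layer, which is consistent with the small drop in cosine similarity reported above.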

Test plan

Verified all-MiniLM-L6-v2 Q8_0 embeddings match HuggingFace at cos=0.9998
Verified gte-small, arctic-embed-xs match at cos>0.999
Jina v5 Nano (Qwen3 without QK-norm) loads and produces embeddings
Harrier-270M (Gemma3 without Dense) loads and produces embeddings

Related issue: #15620

🤖 Generated with Claude Code

Changed files

model/models/bert/embed.go (modified, +2/-1)
model/models/gemma3/embed.go (modified, +3/-1)
model/models/qwen3/model.go (modified, +6/-2)


Problem

Ollama currently supports BERT and nomic-bert for embedding models, but several popular embedding model families are missing or have compatibility issues:

XLM-R models (multilingual-e5-small, arctic-embed-l-v2, PIXIE-Rune) cannot be loaded — there's no xlmr architecture.
SentencePiece Unigram tokenizer is missing — the existing SentencePiece tokenizer uses BPE-style pairwise merge, which produces incorrect tokenization for Unigram models. This affects multilingual models.
BERT GELU variant — BERT uses exact GELU (erf-based) but the current implementation uses the tanh approximation, causing cos similarity to drop from 1.0 to ~0.996 vs HuggingFace.
Nil-pointer crashes — Qwen3 models without QK-norm tensors (e.g. Jina v5) crash; Gemma3 embed models without Dense projection layers (e.g. Harrier-270M) crash.

Proposed Changes

1. Bugfixes (backwards compatible, no new dependencies)

model/models/qwen3/model.go: nil-guard for optional QK-norm (fixes crash with Jina v5 Nano and other Qwen-based models without QK-norm)
model/models/gemma3/embed.go: nil-guard for optional Dense projection layers (fixes crash with Harrier-270M)
model/models/bert/embed.go: use GELU_ERF instead of GELU (matches HuggingFace's exact GELU, improves cos from 0.996 to 1.000)
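
The nil-guard pattern behind the first two fixes can be sketched in isolation. Everything in this block (Vec, Scale, Block) is a hypothetical stand-in chosen for the illustration, not Ollama's types; the only idea carried over from the issue is that an optional component stays nil when the model file has no tensors for it, so Forward must check before calling it.

```go
package main

import "fmt"

// Vec stands in for a tensor in this sketch.
type Vec []float64

// Scale is a hypothetical optional layer that multiplies every element by Factor.
type Scale struct{ Factor float64 }

// Forward dereferences s, so calling it on a nil *Scale would panic.
func (s *Scale) Forward(v Vec) Vec {
	out := make(Vec, len(v))
	for i, x := range v {
		out[i] = x * s.Factor
	}
	return out
}

// Block holds an optional component. Norm stays nil when the model
// file ships no tensors for it.
type Block struct {
	Norm *Scale
}

// Forward applies Norm only when it was loaded and otherwise passes
// the input through unchanged.
func (b *Block) Forward(v Vec) Vec {
	if b.Norm != nil {
		v = b.Norm.Forward(v)
	}
	return v
}

func main() {
	in := Vec{1, 2, 3}
	withNorm := &Block{Norm: &Scale{Factor: 0.5}}
	withoutNorm := &Block{} // Norm is nil, as for a model without these tensors
	fmt.Println(withNorm.Forward(in))
	fmt.Println(withoutNorm.Forward(in))
}
```

Skipping the step, rather than substituting a default layer, keeps existing models that do ship the tensors on exactly the same code path.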

2. SentencePiece Unigram tokenizer

New tokenizer/sentencepiece_unigram.go: Viterbi DP tokenizer for SentencePiece Unigram models. The existing SentencePiece tokenizer uses greedy pairwise merge which is correct for BPE but wrong for Unigram models.
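
As a rough illustration of what a Viterbi DP tokenizer does, the sketch below picks the segmentation with the highest total log-probability under a toy unigram vocabulary. It is not the code from the branch: the function name viterbiSegment, the maxPieceLen parameter, and the toy scores are assumptions made for this example, and it leaves out things a real SentencePiece tokenizer handles, such as text normalization, the whitespace marker, and unknown-token fallback.

```go
package main

import (
	"fmt"
	"math"
)

// viterbiSegment splits text into the sequence of vocabulary pieces with
// the highest total log-probability. vocab maps piece -> log-probability.
// It returns nil when the text is empty or cannot be fully segmented.
func viterbiSegment(text string, vocab map[string]float64, maxPieceLen int) []string {
	runes := []rune(text)
	n := len(runes)
	best := make([]float64, n+1) // best[i]: best score for runes[:i]
	prev := make([]int, n+1)     // prev[i]: start of the last piece on that best path
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(-1)
		prev[i] = -1
	}
	for end := 1; end <= n; end++ {
		lo := end - maxPieceLen
		if lo < 0 {
			lo = 0
		}
		for start := lo; start < end; start++ {
			if math.IsInf(best[start], -1) {
				continue // runes[:start] has no valid segmentation
			}
			score, ok := vocab[string(runes[start:end])]
			if !ok {
				continue
			}
			if cand := best[start] + score; cand > best[end] {
				best[end] = cand
				prev[end] = start
			}
		}
	}
	if n == 0 || prev[n] < 0 {
		return nil
	}
	// Walk the back-pointers from the end and prepend each piece.
	var pieces []string
	for end := n; end > 0; end = prev[end] {
		pieces = append([]string{string(runes[prev[end]:end])}, pieces...)
	}
	return pieces
}

func main() {
	// Toy log-probabilities, invented for the example.
	vocab := map[string]float64{"a": -3, "b": -3, "c": -3, "ab": -2, "bc": -0.5}
	fmt.Println(viterbiSegment("abc", vocab, 3))
}
```

With these toy scores the best path is a + bc (total -3.5), even though ab is the most attractive piece at the start of the string (ab + c totals -5). That global view is what a procedure making local decisions can miss.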

3. XLM-R embedding architecture

New model/models/xlmr/embed.go: XLM-RoBERTa encoder (like BERT but without type embeddings, with position offset, SentencePiece tokenizer)
model/models/bert/embed.go: extend tokenizer support to also accept "llama" (SentencePiece) and "gpt2" (BPE) in addition to "bert" (WordPiece)
model/models/models.go: register xlmr
fs/ggml/ggml.go: add xlmr to OllamaEngineRequired
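
The "position offset" mentioned above can be illustrated with a small sketch. The helper positionIDs is hypothetical, and the offset value of 2 is an assumption taken from the Hugging Face RoBERTa convention (positions start at the padding index, 1, plus 1); the issue does not show how the branch actually computes or stores the offset.

```go
package main

import "fmt"

// positionIDs returns position indices for n tokens, starting at offset.
func positionIDs(n, offset int) []int32 {
	ids := make([]int32, n)
	for i := range ids {
		ids[i] = int32(i + offset)
	}
	return ids
}

func main() {
	// Assumption: offset 2, following the Hugging Face RoBERTa convention
	// of padding index 1 plus 1. BERT-style positions start at 0.
	const xlmrOffset = 2
	fmt.Println("BERT-style: ", positionIDs(4, 0))
	fmt.Println("XLM-R-style:", positionIDs(4, xlmrOffset))
}
```

Getting this offset wrong would not crash the model; it would look up shifted rows of the position embedding table and degrade the embeddings, which is why comparing against reference outputs matters.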

Testing

Verified against HuggingFace sentence-transformers with 13 embedding models:

Model	Arch	Q8_0 cos vs HF
all-MiniLM-L6-v2	BERT	0.9998
gte-small	BERT	0.9999
arctic-embed-xs	BERT	0.9999
multilingual-e5-small	BERT+SP	0.9999
arctic-embed-l-v2	XLM-R	loads, L2-norm=1.0
PIXIE-Rune-v1	XLM-R	cross-lingual OK
harrier-270m	Gemma3	loads, L2-norm=1.0
jina-v5-nano	Qwen3	loads, L2-norm=1.0
+ 5 more Qwen3 models	Qwen3	all pass

All GGUFs available at huggingface.co/cstr.
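
The two checks reported in the table, cosine similarity against a reference embedding and an L2 norm of 1.0, can be sketched as follows. The helper names and the sample vectors are made up for this illustration; how the embeddings are obtained from Ollama and from sentence-transformers is outside the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// l2Norm returns the Euclidean length of v.
func l2Norm(v []float64) float64 {
	sum := 0.0
	for _, x := range v {
		sum += x * x
	}
	return math.Sqrt(sum)
}

// cosine returns the cosine similarity of a and b. It fails when the
// lengths differ or when either vector has zero norm.
func cosine(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("vectors have different lengths")
	}
	na, nb := l2Norm(a), l2Norm(b)
	if na == 0 || nb == 0 {
		return 0, errors.New("zero-norm vector")
	}
	dot := 0.0
	for i := range a {
		dot += a[i] * b[i]
	}
	return dot / (na * nb), nil
}

func main() {
	// Made-up vectors standing in for a reference embedding and a quantized one.
	ref := []float64{0.12, -0.45, 0.88, 0.03}
	got := []float64{0.11, -0.46, 0.87, 0.04}
	cos, err := cosine(ref, got)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cos=%.4f  |ref|=%.4f  |got|=%.4f\n", cos, l2Norm(ref), l2Norm(got))
}
```

A norm of 1.0 only shows that the output is normalized; the cosine comparison is the check that the values themselves match the reference.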

Implementation

Branch: https://github.com/CrispStrobe/ollama/tree/feat/xlmr-embedding (7 files changed, 461 insertions, 4 deletions)

Happy to split into separate PRs (bugfixes / tokenizer / architecture) if preferred.

extent analysis

TL;DR

The fix has three parts: bugfixes (nil-guards for optional Qwen3 and Gemma3 components, exact GELU for BERT), a SentencePiece Unigram tokenizer, and an XLM-R embedding architecture. Together they address the compatibility issues with the embedding models listed above.

Guidance

Apply the proposed bugfixes in model/models/qwen3/model.go and model/models/gemma3/embed.go to prevent nil-pointer crashes, and switch BERT to GELU_ERF in model/models/bert/embed.go to match HuggingFace's exact GELU.
Implement the new SentencePiece Unigram tokenizer in tokenizer/sentencepiece_unigram.go to correctly handle Unigram models.
Introduce the XLM-R embedding architecture in model/models/xlmr/embed.go to support XLM-R models.
Verify the fixes by testing against HuggingFace sentence-transformers with various embedding models, checking for correct loading and cosine similarity.

Example

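// model/models/qwen3/model.go
// Sketch only: the receiver (sa), the tensors (query, key), and the arguments
// are illustrative; QueryNorm and KeyNorm are the components named in the PR.
if sa.QueryNorm != nil {
    query = sa.QueryNorm.Forward(ctx, query, opts.eps)
}
if sa.KeyNorm != nil {
    key = sa.KeyNorm.Forward(ctx, key, opts.eps)
}
// Without QK-norm tensors, query and key pass through unchanged.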

// tokenizer/sentencepiece_unigram.go
// implement Viterbi DP tokenizer for SentencePiece Unigram models

Notes

The bugfixes are described as backwards compatible and introduce no new dependencies. The author offers to split the work into separate PRs (bugfixes / tokenizer / architecture), which would make review and testing easier.

Recommendation

Apply the proposed changes: they address specific compatibility issues with several embedding model families and were verified against HuggingFace sentence-transformers. Note that PR #15621 covers only the bugfixes and is listed as open and not yet merged.
