vllm - ✅(Solved) Fix [Transformers v5] ColBERTJinaRobertaModel [2 pull requests, 3 comments, 3 participants]

GitHub stats
vllm-project/vllm#38737 · Fetched 2026-04-08 02:23:01
Comments: 3 · Participants: 3 · Timeline events: 14 · Reactions: 0
Timeline (top): commented ×3, cross-referenced ×2, labeled ×2, referenced ×2

Error Message

$ pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina]
...
AssertionError: Embedding mismatch for text 0

PR fix notes

PR #38883: [Transformers v5] Skip ColBERT jina HF comparison due to remote code incompatibility

Description (problem / solution / changelog)

Summary

The test_colbert_hf_comparison[jina] test fails under Transformers v5 because jinaai/jina-colbert-v2 uses custom remote code (jinaai/xlm-roberta-flash-implementation) with its own flash_attn integration that is incompatible with v5's dtype handling changes.

Root cause: Transformers v5 deprecated torch_dtype in favor of dtype. The remote code model's custom XLMRobertaFlashConfig handles torch_dtype explicitly in __init__, but v5's from_pretrained may resolve the model dtype differently, causing flash_attn to receive float32 tensors (it only supports fp16/bf16), producing NaN outputs. Even when forced to bfloat16, the numerical results differ significantly from v4 (max_diff=0.21 vs tolerance=0.01).
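
The dtype handoff described above can be illustrated with a small sketch. It assumes a hypothetical helper, resolve_dtype, that prefers the v5-style dtype keyword, falls back to the deprecated torch_dtype, and rejects anything flash_attn cannot consume. The helper names, the string dtype labels, and the float32 default are illustrative assumptions, not code from Transformers or the Jina remote implementation.

```python
# Hypothetical sketch: names and defaults are assumptions for illustration.
FLASH_ATTN_DTYPES = {"float16", "bfloat16"}  # flash_attn accepts only fp16/bf16


def resolve_dtype(kwargs: dict, default: str = "float32") -> str:
    """Prefer the v5-style `dtype` key, then the deprecated `torch_dtype`."""
    if kwargs.get("dtype") is not None:
        return kwargs["dtype"]
    if kwargs.get("torch_dtype") is not None:
        return kwargs["torch_dtype"]
    return default


def check_flash_attn_dtype(dtype: str) -> None:
    """Raise a readable error when the dtype is not half precision."""
    if dtype not in FLASH_ATTN_DTYPES:
        raise TypeError(
            f"flash_attn needs float16 or bfloat16 inputs, got {dtype}"
        )
```

Checking the resolved dtype right after loading would surface a float32 fallback before it reaches the attention kernel.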

This skips the jina backend in test_colbert_hf_comparison when running under Transformers v5, until the model's remote code is updated for v5 compatibility.
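
A version-gated skip of this kind can be sketched as follows. The helper names and the skip message are assumptions; the PR's actual condition is not shown on this page.

```python
import re


def is_transformers_v5(version: str) -> bool:
    """Return True when the major version of `version` is 5 or higher."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)) >= 5


def skip_reason(backend: str, version: str):
    """Return a skip message for the jina backend under v5, else None."""
    if backend == "jina" and is_transformers_v5(version):
        return "jina-colbert-v2 remote code is incompatible with Transformers v5"
    return None
```

Inside the test, a non-None result would be passed to pytest.skip(...), with transformers.__version__ supplying the version string.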

  • Not a duplicate: No open PR addresses #38737. Verified with gh pr list --search "38737" and gh pr list --search "ColBERTJina".
  • AI assistance was used (Claude) for root cause analysis. All changes reviewed by human submitter.

Fixes #38737

<details> <summary>Before (failure on main + Transformers v5.5.0.dev0)</summary>
$ CUDA_VISIBLE_DEVICES=3 python -m pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] -v --timeout=600

tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] FAILED

=================================== FAILURES ===================================
_______________________ test_colbert_hf_comparison[jina] _______________________
tests/models/language/pooling/test_colbert.py:149: in _assert_embeddings_close
    torch.testing.assert_close(
E   AssertionError: Embedding mismatch for text 0

======================= 1 failed in 43.07s ========================

Diagnosis confirmed the HF reference model produces different embeddings:

  • Default loading (bf16 + flash_attn): max_diff=0.21 vs tolerance 0.01
  • Eager attention + float32: max_diff=0.17 (different code path, still incompatible)
  • Model parameters load correctly (no NaN in weights, 1/294 near-zero params)
  • flash_attn assertion: assert qkv.dtype in [torch.float16, torch.bfloat16] fails when model loaded in float32
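
The max_diff figures above are presumably maximum absolute differences between two embeddings. A minimal sketch of that measurement on plain Python lists follows, assuming the quoted 0.01 tolerance is absolute; the real test uses torch.testing.assert_close on tensors, and the function names here are hypothetical.

```python
import math


def max_abs_diff(a, b):
    """Largest elementwise |a - b|; NaN anywhere yields NaN."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


def within_tolerance(a, b, atol=0.01):
    """True only when every element differs by at most `atol`."""
    diff = max_abs_diff(a, b)
    return not math.isnan(diff) and diff <= atol
```

Returning NaN explicitly keeps a NaN failure distinguishable from ordinary numerical drift.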
</details>
<details> <summary>After (test correctly skipped)</summary>
$ CUDA_VISIBLE_DEVICES=2 python -m pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] -v --timeout=60

tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] SKIPPED [100%]

======================= 1 skipped in 0.42s ========================
</details>

Test plan

  • test_colbert_hf_comparison[jina] correctly skipped under Transformers v5
  • Other ColBERT backends (bert, modernbert, lfm2) unaffected by this change
  • ruff check and ruff format pass
  • All pre-commit hooks pass

Changed files

  • tests/models/language/pooling/test_colbert.py (modified, +11/-0)

PR #39176: fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader

Description (problem / solution / changelog)

Purpose

Summary

  • Transformers 5.0 introduced a new weight materialization system that clears non-persistent buffers (persistent=False) during from_pretrained
  • The Jina ColBERT model's RotaryEmbedding registers inv_freq as a non-persistent buffer, so it gets wiped to uninitialized memory after weight loading
  • This caused NaN outputs in the test_colbert_hf_comparison[jina] HF reference model, failing the embedding comparison
  • Fix: recompute inv_freq for all rotary embedding modules after loading the HF model in _load_hf_model
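
For reference, rotary inv_freq values conventionally follow base^(-2i/dim). The sketch below computes them in plain Python and flags a buffer that looks uninitialized. It assumes the common default base of 10000 and no frequency scaling, which may not match the Jina implementation; the function names are hypothetical, and the PR's actual code operates on torch modules.

```python
import math


def rotary_inv_freq(dim: int, base: float = 10000.0) -> list:
    """Standard RoPE inverse frequencies: base ** (-2i / dim), i = 0..dim/2 - 1."""
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("rotary dim must be a positive even integer")
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def looks_uninitialized(inv_freq, dim: int, base: float = 10000.0) -> bool:
    """True when the buffer disagrees with the recomputed values."""
    expected = rotary_inv_freq(dim, base)
    if len(inv_freq) != len(expected):
        return True
    return any(
        not math.isfinite(got) or not math.isclose(got, want, rel_tol=1e-5)
        for got, want in zip(inv_freq, expected)
    )
```

In a torch setting, the recomputed values would be written back onto each rotary module's inv_freq buffer after from_pretrained returns.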

Fix: https://github.com/vllm-project/vllm/issues/38737

Test plan

  • pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] passes

Test Result



Changed files

  • tests/models/language/pooling/test_colbert.py (modified, +8/-0)

Issue description

This is a sub-issue of https://github.com/vllm-project/vllm/issues/38379. Please read the description of that parent issue before starting work on this one.

Which test is failing?

$ pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina]
...
AssertionError: Embedding mismatch for text 0

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
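
Before running the test, it can help to confirm which versions are actually visible in the environment. The following is a small sketch using only the standard library; the helper name is hypothetical.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("vllm", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

A source install of Transformers typically reports a development version, such as the 5.5.0.dev0 quoted in the PR notes.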

extent analysis

TL;DR

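The mismatch originates in the Hugging Face reference model rather than in the comparison logic: under Transformers v5, the jinaai/jina-colbert-v2 remote code produces NaN or shifted embeddings. PR #39176 addresses this by recomputing the rotary inv_freq buffers after loading the reference model; PR #38883 instead skips the jina case under Transformers v5.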

Guidance

  • Install both vLLM and Transformers from source, as described in the issue, so that the test reflects the current state of both libraries.
  • Check the Hugging Face reference model's output for NaN before suspecting vLLM; PR #39176 traced the NaN values to rotary inv_freq buffers cleared during from_pretrained.
  • Check the dtype that reaches flash_attn; per PR #38883, it asserts that qkv is float16 or bfloat16 and fails when the model is loaded in float32.
  • Log the maximum absolute difference per text alongside the 0.01 tolerance to tell a NaN failure apart from numerical drift.

Example

The issue does not include the test source, so the following is only a hypothetical sketch of comparing the embeddings produced by the two libraries. The function names are placeholders, and the zero relative tolerance is an assumption:

# Hypothetical example; the actual test helpers differ
import torch

def generate_embedding_vllm(text: str) -> torch.Tensor:
    # Placeholder for the vLLM embedding path
    raise NotImplementedError

def generate_embedding_transformers(text: str) -> torch.Tensor:
    # Placeholder for the Transformers reference path
    raise NotImplementedError

def assert_embeddings_match(text: str, atol: float = 1e-2) -> None:
    # Compare with a tolerance; `!=` on tensors is elementwise and
    # cannot be used directly as an `if` condition
    torch.testing.assert_close(
        generate_embedding_vllm(text),
        generate_embedding_transformers(text),
        atol=atol,
        rtol=0.0,
        msg="Embedding mismatch",
    )

Notes

The fix or workaround depends on the specific implementation details of both vLLM and Transformers, which are not fully provided in the issue. Ensuring that both libraries are up-to-date and installed correctly is crucial.

Recommendation

Apply the fix from PR #39176, which recomputes inv_freq for the rotary embedding modules after loading the HF model in _load_hf_model. If that is not available, skip the jina case under Transformers v5 as in PR #38883. Loosening the comparison is unlikely to help, since the reported max_diff of 0.21 is far above the 0.01 tolerance.
