vllm - ✅(Solved) Fix [Transformers v5] ColBERTJinaRobertaModel [2 pull requests, 3 comments, 3 participants]

hmellor · 2026-04-01T15:58:05Z



vllm-project/vllm#38737•Fetched 2026-04-08 02:23:01


Timeline (top)

commented ×3, cross-referenced ×2, labeled ×2, referenced ×2

Error Message

$ pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina]
...
AssertionError: Embedding mismatch for text 0

PR fix notes

PR #38883: [Transformers v5] Skip ColBERT jina HF comparison due to remote code incompatibility

Repository: vllm-project/vllm
Author: Lidang-Jiang
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/38883

Description (problem / solution / changelog)

Summary

The test_colbert_hf_comparison[jina] test fails under Transformers v5 because jinaai/jina-colbert-v2 uses custom remote code (jinaai/xlm-roberta-flash-implementation) with its own flash_attn integration that is incompatible with v5's dtype handling changes.

Root cause: Transformers v5 deprecated torch_dtype in favor of dtype. The remote code model's custom XLMRobertaFlashConfig handles torch_dtype explicitly in __init__, but v5's from_pretrained may resolve the model dtype differently, causing flash_attn to receive float32 tensors (it only supports fp16/bf16), producing NaN outputs. Even when forced to bfloat16, the numerical results differ significantly from v4 (max_diff=0.21 vs tolerance=0.01).
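
To make the max_diff and tolerance figures concrete, here is a minimal, torch-free sketch of the kind of check involved. The helper names `max_abs_diff` and `within_tolerance` are hypothetical, and the sample vectors are made up for illustration; the actual test calls `torch.testing.assert_close` on tensors.

```python
import math


def max_abs_diff(a: list[float], b: list[float]) -> float:
    """Largest element-wise absolute difference; NaN if any element is NaN."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # max() can silently ignore NaN depending on ordering, so propagate it explicitly.
    if any(math.isnan(d) for d in diffs):
        return math.nan
    return max(diffs, default=0.0)


def within_tolerance(a: list[float], b: list[float], atol: float = 0.01) -> bool:
    """True only when every element differs by at most `atol` (NaN always fails)."""
    diff = max_abs_diff(a, b)
    return not math.isnan(diff) and diff <= atol


# Made-up vectors: `far` differs from `reference` by about 0.21 in one element.
reference = [0.10, 0.20, 0.30]
close = [0.10, 0.205, 0.30]
far = [0.10, 0.41, 0.30]

print(within_tolerance(reference, close))  # small drift
print(within_tolerance(reference, far))  # large drift
print(within_tolerance(reference, [math.nan, 0.2, 0.3]))  # NaN in one input
```

Treating NaN as an automatic failure matters here, because a NaN reference embedding would otherwise be easy to misread as an ordinary numerical drift.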

This skips the jina backend in test_colbert_hf_comparison when running under Transformers v5, until the model's remote code is updated for v5 compatibility.
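
A version-gated skip of this kind could look roughly like the sketch below. The PR's actual diff is not reproduced on this page, so `is_transformers_v5_or_later` is a hypothetical helper, and the commented-out `pytest.skip` call only indicates where such a guard might sit.

```python
import re


def major_version(version: str) -> int:
    """Extract the leading major number from a version string like '5.5.0.dev0'."""
    match = re.match(r"\s*v?(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def is_transformers_v5_or_later(version: str) -> bool:
    """Hypothetical gate: True for Transformers 5.x and newer."""
    return major_version(version) >= 5


# Inside the test, the guard might be used along these lines (assumed, not the PR's code):
#
#   import pytest, transformers
#   if backend == "jina" and is_transformers_v5_or_later(transformers.__version__):
#       pytest.skip("jina remote code is not yet compatible with Transformers v5")

print(is_transformers_v5_or_later("5.5.0.dev0"))
print(is_transformers_v5_or_later("4.56.0"))
```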

Not a duplicate: No open PR addresses #38737. Verified with gh pr list --search "38737" and gh pr list --search "ColBERTJina".
AI assistance was used (Claude) for root cause analysis. All changes reviewed by human submitter.

Fixes #38737

Before (failure on main + Transformers v5.5.0.dev0)

$ CUDA_VISIBLE_DEVICES=3 python -m pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] -v --timeout=600

tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] FAILED

=================================== FAILURES ===================================
_______________________ test_colbert_hf_comparison[jina] _______________________
tests/models/language/pooling/test_colbert.py:149: in _assert_embeddings_close
    torch.testing.assert_close(
E   AssertionError: Embedding mismatch for text 0

======================= 1 failed in 43.07s ========================

Diagnosis confirmed the HF reference model produces different embeddings:

Default loading (bf16 + flash_attn): max_diff=0.21 vs tolerance 0.01
Eager attention + float32: max_diff=0.17 (different code path, still incompatible)
Model parameters load correctly (no NaN in weights, 1/294 near-zero params)
flash_attn assertion: assert qkv.dtype in [torch.float16, torch.bfloat16] fails when model loaded in float32

After (test correctly skipped)

$ CUDA_VISIBLE_DEVICES=2 python -m pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] -v --timeout=60

tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] SKIPPED [100%]

======================= 1 skipped in 0.42s ========================

Test plan

test_colbert_hf_comparison[jina] correctly skipped under Transformers v5
Other ColBERT backends (bert, modernbert, lfm2) unaffected by this change
ruff check and ruff format pass
All pre-commit hooks pass

Changed files

tests/models/language/pooling/test_colbert.py (modified, +11/-0)

PR #39176: fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader

Repository: vllm-project/vllm
Author: ieBoytsov
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/39176

Description (problem / solution / changelog)

Purpose

Summary

Transformers 5.0 introduced a new weight materialization system that clears non-persistent buffers (persistent=False) during from_pretrained
The Jina ColBERT model's RotaryEmbedding registers inv_freq as a non-persistent buffer, so it gets wiped to uninitialized memory after weight loading
This caused NaN outputs in the test_colbert_hf_comparison[jina] HF reference model, failing the embedding comparison
Fix: recompute inv_freq for all rotary embedding modules after loading the HF model in _load_hf_model
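
The recomputation described in this summary can be sketched without PyTorch as follows. It assumes the standard rotary formula, where the k-th inverse frequency is base^(-2k/dim). The `RotaryStub` class, its `dim` and `base` attributes, and the default base of 10000.0 are illustrative assumptions rather than details taken from the Jina remote code or the merged diff. In the real test these values would be tensors on the model's device.

```python
import math
from dataclasses import dataclass, field


def compute_inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    """Standard rotary inverse frequencies: base ** (-i / dim) for even i in [0, dim)."""
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("rotary dim must be a positive even integer")
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]


@dataclass
class RotaryStub:
    """Stand-in for a rotary embedding module (hypothetical, not the real class)."""

    dim: int
    base: float = 10000.0
    inv_freq: list[float] = field(default_factory=list)


def recompute_rotary_buffers(modules: list[object]) -> int:
    """Rebuild inv_freq on every module that looks like a rotary embedding.

    Returns the number of modules that were repaired.
    """
    repaired = 0
    for module in modules:
        if hasattr(module, "inv_freq") and hasattr(module, "dim"):
            base = getattr(module, "base", 10000.0)
            module.inv_freq = compute_inv_freq(module.dim, base)
            repaired += 1
    return repaired


# Simulate a loader that left the non-persistent buffer holding garbage.
rotary = RotaryStub(dim=8, inv_freq=[math.nan] * 4)
count = recompute_rotary_buffers([rotary, object()])
print(count, all(math.isfinite(f) for f in rotary.inv_freq))
```

Because `inv_freq` is fully determined by `dim` and `base`, it can be rebuilt deterministically after loading, which is presumably why it was registered as non-persistent and left out of the checkpoint in the first place.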

Fix: https://github.com/vllm-project/vllm/issues/38737

Test plan

pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina] passes


Changed files

tests/models/language/pooling/test_colbert.py (modified, +8/-0)

Issue description

This is a sub-issue forming part of the work in https://github.com/vllm-project/vllm/issues/38379, please read the description of this issue before beginning to work on this one.

Which test is failing?

$ pytest tests/models/language/pooling/test_colbert.py::test_colbert_hf_comparison[jina]
...
AssertionError: Embedding mismatch for text 0

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
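
After installing, it can help to confirm which versions Python actually resolves. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, and `vllm` and `transformers` are assumed to be the distribution names under which the two packages are installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("vllm", "transformers"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

A source install of Transformers from the main branch would typically report a development version string, similar to the `v5.5.0.dev0` quoted in PR #38883.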

extent analysis

TL;DR

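The HF reference model, not the comparison logic, was producing bad embeddings. According to the merged PR #39176, the Transformers v5 weight loader clears non-persistent buffers during from_pretrained, which left the Jina ColBERT rotary inv_freq buffer uninitialized and produced NaN outputs. The fix recomputes inv_freq after the HF model is loaded in _load_hf_model.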

Guidance

Verify that both vLLM and Transformers are installed from source and that the installation completes without errors, as described in the issue.
Check whether the HF reference embeddings in test_colbert_hf_comparison contain NaN before suspecting the comparison logic; PR #39176 reports NaN outputs from the HF reference model.
Inspect buffers registered with persistent=False, such as the rotary inv_freq, after from_pretrained, since PR #39176 reports that the Transformers v5 loader clears them.
Consider adding logging or debugging statements to the test to show where the embeddings diverge.

Example

The sketch below is hypothetical and does not reproduce the real test. It only illustrates comparing two embeddings within a tolerance instead of by exact equality:

# Hypothetical example, actual implementation may vary
import math

def generate_embedding_vllm(text: str) -> list[float]:
    # Placeholder for the vLLM embedding generation logic
    raise NotImplementedError

def generate_embedding_transformers(text: str) -> list[float]:
    # Placeholder for the Transformers embedding generation logic
    raise NotImplementedError

def assert_embeddings_close(a: list[float], b: list[float], atol: float = 1e-2) -> None:
    # Exact != on floats is too strict; compare element-wise within a tolerance.
    # math.isclose returns False for NaN, so NaN embeddings are reported as a mismatch.
    if len(a) != len(b) or not all(math.isclose(x, y, abs_tol=atol) for x, y in zip(a, b)):
        raise AssertionError("Embedding mismatch")

Notes

The failure comes from how the Transformers v5 loader interacts with the Jina remote code, and the merged fix lives in the test's HF loading path (_load_hf_model in tests/models/language/pooling/test_colbert.py). PR #38883 proposed skipping the jina backend instead and was closed without being merged.

Recommendation

Apply the merged fix from PR #39176: recompute inv_freq for all rotary embedding modules after loading the HF model in _load_hf_model. Its test plan reports that test_colbert_hf_comparison[jina] passes with this change.
