vllm - ✅(Solved) Fix [Transformers v5] Base model and LoRA used in test has incorrect `tokenizer_config.json` [1 pull request, 5 comments, 4 participants]

vllm · 2026-03-27 18:43:30

vllm-project/vllm#38386 · Fetched 2026-04-08 01:41:39

Error Message

$ pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0]
...
AssertionError: assert ['#f07733: A ...#f08800: A v'] == ['#f07700: A ...#f00000: A v']

  At index 0 diff: '#f07733: A v' != '#f07700: A v'

  Full diff:
    [
  -     '#f07700: A v',
  ?           ^^
  +     '#f07733: A v',
  ?           ^^
  -     '#f00000: A v',
  ?         ^^
  +     '#f08800: A v',
  ?         ^^
    ]

PR fix notes

PR #38968: [Transformers v5] Fix tokenizer metadata in quantized LoRA tests

Repository: vllm-project/vllm
Author: SouthWest7
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/38968

Description (problem / solution / changelog)

Purpose

Fix the Transformers v5 regression in tests/lora/test_quant_model.py caused by incorrect tokenizer_config.json metadata in the TinyLlama LoRA test checkpoint.

Test Plan

python -m pytest tests/lora/test_quant_model.py::test_quant_model_lora -v

Test Result

Pass

Changed files

tests/lora/conftest.py (modified, +20/-2)
tests/lora/test_quant_model.py (modified, +3/-0)

This is a sub-issue that forms part of the work in https://github.com/vllm-project/vllm/issues/38379. Please read the description of that issue before starting work on this one.

Which test is failing?

The tokenizer_config.json is incorrect for both the base model and the adapter. If we duplicated these checkpoints under https://huggingface.co/vllm-project, we would own them and could set the tokenizer class to PreTrainedTokenizerFast, which should work in almost all cases.

$ pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0]
...
AssertionError: assert ['#f07733: A ...#f08800: A v'] == ['#f07700: A ...#f00000: A v']
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   At index 0 diff: '#f07733: A v' != '#f07700: A v'
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   Full diff:
[2026-03-27T01:15:06Z]     [
[2026-03-27T01:15:06Z]   -     '#f07700: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   +     '#f07733: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   -     '#f00000: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]   +     '#f08800: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]     ]
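
A minimal sketch of the config change proposed above, assuming you already have a local copy of the checkpoint: it rewrites only the tokenizer_class field of tokenizer_config.json and leaves every other key and the tokenizer files untouched. The helper names are hypothetical, and "LlamaTokenizer" in the demo is an illustrative "before" value, not the value actually found in the test checkpoints.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

# Class name proposed in the issue for both the base model and the adapter.
FAST_CLASS = "PreTrainedTokenizerFast"


def with_fast_tokenizer_class(config: dict) -> dict:
    """Return a copy of a tokenizer config with tokenizer_class replaced."""
    patched = dict(config)
    patched["tokenizer_class"] = FAST_CLASS
    return patched


def patch_tokenizer_config(config_path: Path) -> Optional[str]:
    """Rewrite tokenizer_config.json in place; return the previous class (None if absent)."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("tokenizer_class")
    config_path.write_text(
        json.dumps(with_fast_tokenizer_class(config), indent=2) + "\n",
        encoding="utf-8",
    )
    return previous


if __name__ == "__main__":
    # Demo on a throwaway file rather than a real checkpoint.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "tokenizer_config.json"
        path.write_text(json.dumps({"tokenizer_class": "LlamaTokenizer"}))
        print(patch_tokenizer_config(path))  # LlamaTokenizer
        print(json.loads(path.read_text())["tokenizer_class"])  # PreTrainedTokenizerFast
```

Returning the previous value makes it easy to log exactly what changed in each copied checkpoint.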

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
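
After the editable installs, a quick sanity check can confirm which versions the interpreter actually resolves. This sketch uses only the standard library's importlib.metadata; the helper names are hypothetical, and treating "major version at least 5" as the sign of a Transformers v5 source install is an assumption, not something the issue states.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Parse the leading integer of a version string such as '5.0.0.dev0'."""
    digits = ""
    for ch in version_string:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"cannot parse major version from {version_string!r}")
    return int(digits)


if __name__ == "__main__":
    for pkg in ("vllm", "transformers"):
        print(f"{pkg}: {installed_version(pkg)}")
    tf_version = installed_version("transformers")
    # Assumption: a source install of current Transformers main reports a 5.x version.
    if tf_version is None or major_version(tf_version) < 5:
        print("Transformers v5 not detected; re-check the source install.")
```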

Extended analysis

Fix Plan

To fix the failing test, we need to update the tokenizer_config.json for both the base model and the adapter. Because we do not own the original checkpoints, we will duplicate them into the vllm-project organization on Hugging Face (https://huggingface.co/vllm-project) and then set the tokenizer class in each copy to PreTrainedTokenizerFast.

Steps

Duplicate the base model and adapter checkpoints into https://huggingface.co/vllm-project
In each copy's tokenizer_config.json, set the tokenizer class to PreTrainedTokenizerFast
Update tests/lora/test_quant_model.py (and any fixtures it uses) to load the duplicated checkpoints
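
A minimal sketch of duplicating one checkpoint and setting tokenizer_class in its copied tokenizer_config.json, as the issue proposes. It assumes huggingface_hub is installed and that you have write access to the vllm-project organization. Keeping the original repo name under the new organization is an assumption, and the helper names are hypothetical. It uses the real huggingface_hub calls snapshot_download, HfApi.create_repo and HfApi.upload_folder.

```python
import json
from pathlib import Path


def target_repo_id(source_repo_id: str, target_org: str = "vllm-project") -> str:
    """Map 'owner/name' to 'target_org/name' (keeping the name is an assumption)."""
    name = source_repo_id.rstrip("/").split("/")[-1]
    return f"{target_org}/{name}"


def duplicate_with_fast_tokenizer(source_repo_id: str, work_dir: Path) -> str:
    """Download a Hub repo, set tokenizer_class, and upload it under vllm-project.

    Requires huggingface_hub and write access to the target organization.
    """
    # Lazy import so target_repo_id works (and can be tested) without huggingface_hub.
    from huggingface_hub import HfApi, snapshot_download

    dest = target_repo_id(source_repo_id)
    local_dir = Path(
        snapshot_download(repo_id=source_repo_id, local_dir=work_dir / dest.split("/")[-1])
    )

    # Set tokenizer_class in the copied tokenizer_config.json, if the repo has one.
    config_path = local_dir / "tokenizer_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        config["tokenizer_class"] = "PreTrainedTokenizerFast"
        config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")

    # Create the destination repo (no error if it already exists) and push the copy.
    api = HfApi()
    api.create_repo(repo_id=dest, exist_ok=True)
    api.upload_folder(
        repo_id=dest,
        folder_path=str(local_dir),
        commit_message="Set tokenizer_class to PreTrainedTokenizerFast",
    )
    return dest
```

You would call it once for the base model and once for the adapter, for example duplicate_with_fast_tokenizer("<owner>/<base-model>", Path("/tmp/ckpts")) with the real repo ids substituted. The test would then reference the returned repo ids.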

Code Changes

from transformers import PreTrainedTokenizerFast

# Load the duplicated checkpoint's tokenizer explicitly as PreTrainedTokenizerFast.
# 'vllm-project/model-name' is a placeholder for the duplicated checkpoint.
tokenizer = PreTrainedTokenizerFast.from_pretrained('vllm-project/model-name')

Replace 'vllm-project/model-name' with the actual name of the duplicated model in the https://huggingface.co/vllm-project organization.

Verification

Run the test again using pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0] to verify that the fix worked.

Extra Tips

Make sure to install both vLLM and Transformers from source using the provided instructions to ensure that your test results reflect the current state of both libraries.
If the test still fails, check that it loads the duplicated checkpoints and that each copy's tokenizer_config.json sets the tokenizer class to PreTrainedTokenizerFast.
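
The check in the last tip can be scripted against local snapshots of the two checkpoints. This is a minimal sketch: the helper names and the labels "base" and "adapter" are hypothetical, and it only reads tokenizer_config.json without loading any tokenizer.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Value proposed in the issue for both the base model and the adapter.
EXPECTED_CLASS = "PreTrainedTokenizerFast"


def tokenizer_class_of(checkpoint_dir: Path) -> Optional[str]:
    """Read tokenizer_class from checkpoint_dir/tokenizer_config.json (None if missing)."""
    config_path = checkpoint_dir / "tokenizer_config.json"
    if not config_path.is_file():
        return None
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("tokenizer_class")


def find_mismatches(checkpoints: Dict[str, Path]) -> List[str]:
    """List checkpoints whose tokenizer_class is not the expected fast class."""
    problems = []
    for label, path in checkpoints.items():
        found = tokenizer_class_of(path)
        if found != EXPECTED_CLASS:
            problems.append(f"{label}: tokenizer_class={found!r}")
    return problems
```

Calling find_mismatches({"base": Path(...), "adapter": Path(...)}) with your local snapshot paths should return an empty list once both configs are updated. A missing config shows up as tokenizer_class=None.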
