vllm - ✅(Solved) Fix [Transformers v5] Base model and LoRA used in test has incorrect `tokenizer_config.json` [1 pull requests, 5 comments, 4 participants]

GitHub stats
vllm-project/vllm#38386 · Fetched 2026-04-08 01:41:39
Comments: 5 · Participants: 4 · Timeline: 20 · Reactions: 0
Timeline (top): commented ×5, mentioned ×5, subscribed ×5, labeled ×2

Error Message

$ pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0]
...
AssertionError: assert ['#f07733: A ...#f08800: A v'] == ['#f07700: A ...#f00000: A v']
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   At index 0 diff: '#f07733: A v' != '#f07700: A v'
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   Full diff:
[2026-03-27T01:15:06Z]     [
[2026-03-27T01:15:06Z]   -     '#f07700: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   +     '#f07733: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   -     '#f00000: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]   +     '#f08800: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]     ]
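
To read the failure above, it can help to reproduce the comparison by hand. The following is a minimal sketch, not code from the vLLM test suite: `first_mismatch` is a hypothetical helper, and the two lists are copied from the "Full diff" lines of the log (left-hand and right-hand side of the failing `==`). It only compares positions present in both lists.

```python
from typing import Optional, Sequence, Tuple


def first_mismatch(
    left: Sequence[str], right: Sequence[str]
) -> Optional[Tuple[int, str, str]]:
    """Return (index, left_item, right_item) for the first differing pair, or None."""
    for index, (l_item, r_item) in enumerate(zip(left, right)):
        if l_item != r_item:
            return index, l_item, r_item
    return None


# Values copied from the "Full diff" section of the log.
left = ["#f07733: A v", "#f08800: A v"]
right = ["#f07700: A v", "#f00000: A v"]

print(first_mismatch(left, right))
```

Both entries differ only in the hex digits of the color code, which matches the `^^` markers in the log.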

PR fix notes

PR #38968: [Transformers v5] Fix tokenizer metadata in quantized LoRA tests

Description (problem / solution / changelog)

Purpose

Fix the Transformers v5 regression in tests/lora/test_quant_model.py caused by incorrect tokenizer_config.json metadata in the TinyLlama LoRA test checkpoint.

Related Issue: https://github.com/vllm-project/vllm/issues/38386

Test Plan

python -m pytest tests/lora/test_quant_model.py::test_quant_model_lora -v

Test Result

Pass



Changed files

  • tests/lora/conftest.py (modified, +20/-2)
  • tests/lora/test_quant_model.py (modified, +3/-0)


This is a sub-issue of https://github.com/vllm-project/vllm/issues/38379. Please read the description of that parent issue before beginning work on this one.

Which test is failing?

The tokenizer_config.json is incorrect for both the base model and the adapter. If we duplicated these checkpoints and stored them under https://huggingface.co/vllm-project, we would own them and could set the tokenizer class to PreTrainedTokenizerFast, which will almost always work.

$ pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0]
...
AssertionError: assert ['#f07733: A ...#f08800: A v'] == ['#f07700: A ...#f00000: A v']
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   At index 0 diff: '#f07733: A v' != '#f07700: A v'
[2026-03-27T01:15:06Z]
[2026-03-27T01:15:06Z]   Full diff:
[2026-03-27T01:15:06Z]     [
[2026-03-27T01:15:06Z]   -     '#f07700: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   +     '#f07733: A v',
[2026-03-27T01:15:06Z]   ?           ^^
[2026-03-27T01:15:06Z]   -     '#f00000: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]   +     '#f08800: A v',
[2026-03-27T01:15:06Z]   ?         ^^
[2026-03-27T01:15:06Z]     ]
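
The tokenizer-class change proposed in this section could be applied to a local copy of a checkpoint roughly as follows. This is a sketch under assumptions, not the change made in the PR: `set_tokenizer_class` is a hypothetical helper, and it assumes the checkpoint directory contains a `tokenizer_config.json` whose `tokenizer_class` key names the tokenizer class.

```python
import json
from pathlib import Path
from typing import Optional


def set_tokenizer_class(
    checkpoint_dir: str, tokenizer_class: str = "PreTrainedTokenizerFast"
) -> Optional[str]:
    """Rewrite `tokenizer_class` in tokenizer_config.json and return the old value."""
    config_path = Path(checkpoint_dir) / "tokenizer_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Remember the previous value so the caller can log or revert the change.
    previous = config.get("tokenizer_class")
    config["tokenizer_class"] = tokenizer_class

    # ensure_ascii=False keeps any non-ASCII special tokens readable on disk.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return previous
```

The same call would be run once for the base model copy and once for the adapter copy, since the issue reports that both carry incorrect metadata.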

How to configure my environment?

It's very important that you install both vLLM and Transformers from source so that your test results reflect the current state of both libraries.

# Or your fork
git clone https://github.com/huggingface/transformers.git
git clone https://github.com/vllm-project/vllm.git

cd vllm
VLLM_USE_PRECOMPILED=1 uv pip install -e .
uv pip install -e ../transformers
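
After the install commands above, a quick check of which versions the active environment resolves can catch a stale wheel shadowing the source checkout. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, and the distribution names `vllm` and `transformers` are assumed to match the installed package metadata. Source installs often report a development-style version string, though the exact format is not guaranteed.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Print what the current interpreter sees for both libraries.
for name in ("vllm", "transformers"):
    print(f"{name}: {installed_version(name)}")
```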

extent analysis

Fix Plan

To fix the failing test, the tokenizer_config.json of both the base model and the adapter must be corrected. The plan from the issue is to duplicate the checkpoints under the https://huggingface.co/vllm-project organization, where they can be owned and edited, and then set the tokenizer class to PreTrainedTokenizerFast.

Steps

  • Duplicate the checkpoints and store them under https://huggingface.co/vllm-project
  • In the tokenizer_config.json of each duplicated checkpoint, set the tokenizer class to PreTrainedTokenizerFast
  • Point the test at the duplicated checkpoints

Code Changes

# Load the tokenizer explicitly as PreTrainedTokenizerFast
from transformers import PreTrainedTokenizerFast

# 'vllm-project/model-name' is a placeholder, not a real repository name
tokenizer = PreTrainedTokenizerFast.from_pretrained('vllm-project/model-name')

Replace 'vllm-project/model-name' with the actual name of the duplicated checkpoint under https://huggingface.co/vllm-project.

Verification

Run the test again using pytest tests/lora/test_quant_model.py::test_quant_model_lora[model0] to verify that the fix worked.
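
The verification step can also be scripted. The sketch below is an assumption-laden convenience wrapper, not part of the PR: `build_pytest_command` and `run_test` are hypothetical names, and `run_test` is expected to be called from the vLLM repository root. Passing the node ID as a list element avoids shell interpretation of the square brackets in `[model0]`, which some shells treat as a glob pattern.

```python
import subprocess
import sys
from typing import List

TEST_ID = "tests/lora/test_quant_model.py::test_quant_model_lora[model0]"


def build_pytest_command(test_id: str) -> List[str]:
    """Build an argv list that runs one pytest node ID with the current interpreter."""
    return [sys.executable, "-m", "pytest", test_id, "-v"]


def run_test(test_id: str = TEST_ID) -> int:
    """Run the test and return pytest's exit code (0 means every selected test passed)."""
    return subprocess.run(build_pytest_command(test_id), check=False).returncode
```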

Extra Tips

  • Make sure to install both vLLM and Transformers from source using the provided instructions to ensure that your test results reflect the current state of both libraries.
  • If the test still fails, check that the tokenizer_config.json of both the base model and the adapter declares the expected tokenizer class.
