vllm - 💡(How to fix) Fix [Bug]: vLLM inference produces garbled output on long context for SFT'd Qwen3-14B (short context is normal) [1 participants]

vllm · 2026-04-11 15:25:59

vllm-project/vllm#39576•Fetched 2026-04-12 13:24:41

Author

paulplay-pm

Participants

paulplay-pm

Timeline (top)

labeled ×1

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

Your output of `python collect_env.py` here

</details>

🐛 Describe the bug

1. System Info

1.1 Training env

llamafactory-cli version 0.9.5.dev0
transformers version: 5.2.0
Python 3.12.12
torch 2.8.0+cu128
torchaudio 2.8.0+cu128
torchdata 0.11.0
torchvision 0.23.0+cu128

1.2 Inference env

vllm docker: vllm/vllm-openai:v0.18.0
transformers version in the vllm docker image: 4.57.6

2. Reproduce steps

2.1 After SFT of the Qwen3-14B model, the parameters for merging and exporting are configured as follows:

model_name_or_path: ~/work/basemodel/Qwen3-14B
adapter_name_or_path: saves/qwen3-14b-g3.0/train
template: qwen3
trust_remote_code: true

export
export_dir: saves/qwen3-14b-g3.0/export
export_size: 5
export_device: auto
export_legacy_format: false

2.2 The first time the exported (merged) model is loaded in vLLM 0.18.0, startup fails with an error saying that a parameter in tokenizer_config.json is expected to be a dict {} but a list [] was provided:

"extra_special_tokens": [
  "<|im_start|>", "<|im_end|>",
  "<|object_ref_start|>", "<|object_ref_end|>",
  "<|box_start|>", "<|box_end|>",
  "<|quad_start|>", "<|quad_end|>",
  "<|vision_start|>", "<|vision_end|>",
  "<|vision_pad|>", "<|image_pad|>", "<|video_pad|>"
],

2.3 I referenced the original Qwen3-14B tokenizer_config.json and changed extra_special_tokens to additional_special_tokens. After this change, the model runs successfully in vLLM.
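The manual edit in step 2.3 can be sketched as a small script. This is a minimal sketch, not the issue author's exact procedure: it assumes the only change needed is to move a list-valued `extra_special_tokens` into `additional_special_tokens`. The function names `patch_tokenizer_config` and `patch_file` are hypothetical.

```python
import json
from pathlib import Path


def patch_tokenizer_config(config: dict) -> dict:
    """Return a copy in which a list-valued extra_special_tokens is moved
    into additional_special_tokens. Dict-valued entries are left untouched."""
    patched = dict(config)
    extra = patched.get("extra_special_tokens")
    if isinstance(extra, list):
        del patched["extra_special_tokens"]
        existing = patched.get("additional_special_tokens") or []
        # Merge while preserving order and dropping duplicates.
        patched["additional_special_tokens"] = list(dict.fromkeys([*existing, *extra]))
    return patched


def patch_file(path: Path) -> None:
    """Patch tokenizer_config.json in place, keeping a .bak copy of the original."""
    original_text = path.read_text(encoding="utf-8")
    config = json.loads(original_text)
    path.with_name(path.name + ".bak").write_text(original_text, encoding="utf-8")
    path.write_text(
        json.dumps(patch_tokenizer_config(config), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
```

For the export directory named in the issue, the call would be `patch_file(Path("saves/qwen3-14b-g3.0/export/tokenizer_config.json"))`. The backup makes it easy to compare the patched file with what LLaMA-Factory originally wrote.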

2.4 After modifying tokenizer_config.json, I ran inference with vLLM (using parameter: --max-model-len 30720). The model returns garbled text for long inputs, but works correctly for short inputs.

2.5 Inference with transformers 4.57.1 on the same long text also returns garbled text.

2.6 However, inference with transformers 5.2.0 on the same long text works correctly.

Question: How can I resolve the garbled text issue for long context?

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

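The garbled output on long inputs is most likely a version-compatibility problem. The model was trained and exported with transformers 5.2.0, which handles long inputs correctly, while vLLM 0.18.0 (bundling transformers 4.57.6) and standalone transformers 4.57.1 both garble them. The place to look is how the two major transformers versions differ in handling the exported files, including special tokens and input length.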

Guidance

Verify that the tokenizer_config.json change (extra_special_tokens renamed to additional_special_tokens) is applied identically in every environment where the model is loaded.
Compare how transformers 4.57.x (4.57.1 standalone, 4.57.6 inside the vLLM image) and 5.2.0 handle special tokens and long inputs for this model.
Test inputs of increasing length, with and without special tokens, in both vLLM and transformers to find where the output turns garbled.
If possible, run vLLM with a transformers version known to work with the exported model, such as 5.2.0.
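The input-length test can be narrowed down with a binary search over prompt length. The sketch below assumes a hypothetical callback `is_garbled(n)` that sends an n-token prompt to the server and judges the reply; how to build the prompt and how to decide that output is garbled are left to the reader. It also assumes the failure is monotonic, meaning that once a length produces garbled output, every longer length does too.

```python
from typing import Callable, Optional


def first_failing_length(
    is_garbled: Callable[[int], bool], low: int, high: int
) -> Optional[int]:
    """Return the smallest length in [low, high] for which is_garbled is True,
    or None if even `high` produces clean output.

    Assumes monotonic behaviour: clean below some threshold, garbled at or above it.
    """
    if not is_garbled(high):
        return None
    if is_garbled(low):
        return low
    # Invariant: `low` is known clean, `high` is known garbled.
    while high - low > 1:
        mid = (low + high) // 2
        if is_garbled(mid):
            high = mid
        else:
            low = mid
    return high
```

With `high` set to the `--max-model-len` value from the issue (30720), this needs roughly 15 probes. A threshold that lands on a round number tied to a model or tokenizer setting would be a useful clue when comparing the two transformers versions.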

Example

The issue itself contains no code to reproduce directly. Diagnosis comes down to confirming that tokenizer_config.json is correct and testing inputs of different lengths and with different special tokens.

Notes

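The issue appears to be related to version compatibility and special token handling. The model works correctly with transformers 5.2.0, but produces garbled long-context output both with standalone transformers 4.57.1 and with vLLM 0.18.0 (whose image ships transformers 4.57.6). Since training and export were done with transformers 5.2.0, this points to a version-specific bug or to an incompatibility in how the 4.57.x line reads the exported files.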
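One way to look for such an incompatibility is to diff the exported model's JSON files against the base model's. The sketch below is a generic top-level diff with hypothetical helper names (`diff_configs`, `diff_config_files`); it does not identify the cause by itself. Because the failure depends on context length, any differing keys related to position embeddings or maximum length are worth a close look. That is a guess, not something the issue confirms.

```python
import json
from pathlib import Path

MISSING = "<missing>"


def diff_configs(base: dict, exported: dict) -> dict:
    """Map every top-level key whose value differs to a
    (base_value, exported_value) pair; absent keys show as '<missing>'."""
    changed = {}
    for key in sorted(set(base) | set(exported)):
        base_value = base.get(key, MISSING)
        exported_value = exported.get(key, MISSING)
        if base_value != exported_value:
            changed[key] = (base_value, exported_value)
    return changed


def diff_config_files(base_dir: str, exported_dir: str, name: str = "config.json") -> dict:
    """Load the same JSON file from both model directories and diff it."""
    base_path = Path(base_dir).expanduser() / name
    exported_path = Path(exported_dir).expanduser() / name
    base = json.loads(base_path.read_text(encoding="utf-8"))
    exported = json.loads(exported_path.read_text(encoding="utf-8"))
    return diff_configs(base, exported)
```

With the paths from the issue, this would be `diff_config_files("~/work/basemodel/Qwen3-14B", "saves/qwen3-14b-g3.0/export")`. Passing `name="tokenizer_config.json"` or `name="generation_config.json"` checks the other files the same way.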

Recommendation

As a workaround, run long-input inference with transformers 5.2.0, the version the issue reports as working, until the incompatibility with vLLM 0.18.0 and transformers 4.57.x is resolved.
