[Bug]: vLLM inference produces garbled output on long context for SFT'd Qwen3-14B (short context is normal)

GitHub stats
vllm-project/vllm#39576 · Fetched 2026-04-12 13:24:41
Comments: 0 · Participants: 1 · Timeline events: 1 · Reactions: 0
Timeline (top): labeled ×1


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

1. System Info

1.1 Training env
  • llamafactory-cli version 0.9.5.dev0
  • transformers version: 5.2.0
  • Python 3.12.12
  • torch 2.8.0+cu128
  • torchaudio 2.8.0+cu128
  • torchdata 0.11.0
  • torchvision 0.23.0+cu128

1.2 Inference env
  • vllm docker: vllm/vllm-openai:v0.18.0
  • transformers version inside the vllm docker image: 4.57.6

2. Reproduction steps

2.1 After SFT of the Qwen3-14B model, the merge-and-export parameters are configured as follows:
  • model_name_or_path: ~/work/basemodel/Qwen3-14B
  • adapter_name_or_path: saves/qwen3-14b-g3.0/train
  • template: qwen3
  • trust_remote_code: true

  export
  • export_dir: saves/qwen3-14b-g3.0/export
  • export_size: 5
  • export_device: auto
  • export_legacy_format: false

2.2 Loading the exported (merged) model in vLLM 0.18.0 for the first time raises an error saying that a parameter in tokenizer_config.json is expected to be a dict {}, but a list [] was provided:

"extra_special_tokens": [ "<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>", "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>", "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>", "<|video_pad|>" ],

2.3 Following the original Qwen3-14B tokenizer_config.json, I renamed extra_special_tokens to additional_special_tokens. After this change, the model loads and runs in vLLM.
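The manual rename in step 2.3 can be scripted so it is applied the same way on every export. The following is a minimal sketch, not part of the original report: the helper names `normalize_special_tokens` and `patch_file` are hypothetical, and it assumes the list-valued `extra_special_tokens` key should simply be moved to `additional_special_tokens`, as the reporter did by hand.

```python
import json
from pathlib import Path


def normalize_special_tokens(cfg: dict) -> dict:
    """Return a copy of a tokenizer config in which a list-valued
    `extra_special_tokens` is moved to `additional_special_tokens`.

    Hypothetical helper: it mirrors the manual edit from step 2.3.
    """
    fixed = dict(cfg)  # shallow copy, the input dict is left untouched
    extra = fixed.get("extra_special_tokens")
    if isinstance(extra, list):
        # Keep any tokens already listed, then append the new ones in order.
        merged = list(fixed.get("additional_special_tokens") or [])
        merged += [tok for tok in extra if tok not in merged]
        fixed["additional_special_tokens"] = merged
        del fixed["extra_special_tokens"]
    return fixed


def patch_file(path: str) -> None:
    """Rewrite a tokenizer_config.json file in place with the fix applied."""
    p = Path(path).expanduser()
    cfg = json.loads(p.read_text(encoding="utf-8"))
    fixed = normalize_special_tokens(cfg)
    p.write_text(json.dumps(fixed, ensure_ascii=False, indent=2), encoding="utf-8")
```

Back up the exported tokenizer_config.json first, then call `patch_file` on the copy inside the export directory. A dict-valued `extra_special_tokens` is left as-is, since that is the shape the error message says is expected.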

2.4 After modifying tokenizer_config.json, I ran inference with vLLM (using parameter: --max-model-len 30720). The model returns garbled text for long inputs, but works correctly for short inputs.

2.5 Inference with transformers 4.57.1 on the same long text also returns garbled text.

2.6 However, inference with transformers 5.2.0 on the same long text works correctly.

Question: How can I resolve the garbled text issue for long context?

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

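The garbled long-context output appears to be tied to the transformers version used at inference time. Per the report, transformers 5.2.0 handles the same long input correctly, while transformers 4.57.1 and vLLM 0.18.0 (whose docker image ships transformers 4.57.6) both produce garbled text. The likely fix is to make the exported model files and the inference-side transformers version compatible.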

Guidance

  • Verify that the tokenizer_config.json change (extra_special_tokens renamed to additional_special_tokens) is applied identically in every environment that loads the exported model.
  • Compare how the two transformers versions involved (4.57.x at inference, 5.2.0 at training) interpret the exported files, in particular special tokens and any settings that affect context length.
  • Run the same prompts at several input lengths through vLLM and through both transformers versions to find where the output first becomes garbled.
  • If possible, run inference with a transformers version known to work with the exported model, such as 5.2.0.
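One way to act on the second bullet is to diff the base model's JSON config files against the exported ones. The sketch below is illustrative only: `diff_configs` and `diff_config_files` are hypothetical helpers, and the underlying assumption (not confirmed in the issue) is that the export was written by transformers 5.2.0 and may contain keys that 4.57.x names differently or ignores. Any key that appears on only one side, especially one related to position encoding or maximum length, would be worth a closer look.

```python
import json
from pathlib import Path


def diff_configs(base: dict, exported: dict) -> dict:
    """Top-level comparison of two config dicts.

    Reports keys present on only one side and keys whose values differ.
    """
    shared = set(base) & set(exported)
    return {
        "only_in_base": sorted(set(base) - set(exported)),
        "only_in_exported": sorted(set(exported) - set(base)),
        "changed": {
            key: (base[key], exported[key])
            for key in sorted(shared)
            if base[key] != exported[key]
        },
    }


def diff_config_files(base_dir: str, exported_dir: str, name: str = "config.json") -> dict:
    """Load the same JSON file from two model directories and diff it."""
    def load(directory: str) -> dict:
        path = Path(directory).expanduser() / name
        return json.loads(path.read_text(encoding="utf-8"))

    return diff_configs(load(base_dir), load(exported_dir))
```

With the paths from the report, this would be called as `diff_config_files("~/work/basemodel/Qwen3-14B", "saves/qwen3-14b-g3.0/export")`, and again with `name="tokenizer_config.json"` or `name="generation_config.json"` if those files exist in both directories.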

Example

The issue itself contains no code. The practical diagnostics are to confirm that tokenizer_config.json is configured correctly and to test a range of input lengths and special tokens to see where the output breaks.
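A length sweep can be automated with a binary search. This is a minimal sketch under stated assumptions: `first_failing_length` is a hypothetical helper, `is_ok` is a callback you supply (for example, one that sends a prompt of roughly `n` tokens to the server and judges whether the reply is readable), and the search assumes that once a length fails, every longer length fails too.

```python
from typing import Callable, Optional


def first_failing_length(is_ok: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Return the smallest n in [lo, hi] for which is_ok(n) is False.

    Assumes failures are monotonic: if length n fails, all longer lengths fail.
    Returns None when even the longest input (hi) is fine.
    """
    if is_ok(hi):
        return None
    if not is_ok(lo):
        return lo
    # Invariant: is_ok(lo) is True and is_ok(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_ok(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

For the setup in the report, `hi` would be at most 30720, the value passed to --max-model-len. Knowing the approximate length at which output degrades makes it easier to match the failure against length-related settings in the model's config files.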

Notes

The issue seems to be related to version compatibility and special token handling. The model works correctly with transformers 5.2.0 but not with transformers 4.57.1 or with vLLM 0.18.0 (whose docker image ships transformers 4.57.6), which suggests a version-specific bug or an incompatibility in how the older versions read the exported files.

Recommendation

As a workaround, run inference with a compatible transformers version, such as 5.2.0, especially for long inputs, until the incompatibility with vLLM and transformers 4.57.x is resolved.
