vllm - ✅ (Solved) Fix [Bug]: qwen3.5-27b-gptq deploy fail [1 pull request, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#36585 · Fetched 2026-04-08 00:36:10
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1, labeled ×1, subscribed ×1

Error Message

INFO 03-10 11:30:12 [vllm.py:957] Cudagraph is disabled under eager mode
<frozen importlib._bootstrap_external>:1296: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1296: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
Traceback (most recent call last):
File "/ssd4/workspace/1_test/webserver_new2.py", line 1136, in <module>
infer = QwenInfer(args)
^^^^^^^^^^^^^^^
File "/ssd4/workspace/1_test/qwen_infer_vllm.py", line 79, in __init__
self.model = LLMEngine.from_engine_args(model_engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
return cls(
^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 92, in __init__
self.renderer = renderer = renderer_from_config(self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/registry.py", line 89, in renderer_from_config
return RENDERER_REGISTRY.load_renderer(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/registry.py", line 63, in load_renderer
return renderer_cls.from_config(config, tokenizer_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/hf.py", line 607, in from_config
return cls(config, tokenizer)
^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/hf.py", line 614, in __init__
super().__init__(config, tokenizer)
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/base.py", line 93, in __init__
self.mm_processor = mm_registry.create_processor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 214, in create_processor
return factories.build_processor(ctx, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 95, in build_processor
return self.processor(info, dummy_inputs_builder, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 997, in __init__
self.data_parser = self.info.get_data_parser()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 643, in get_data_parser
self.get_hf_config().vision_config.spatial_merge_size,
^^^^^^^^^^^^^^^^^^^^
File "/ssd4//.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_5.py", line 114, in get_hf_config
return self.ctx.get_hf_config(Qwen3_5Config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 139, in get_hf_config
raise TypeError(
TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5.Qwen3_5Config'>, but found type: <class 'transformers.models.qwen3_5.configuration_qwen3_5.Qwen3_5TextConfig'>
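
The last frames show where the load fails: the Qwen3.5 multimodal processor asks for a Qwen3_5Config, but the checkpoint resolved to a Qwen3_5TextConfig. The following is a minimal sketch of that kind of type check. It uses stand-in classes rather than vLLM's real ones, and get_hf_config here is a simplified, hypothetical re-creation written for illustration, not the library's implementation.

```python
class Qwen3_5Config:
    """Stand-in for the multimodal config, which carries a vision_config."""

    def __init__(self) -> None:
        # Illustrative value only; not taken from any real checkpoint.
        self.vision_config = {"spatial_merge_size": 2}


class Qwen3_5TextConfig:
    """Stand-in for the text-only config, which has no vision_config."""


def get_hf_config(hf_config: object, expected_type: type) -> object:
    """Return hf_config if it has the expected type, otherwise raise TypeError."""
    if not isinstance(hf_config, expected_type):
        raise TypeError(
            "Invalid type of HuggingFace config. "
            f"Expected type: {expected_type}, but found type: {type(hf_config)}"
        )
    return hf_config


def try_load(hf_config: object) -> str:
    """Report whether a config passes the multimodal type check."""
    try:
        get_hf_config(hf_config, Qwen3_5Config)
    except TypeError as exc:
        return f"rejected: {exc}"
    return "accepted"
```

Calling try_load(Qwen3_5TextConfig()) in this sketch is rejected for the same reason as in the traceback: a text-only config is handed to code that expects the multimodal one.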

Root Cause

When this configuration is used to quantize the SFT model to int8 with GPTQ, loading the quantized checkpoint fails with:

Value error, The checkpoint you are trying to load has model type qwen3_5_text but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
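
Because the error names the model type qwen3_5_text, a quick check of the checkpoint's config.json can confirm what the quantized export actually advertises. The helper below is a small sketch using only the standard library; the function name describe_checkpoint is hypothetical, and the presence of a quantization_config key is an assumption about how the GPTQ export was written, not something stated in the issue.

```python
import json
from pathlib import Path


def describe_checkpoint(checkpoint_dir: str) -> dict:
    """Read config.json and return the fields that drive model resolution."""
    config_path = Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        # Model type string that Transformers uses to pick a config class.
        "model_type": config.get("model_type"),
        # Architecture names the checkpoint advertises.
        "architectures": config.get("architectures", []),
        # A text-only config is expected to lack a vision_config section.
        "has_vision_config": "vision_config" in config,
        # Assumption: quantized exports usually record their settings here.
        "has_quantization_config": "quantization_config" in config,
    }
```

Running describe_checkpoint on both the working SFT directory and the quantized directory makes it easy to see whether quantization changed model_type or architectures.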

Fix Action

Fixed

PR fix notes

PR #36850: [Bugfix] Support Qwen3.5 text-only configs

Description (problem / solution / changelog)

Fixes #36585

Summary

  • register qwen3_5_text and qwen3_5_moe_text with vLLM's config loader
  • remap text-only Qwen3.5 checkpoints that still advertise conditional-generation architectures onto the text-only CausalLM implementations
  • add regression coverage for Qwen3.5 text-only config parsing and model registry resolution
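
The remapping in the second bullet can be pictured as a lookup applied before the model class is resolved. The sketch below is illustrative only: the table, the function name resolve_architecture, and the ...ForConditionalGeneration architecture names are assumptions, not code from PR #36850. Only the qwen3_5_text and qwen3_5_moe_text model types and the Qwen3_5ForCausalLM and Qwen3_5MoeForCausalLM names appear on this page.

```python
# Hypothetical remap table:
# (model_type, advertised architecture) -> text-only architecture.
TEXT_ONLY_REMAP = {
    ("qwen3_5_text", "Qwen3_5ForConditionalGeneration"): "Qwen3_5ForCausalLM",
    ("qwen3_5_moe_text", "Qwen3_5MoeForConditionalGeneration"): "Qwen3_5MoeForCausalLM",
}


def resolve_architecture(model_type: str, architectures: list[str]) -> list[str]:
    """Swap conditional-generation names for CausalLM ones on text-only configs.

    Architectures without an entry in the table are returned unchanged.
    """
    return [TEXT_ONLY_REMAP.get((model_type, arch), arch) for arch in architectures]
```

Keying the table on both the model type and the architecture means a genuinely multimodal checkpoint, whose model type is not a text-only one, keeps its original architecture.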

Testing

  • pytest tests/config/test_model_arch_config.py -k qwen3_5_text_model_arch_config
  • pytest tests/models/test_registry.py -k 'Qwen3_5ForCausalLM or Qwen3_5MoeForCausalLM or test_hf_registry_coverage'

Changed files

  • tests/config/test_model_arch_config.py (modified, +57/-0)
  • tests/models/registry.py (modified, +8/-0)
  • vllm/model_executor/models/config.py (modified, +2/-0)
  • vllm/model_executor/models/registry.py (modified, +2/-0)
  • vllm/transformers_utils/config.py (modified, +2/-0)
  • vllm/transformers_utils/model_arch_config_convertor.py (modified, +28/-0)


Your current environment

torch 2.10.0
torchaudio 2.10.0+cu128
torchvision 0.25.0+cu128
tqdm 4.67.3
transformers 4.57.6
vllm 0.17.0
GPTQModel 5.7.1
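
Since the two errors depend on which transformers version is installed, it helps to record the versions programmatically before and after each change. This is a small sketch using importlib.metadata from the standard library; the helper name installed_versions is hypothetical, and the distribution name gptqmodel is an assumption about how GPTQModel is published.

```python
from importlib import metadata

# Distribution names to report; "gptqmodel" is an assumed package name.
PACKAGES = ("torch", "transformers", "vllm", "gptqmodel")


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Printing installed_versions() in the same environment that launches the server avoids confusion between multiple conda environments.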

🐛 Describe the bug

After installing vLLM 0.17.0, transformers 4.57.6 is installed automatically, and deploying the official qwen3-5-27b model with this configuration succeeds. However, deploying the model fine-tuned with swift raises a ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported. The workaround is to keep the model.safetensors.index.json and *.safetensors files produced by training and use the original model's files for everything else, as described in https://github.com/modelscope/ms-swift/issues/8098. After this change, vLLM launches the SFT model successfully.
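
The workaround described above (weights from training, everything else from the original model) can be scripted. The following is a sketch under stated assumptions: the function names and the three directory arguments are hypothetical, and the rule for what counts as a weight file is inferred from the description (model.safetensors.index.json plus *.safetensors).

```python
import shutil
from pathlib import Path


def is_weight_file(path: Path) -> bool:
    """True for *.safetensors shards and the safetensors weight index."""
    return path.suffix == ".safetensors" or path.name == "model.safetensors.index.json"


def merge_checkpoint(original_dir: str, sft_dir: str, output_dir: str) -> list[str]:
    """Combine original config/tokenizer files with fine-tuned weights."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Configs, tokenizer files, and other metadata come from the original model.
    for src in Path(original_dir).iterdir():
        if src.is_file() and not is_weight_file(src):
            shutil.copy2(src, out / src.name)

    # Weight shards and the weight index come from the fine-tuned checkpoint.
    for src in Path(sft_dir).iterdir():
        if src.is_file() and is_weight_file(src):
            shutil.copy2(src, out / src.name)

    return sorted(p.name for p in out.iterdir())
```

Writing to a separate output directory keeps both source checkpoints untouched, so the merge can be repeated or compared later.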

However, when I use this configuration to quantize the SFT model to int8 with GPTQ, this error occurs:

Value error, The checkpoint you are trying to load has model type qwen3_5_text but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]

Then, after upgrading Transformers to version 5.3.0, running it fails with a different error:

INFO 03-10 11:30:12 [vllm.py:957] Cudagraph is disabled under eager mode
<frozen importlib._bootstrap_external>:1296: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
<frozen importlib._bootstrap_external>:1296: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
Traceback (most recent call last):
File "/ssd4/workspace/1_test/webserver_new2.py", line 1136, in <module>
infer = QwenInfer(args)
^^^^^^^^^^^^^^^
File "/ssd4/workspace/1_test/qwen_infer_vllm.py", line 79, in __init__
self.model = LLMEngine.from_engine_args(model_engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
return cls(
^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 92, in __init__
self.renderer = renderer = renderer_from_config(self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/registry.py", line 89, in renderer_from_config
return RENDERER_REGISTRY.load_renderer(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/registry.py", line 63, in load_renderer
return renderer_cls.from_config(config, tokenizer_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/hf.py", line 607, in from_config
return cls(config, tokenizer)
^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/hf.py", line 614, in __init__
super().__init__(config, tokenizer)
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/renderers/base.py", line 93, in __init__
self.mm_processor = mm_registry.create_processor(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 214, in create_processor
return factories.build_processor(ctx, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/registry.py", line 95, in build_processor
return self.processor(info, dummy_inputs_builder, cache=cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/processing/processor.py", line 997, in __init__
self.data_parser = self.info.get_data_parser()
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_vl.py", line 643, in get_data_parser
self.get_hf_config().vision_config.spatial_merge_size,
^^^^^^^^^^^^^^^^^^^^
File "/ssd4//.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/model_executor/models/qwen3_5.py", line 114, in get_hf_config
return self.ctx.get_hf_config(Qwen3_5Config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ssd4/.conda/envs/vllm0170_py312/lib/python3.12/site-packages/vllm/multimodal/processing/context.py", line 139, in get_hf_config
raise TypeError(
TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5.Qwen3_5Config'>, but found type: <class 'transformers.models.qwen3_5.configuration_qwen3_5.Qwen3_5TextConfig'>

How should this be solved?


extent analysis

Fix Plan

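Both errors come from the same mismatch between the quantized checkpoint and the installed libraries. With transformers 4.57.6, Transformers does not recognize the qwen3_5_text model type at all. With transformers 5.3.0, the config loads as Qwen3_5TextConfig, but vLLM 0.17.0 still sends the checkpoint through the multimodal Qwen3.5 path, which expects Qwen3_5Config and raises the TypeError.

Here are the steps to fix the issue:

  • Do not downgrade transformers to 4.57.6: that version produces the first error (model type qwen3_5_text is not recognized).
  • Use a vLLM build that includes PR #36850, which registers qwen3_5_text and qwen3_5_moe_text with vLLM's config loader and remaps text-only Qwen3.5 checkpoints onto the text-only CausalLM implementations.

Code Changes

# Check the installed versions first
pip show vllm transformers

# Upgrade vLLM. This page does not say which release first contains PR #36850,
# so confirm against the release notes before relying on the upgrade.
pip install --upgrade vllm

# No manual edit to vllm/model_executor/models/qwen3_5.py is needed.
# The call shown in the traceback already requests Qwen3_5Config:
#     return self.ctx.get_hf_config(Qwen3_5Config)
# and that file is not among the files changed by PR #36850.

Verification

After upgrading, verify that the quantized checkpoint loads and generates text. The path below is a placeholder for the GPTQ checkpoint directory:

from vllm import LLM

# Placeholder path; replace it with the quantized checkpoint directory
llm = LLM(model="/path/to/qwen3.5-27b-gptq")
outputs = llm.generate("Hello")
print(outputs[0].outputs[0].text)

If the model loads and prints a completion without the TypeError, the fix is in effect. From a vLLM source checkout, the pytest commands listed under Testing in the PR notes exercise the same config parsing and registry resolution.

Extra Tips

  • Check which transformers versions the installed vLLM release supports before changing either package.
  • Inspect the model_type field in the checkpoint's config.json: the value qwen3_5_text marks a text-only config, which is what triggers both errors here.
  • After any change, reload the model to confirm that no new error appears.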
