vllm - [Usage]: Does vLLM support inference with the Qwen3_5ForCausalLM architecture, not just Qwen3_5ForConditionalGeneration?

vllm-project/vllm#39993 (fetched 2026-04-17 08:27:54)
Comments: 0 · Participants: 1 · Timeline events: 1 · Reactions: 0

Error Message

(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299] 
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.0
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]   █▄█▀ █     █     █     █  model   /data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299] 
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:233] non-default args: {'model_tag': '/data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/', 'port': 8100, 'model': '/data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/', 'max_model_len': 1024, 'gpu_memory_utilization': 0.5}
(APIServer pid=4191148) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_interleaved', 'mrope_section'}
(APIServer pid=4191148) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_interleaved', 'mrope_section'}
(APIServer pid=4191148) INFO 04-16 17:17:56 [model.py:549] Resolved architecture: Qwen3_5ForCausalLM
(APIServer pid=4191148) INFO 04-16 17:17:56 [model.py:1678] Using max model len 1024
(APIServer pid=4191148) INFO 04-16 17:17:57 [config.py:281] Setting attention block size to 272 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=4191148) INFO 04-16 17:17:57 [config.py:312] Padding mamba page size by 1.49% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=4191148) INFO 04-16 17:17:57 [vllm.py:790] Asynchronous scheduling is enabled.
(APIServer pid=4191148) Traceback (most recent call last):
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/bin/vllm", line 6, in <module>
(APIServer pid=4191148)     sys.exit(main())
(APIServer pid=4191148)              ^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=4191148)     args.dispatch_function(args)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=4191148)     uvloop.run(run_server(args))
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=4191148)     return runner.run(wrapper())
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=4191148)     return self._loop.run_until_complete(task)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=4191148)     return await main
(APIServer pid=4191148)            ^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
(APIServer pid=4191148)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
(APIServer pid=4191148)     async with build_async_engine_client(
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=4191148)     return await anext(self.gen)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=4191148)     async with build_async_engine_client_from_engine_args(
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=4191148)     return await anext(self.gen)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=4191148)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=4191148)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=4191148)     return cls(
(APIServer pid=4191148)            ^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 135, in __init__
(APIServer pid=4191148)     self.renderer = renderer = renderer_from_config(self.vllm_config)
(APIServer pid=4191148)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/registry.py", line 86, in renderer_from_config
(APIServer pid=4191148)     return RENDERER_REGISTRY.load_renderer(renderer_mode, config, tokenizer)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/registry.py", line 68, in load_renderer
(APIServer pid=4191148)     return renderer_cls(config, tokenizer)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/hf.py", line 612, in __init__
(APIServer pid=4191148)     super().__init__(config, tokenizer)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/base.py", line 118, in __init__
(APIServer pid=4191148)     self.mm_processor = mm_registry.create_processor(
(APIServer pid=4191148)                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/registry.py", line 214, in create_processor
(APIServer pid=4191148)     return factories.build_processor(ctx, cache=cache)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/registry.py", line 95, in build_processor
(APIServer pid=4191148)     return self.processor(info, dummy_inputs_builder, cache=cache)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 992, in __init__
(APIServer pid=4191148)     self.data_parser = self.info.get_data_parser()
(APIServer pid=4191148)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl.py", line 706, in get_data_parser
(APIServer pid=4191148)     self.get_hf_config().vision_config.spatial_merge_size,
(APIServer pid=4191148)     ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_5.py", line 110, in get_hf_config
(APIServer pid=4191148)     return self.ctx.get_hf_config(Qwen3_5Config)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/processing/context.py", line 140, in get_hf_config
(APIServer pid=4191148)     raise TypeError(
(APIServer pid=4191148) TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5.Qwen3_5Config'>, but found type: <class 'transformers.models.qwen3_5.configuration_qwen3_5.Qwen3_5TextConfig'>
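
The last frames of the traceback suggest the failure mode: the multimodal processing path (`qwen3_vl.py`, `get_data_parser`) requests a config of type `Qwen3_5Config` and reads `vision_config` from it, while the fine-tuned checkpoint loads as the text-only `Qwen3_5TextConfig`. Below is a minimal sketch of that kind of type guard using stand-in classes. It is not vLLM's actual implementation; `TextOnlyConfig`, `VisionConfig`, `MultimodalConfig`, and `get_typed_config` are hypothetical names introduced only for illustration, and the field values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class TextOnlyConfig:
    """Stand-in for a text-only config (plays the role of Qwen3_5TextConfig)."""
    hidden_size: int = 4096  # placeholder value


@dataclass
class VisionConfig:
    """Stand-in for the vision sub-config the multimodal path reads."""
    spatial_merge_size: int = 2  # placeholder value


@dataclass
class MultimodalConfig:
    """Stand-in for a wrapper config (plays the role of Qwen3_5Config)."""
    text_config: TextOnlyConfig = field(default_factory=TextOnlyConfig)
    vision_config: VisionConfig = field(default_factory=VisionConfig)


def get_typed_config(config, expected_type):
    """Return config unchanged if it has the expected type, else raise TypeError."""
    if not isinstance(config, expected_type):
        raise TypeError(
            "Invalid type of HuggingFace config. "
            f"Expected type: {expected_type}, but found type: {type(config)}"
        )
    return config


# A text-only config cannot satisfy a caller that needs the multimodal wrapper,
# because the caller goes on to read `vision_config`, which only the wrapper has.
try:
    get_typed_config(TextOnlyConfig(), MultimodalConfig).vision_config
except TypeError as exc:
    print(exc)
```

Under this reading, the server fails before any weights are loaded: the text-only architecture is resolved correctly, but a multimodal processor is still constructed for it.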


Your current environment

  • vllm 0.19.0
  • transformers 5.2.0
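
When reporting an environment like the one above, the installed versions can be read programmatically rather than copied by hand. This is a small sketch using only the standard library; `installed_version` and `environment_report` are hypothetical helper names, and the package list simply mirrors the two packages named in this issue.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def environment_report(packages=("vllm", "transformers")):
    """Map each package name to its installed version (None when not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in environment_report().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```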

I fine-tuned a language model based on Qwen3.5-9B, using the Qwen3_5ForCausalLM architecture.
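
To confirm what a fine-tuned checkpoint actually declares, it can help to inspect its `config.json` before serving it. The sketch below assumes the usual Hugging Face checkpoint layout, with a `config.json` file in the model directory; `summarize_checkpoint_config` is a hypothetical helper, and treating the `vision_config` and `text_config` keys as markers of a multimodal wrapper config is an assumption based on the `vision_config` access seen in the traceback.

```python
import json
from pathlib import Path


def summarize_checkpoint_config(model_dir):
    """Return the config.json fields that indicate how a checkpoint will be loaded."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        # Architecture names the checkpoint asks the runtime to instantiate.
        "architectures": config.get("architectures", []),
        "model_type": config.get("model_type"),
        # Assumption: a multimodal wrapper config nests these sub-configs.
        "has_vision_config": "vision_config" in config,
        "has_text_config": "text_config" in config,
    }


# Example call (the path is a placeholder):
# print(summarize_checkpoint_config("/path/to/checkpoint"))
```

A checkpoint that lists `Qwen3_5ForCausalLM` and has no `vision_config` entry would match the text-only config type reported in the error.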

vllm serve Qwen3.5-9B-Stage2-2604 --port 8100 --tensor-parallel-size 1 --max-model-len 1024 --gpu-memory-utilization 0.6

(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299] 
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.0
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]   █▄█▀ █     █     █     █  model   /data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:299] 
(APIServer pid=4191148) INFO 04-16 17:17:55 [utils.py:233] non-default args: {'model_tag': '/data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/', 'port': 8100, 'model': '/data/zhangqingguo/wangzejun/Text-Proofreading/two_stage/Qwen3.5-9B-Correction-Stage2-2604/', 'max_model_len': 1024, 'gpu_memory_utilization': 0.5}
(APIServer pid=4191148) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_interleaved', 'mrope_section'}
(APIServer pid=4191148) Unrecognized keys in `rope_parameters` for 'rope_type'='default': {'mrope_interleaved', 'mrope_section'}
(APIServer pid=4191148) INFO 04-16 17:17:56 [model.py:549] Resolved architecture: Qwen3_5ForCausalLM
(APIServer pid=4191148) INFO 04-16 17:17:56 [model.py:1678] Using max model len 1024
(APIServer pid=4191148) INFO 04-16 17:17:57 [config.py:281] Setting attention block size to 272 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=4191148) INFO 04-16 17:17:57 [config.py:312] Padding mamba page size by 1.49% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=4191148) INFO 04-16 17:17:57 [vllm.py:790] Asynchronous scheduling is enabled.
(APIServer pid=4191148) Traceback (most recent call last):
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/bin/vllm", line 6, in <module>
(APIServer pid=4191148)     sys.exit(main())
(APIServer pid=4191148)              ^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=4191148)     args.dispatch_function(args)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=4191148)     uvloop.run(run_server(args))
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=4191148)     return runner.run(wrapper())
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=4191148)     return self._loop.run_until_complete(task)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=4191148)     return await main
(APIServer pid=4191148)            ^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
(APIServer pid=4191148)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
(APIServer pid=4191148)     async with build_async_engine_client(
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=4191148)     return await anext(self.gen)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=4191148)     async with build_async_engine_client_from_engine_args(
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=4191148)     return await anext(self.gen)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=4191148)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=4191148)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=4191148)     return cls(
(APIServer pid=4191148)            ^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 135, in __init__
(APIServer pid=4191148)     self.renderer = renderer = renderer_from_config(self.vllm_config)
(APIServer pid=4191148)                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/registry.py", line 86, in renderer_from_config
(APIServer pid=4191148)     return RENDERER_REGISTRY.load_renderer(renderer_mode, config, tokenizer)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/registry.py", line 68, in load_renderer
(APIServer pid=4191148)     return renderer_cls(config, tokenizer)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/hf.py", line 612, in __init__
(APIServer pid=4191148)     super().__init__(config, tokenizer)
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/renderers/base.py", line 118, in __init__
(APIServer pid=4191148)     self.mm_processor = mm_registry.create_processor(
(APIServer pid=4191148)                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/registry.py", line 214, in create_processor
(APIServer pid=4191148)     return factories.build_processor(ctx, cache=cache)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/registry.py", line 95, in build_processor
(APIServer pid=4191148)     return self.processor(info, dummy_inputs_builder, cache=cache)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/processing/processor.py", line 992, in __init__
(APIServer pid=4191148)     self.data_parser = self.info.get_data_parser()
(APIServer pid=4191148)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_vl.py", line 706, in get_data_parser
(APIServer pid=4191148)     self.get_hf_config().vision_config.spatial_merge_size,
(APIServer pid=4191148)     ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/model_executor/models/qwen3_5.py", line 110, in get_hf_config
(APIServer pid=4191148)     return self.ctx.get_hf_config(Qwen3_5Config)
(APIServer pid=4191148)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=4191148)   File "/data/ENV/condaENVs/zhangqingguo/latest/lib/python3.11/site-packages/vllm/multimodal/processing/context.py", line 140, in get_hf_config
(APIServer pid=4191148)     raise TypeError(
(APIServer pid=4191148) TypeError: Invalid type of HuggingFace config. Expected type: <class 'vllm.transformers_utils.configs.qwen3_5.Qwen3_5Config'>, but found type: <class 'transformers.models.qwen3_5.configuration_qwen3_5.Qwen3_5TextConfig'>

How would you like to use vllm

I want to run inference with a language model that uses the Qwen3_5ForCausalLM architecture, but I don't know how to integrate it with vllm.


extent analysis

TL;DR

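The server fails at startup with a TypeError: vLLM's Qwen3.5 processing code requests a Qwen3_5Config (vllm.transformers_utils.configs.qwen3_5), but the checkpoint's config was loaded as the transformers class Qwen3_5TextConfig. A likely direction is to make sure the config type that vLLM expects is the one actually used for the Qwen3_5ForCausalLM architecture.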

Guidance

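  • Confirm which config class the checkpoint loads as. In the traceback, get_hf_config in qwen3_5.py asks for Qwen3_5Config, while the object it receives is a Qwen3_5TextConfig.
  • Check that the installed transformers version (5.2.0) is compatible with the vllm version (0.19.0).
  • Note where the failure happens: the error is raised while the multimodal processor is being built (get_data_parser in qwen3_vl.py reads vision_config.spatial_merge_size). Review the vllm documentation and code to see whether, and how, the Qwen3_5ForCausalLM architecture is meant to be served.
  • Consider whether Qwen3_5TextConfig can stand in for the expected Qwen3_5Config, or be converted to it. The issue does not confirm that either approach works.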
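
The version check in the list above can be sketched as a small script. This is only a minimal illustration using the Python standard library. It reports which versions of vllm and transformers are installed in the current environment, and it does not decide whether the two are compatible. The helper name installed_versions is hypothetical and not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("vllm", "transformers")):
    """Return {package name: installed version string, or None if missing}.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import vllm or transformers.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the same conda environment that launches the server shows the versions the traceback was produced with, which can then be compared against the vllm release notes.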

Example

The issue itself includes no reproduction code beyond the server launch arguments shown in the log; the failure appears to be tied to a specific configuration or version mismatch.

Notes

The TypeError is raised in vllm/multimodal/processing/context.py when get_hf_config receives a HuggingFace config of an unexpected type. This suggests a version or configuration mismatch between vllm and transformers. Without more information about the checkpoint's configuration, it is difficult to give a more detailed solution.
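
One way to gather that missing information is to look at the checkpoint's config.json directly. The sketch below is a hedged diagnostic, not a fix. It reads the file with the standard library and reports the declared architectures, the model_type, and whether vision_config and text_config sections are present. The keys architectures and model_type are standard in HuggingFace config files, and vision_config is the attribute the traceback tries to read. Treating text_config as the marker of a composite config is an assumption, and the function name summarize_checkpoint_config is hypothetical.

```python
import json
import sys
from pathlib import Path


def summarize_checkpoint_config(model_dir):
    """Summarize the fields of <model_dir>/config.json relevant to the error.

    Hypothetical helper for illustration. It only parses JSON; it does not
    load the model or import vllm/transformers.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        "architectures": config.get("architectures", []),
        "model_type": config.get("model_type"),
        # The traceback reads vision_config.spatial_merge_size, so a
        # missing vision_config section is worth knowing about.
        "has_vision_config": "vision_config" in config,
        # Assumption: a composite (multimodal) config nests the language
        # model settings under text_config.
        "has_text_config": "text_config" in config,
    }


if __name__ == "__main__":
    # Usage: python inspect_config.py /path/to/checkpoint
    summary = summarize_checkpoint_config(sys.argv[1])
    for key, value in summary.items():
        print(f"{key}: {value}")
```

If the summary shows Qwen3_5ForCausalLM with no vision_config section, that would be consistent with the text-only config type reported in the error, though it does not by itself show how vllm should be configured.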

Recommendation

As a first step, verify that the installed vllm and transformers versions are compatible and that the checkpoint's config type matches what vllm expects for the Qwen3_5ForCausalLM architecture. If the issue persists, seek further help from the vllm community or documentation.
