vllm - 💡(How to fix) Fix [Bug]: Qwen3-VL-MoE crashes at init with pipeline parallelism ("No model architectures are specified")

vllm 2026-05-21 02:57:04


Error Message

File ".../vllm/model_executor/models/qwen3_vl_moe.py", line 450, in __init__
    vllm_config=vllm_config.with_hf_config(config.text_config),
File ".../vllm/config/vllm.py", line 593, in with_hf_config
    return replace(self, model_config=model_config)
File ".../vllm/config/utils.py", line 127, in replace
    return cls(**dataclass_dict)
pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
  Value error, No model architectures are specified

Root Cause

Qwen3VLMoeForConditionalGeneration.__init__ builds the inner language model from the text sub-config:

self.language_model = Qwen3MoeLLMForCausalLM(
    vllm_config=vllm_config.with_hf_config(config.text_config),
    prefix=maybe_prefix(prefix, "language_model"),
)

For these checkpoints, config.text_config.architectures is None; architectures are only set on the top-level config. with_hf_config() does not infer them, so the language model's ModelConfig ends up with an empty architectures list, and registry.inspect_model_cls raises "No model architectures are specified".
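The failure mode can be reproduced outside vLLM with a minimal sketch. Everything in it is a hypothetical stand-in, not vLLM code: TextConfig, TopLevelConfig, MockModelConfig and mock_with_hf_config only mimic the behaviour described above (architectures live on the top-level config, the text sub-config carries None, nothing is inferred, and an empty list is rejected when the config is rebuilt).

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class TextConfig:
    # Hypothetical text sub-config: it has no architectures of its own.
    architectures: Optional[list] = None


@dataclass
class TopLevelConfig:
    # Hypothetical top-level config: the only place architectures are set.
    architectures: list = field(
        default_factory=lambda: ["Qwen3VLMoeForConditionalGeneration"]
    )
    text_config: TextConfig = field(default_factory=TextConfig)


@dataclass(frozen=True)
class MockModelConfig:
    hf_config: object
    architectures: list

    def __post_init__(self):
        # Stand-in for the validation that fails in the traceback.
        if not self.architectures:
            raise ValueError("No model architectures are specified")


def mock_with_hf_config(model_config, hf_config):
    # Rebuilds the config around the new hf_config; architectures are copied
    # as-is and never inferred from the parent (top-level) config.
    return replace(
        model_config,
        hf_config=hf_config,
        architectures=list(hf_config.architectures or []),
    )


config = TopLevelConfig()
parent = MockModelConfig(hf_config=config, architectures=list(config.architectures))

try:
    mock_with_hf_config(parent, config.text_config)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # No model architectures are specified
```

The parent config validates because the top-level config has an architecture; only the rebuild around the text sub-config trips the check.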

The sibling qwen3_omni_moe_thinker.py, which constructs the same Qwen3MoeLLMForCausalLM, already passes architectures=["Qwen3MoeForCausalLM"] to with_hf_config. Both qwen3_vl_moe.py and the dense qwen3_vl.py (which constructs Qwen3LLMForCausalLM) omit it.

Fix Action

Pass the text-model architecture explicitly to with_hf_config, matching the existing convention in qwen3_omni_moe_thinker.py / qwen3_asr.py. PR incoming.
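A similarly hypothetical sketch of the fix: mock_with_hf_config here accepts an architectures argument, as the existing call in qwen3_omni_moe_thinker.py shows the real with_hf_config does. The rule that an explicit value takes precedence over the sub-config's None is an assumption of this mock, not a description of vLLM's implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class TextConfig:
    # Hypothetical text sub-config without architectures (as in the bug).
    architectures: Optional[list] = None


@dataclass(frozen=True)
class MockModelConfig:
    hf_config: object
    architectures: list

    def __post_init__(self):
        # Same stand-in validation as the one that fails in the traceback.
        if not self.architectures:
            raise ValueError("No model architectures are specified")


def mock_with_hf_config(model_config, hf_config, architectures=None):
    # Assumed semantics: an explicit architectures argument wins; otherwise
    # fall back to whatever the sub-config carries (here: None).
    archs = architectures if architectures is not None else hf_config.architectures
    return replace(model_config, hf_config=hf_config, architectures=list(archs or []))


parent = MockModelConfig(
    hf_config=object(), architectures=["Qwen3VLMoeForConditionalGeneration"]
)
text_config = TextConfig()

# Mirrors the convention from qwen3_omni_moe_thinker.py applied to qwen3_vl_moe.py:
#   vllm_config.with_hf_config(config.text_config,
#                              architectures=["Qwen3MoeForCausalLM"])
text_model_config = mock_with_hf_config(
    parent, text_config, architectures=["Qwen3MoeForCausalLM"]
)
print(text_model_config.architectures)  # ['Qwen3MoeForCausalLM']
```

Passing the architecture explicitly leaves the checkpoint's text_config untouched and only changes the config handed to the inner language model.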


Your current environment

vLLM main (commit 9640970); also reproduced on nvcr.io/nvidia/vllm:26.04-py3 (vLLM 0.19.0).
2 nodes × 1 GPU (2× NVIDIA DGX Spark, GB10), RoCE interconnect.
Launch: --tensor-parallel-size 1 --pipeline-parallel-size 2 --nnodes 2, mp (multiproc) distributed executor.
Model: Qwen/Qwen3-VL-235B-A22B-Thinking (AWQ). Affects any Qwen3-VL-MoE checkpoint.

🐛 Describe the bug

Serving a Qwen3-VL-MoE model with --pipeline-parallel-size 2 fails during engine initialization with the pydantic ValidationError quoted under Error Message above.

Reproduces with pipeline_parallel_size > 1; does not occur with --tensor-parallel-size 2 --pipeline-parallel-size 1.

