vllm - ✅(Solved) Fix [Bug]: Qwen3.5 trust_remote_code loading intermittently fails with KeyError 'qwen3_5_moe' in worker init [1 pull requests, 1 participants]

vllm2026-04-18 15:45:46

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

vllm-project/vllm#40249•Fetched 2026-04-19 15:04:46

View on GitHub

Comments

Participants

Timeline

Reactions

Author

arpera

Participants

arpera

Timeline (top)

labeled ×1

Intermittent KeyError: 'qwen3_5_moe' in transformers.CONFIG_MAPPING at vLLM worker startup when loading Qwen/Qwen3.5-397B-A17B with --trust-remote-code, even though the same sweep's other concurrency levels of the same model load successfully.

Error Message

The failing worker subprocess exits during init while resolving the tokenizer. The underlying cause is a KeyError: 'qwen3_5_moe' from transformers.CONFIG_MAPPING:

Root Cause

PR fix notes

PR #40299: Register parsed config classes before tokenizer init

Repository: vllm-project/vllm
Author: Bortlesboat
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40299

Description (problem / solution / changelog)

Summary

Register the already-parsed HF config class with AutoConfig before cached_tokenizer_from_config() reads the tokenizer config again.

This fixes #40249, where worker-side tokenizer init could still fail on qwen3_5_moe even though vLLM had already parsed the model config successfully.

Why this is not a duplicate

Checked before posting:

gh issue view 40249 --repo vllm-project/vllm --comments
gh pr list --repo vllm-project/vllm --state open --search "40249 in:body"
gh pr list --repo vllm-project/vllm --state open --search "qwen3_5_moe tokenizer trust_remote_code worker init"

Those checks did not find an open PR for this lane. This is also different from #39554: that change keeps HFConfigParser.parse() on the right config class, while this patch fixes the later tokenizer init path inside workers.

Tests

uvx ruff check vllm/transformers_utils/config.py vllm/tokenizers/registry.py tests/tokenizers_/test_registry.py -> passed
uv run --isolated python -m py_compile vllm/transformers_utils/config.py vllm/tokenizers/registry.py tests/tokenizers_/test_registry.py -> passed
uv run --isolated --with pytest python - with direct calls to test_customized_tokenizer() and test_cached_tokenizer_from_config_registers_local_config(...) -> PASS

AI assistance

I used AI assistance for issue triage and an initial patch/test draft, then reviewed the final diff and verified it locally.

Changed files

tests/tokenizers_/test_registry.py (modified, +64/-0)
vllm/tokenizers/registry.py (modified, +3/-1)
vllm/transformers_utils/config.py (modified, +17/-2)

Code Example

vLLM version:           0.19.1rc1.dev216+g17e787a77.cu130
Python:                 3.12
CUDA (vLLM build):      13.0
GPU:                    8xNVIDIA GB300
CPU:                    aarch64

Model:                  Qwen/Qwen3.5-397B-A17B
trust_remote_code:      True
dtype:                  bfloat16
max_model_len:          2176
tensor_parallel_size:   4
pipeline_parallel_size: 1
data_parallel_size:     1
enable_expert_parallel: True
language_model_only:    True
attention_backend:      FLASHINFER
quantization:           None (bf16)

---

VLLM_USE_FLASHINFER_MOE_FP16=1 \
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3.5-397B-A17B \
    --tensor-parallel-size 4 \
    --max-num-seqs 256 \
    --trust-remote-code \
    --max-model-len 2176 \
    --no-enable-prefix-caching \
    --language-model-only \
    --async-scheduling \
    --attention-backend FLASHINFER \
    --enable-expert-parallel \
    --compilation_config.max_cudagraph_capture_size 2048 \
    --host 0.0.0.0 --port 60000

RAW_BUFFERClick to expand / collapse

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>

vLLM version:           0.19.1rc1.dev216+g17e787a77.cu130
Python:                 3.12
CUDA (vLLM build):      13.0
GPU:                    8xNVIDIA GB300
CPU:                    aarch64

Model:                  Qwen/Qwen3.5-397B-A17B
trust_remote_code:      True
dtype:                  bfloat16
max_model_len:          2176
tensor_parallel_size:   4
pipeline_parallel_size: 1
data_parallel_size:     1
enable_expert_parallel: True
language_model_only:    True
attention_backend:      FLASHINFER
quantization:           None (bf16)

</details>

🐛 Describe the bug

Summary

Reproduce

VLLM_USE_FLASHINFER_MOE_FP16=1 \
python3 -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3.5-397B-A17B \
    --tensor-parallel-size 4 \
    --max-num-seqs 256 \
    --trust-remote-code \
    --max-model-len 2176 \
    --no-enable-prefix-caching \
    --language-model-only \
    --async-scheduling \
    --attention-backend FLASHINFER \
    --enable-expert-parallel \
    --compilation_config.max_cudagraph_capture_size 2048 \
    --host 0.0.0.0 --port 60000

Error

The failing worker subprocess exits during init while resolving the tokenizer. The underlying cause is a KeyError: 'qwen3_5_moe' from transformers.CONFIG_MAPPING:

Server log

server_log.txt

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

The most likely fix is to update the transformers library to a version that includes the 'qwen3_5_moe' key in its CONFIG_MAPPING.

Guidance

Verify that the transformers library version is compatible with the Qwen/Qwen3.5-397B-A17B model by checking the library's documentation or release notes.
Check the CONFIG_MAPPING dictionary in the transformers library to confirm that the 'qwen3_5_moe' key is missing.
Consider updating the transformers library to a version that includes the 'qwen3_5_moe' key, if available.
If updating the library is not feasible, try setting trust_remote_code to False to see if the issue persists.

Example

No code snippet is provided as the issue is related to a missing key in the transformers library.

Notes

The issue may be specific to the Qwen/Qwen3.5-397B-A17B model or the transformers library version being used. Further investigation is needed to determine the root cause.

Recommendation

Apply workaround: set trust_remote_code to False to see if the issue persists, as this may help mitigate the problem until a more permanent fix is available.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #request error #file not found #serialization error #model compatibility

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

vllm - ✅(Solved) Fix [Bug]: Qwen3.5 trust_remote_code loading intermittently fails with KeyError 'qwen3_5_moe' in worker init [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

PR fix notes

PR #40299: Register parsed config classes before tokenizer init

Description (problem / solution / changelog)

Summary

Why this is not a duplicate

Tests

AI assistance

Changed files

Code Example

Your current environment

🐛 Describe the bug

Summary

Reproduce

Error

Server log

Before submitting a new issue...

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

vllm - ✅(Solved) Fix [Bug]: Qwen3.5 trust_remote_code loading intermittently fails with KeyError 'qwen3_5_moe' in worker init [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

PR fix notes

PR #40299: Register parsed config classes before tokenizer init

Description (problem / solution / changelog)

Summary

Why this is not a duplicate

Tests

AI assistance

Changed files

Code Example

Your current environment

🐛 Describe the bug

Summary

Reproduce

Error

Server log

Before submitting a new issue...

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING