vllm - ✅ (Solved) [Bug]: Deepseek v4 ImportError: cannot import name 'Arch' from 'cutlass.base_dsl' [1 pull request, 1 participant]

GitHub stats
vllm-project/vllm#43141 (fetched 2026-05-20 03:39:42)
Comments: 0
Participants: 1
Timeline: 5
Reactions: 0
Timeline (top): referenced ×2, closed ×1, cross-referenced ×1, labeled ×1

Error Message

(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 1179, in forward
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     q_quant, weights = fused_indexer_q_rope_quant(
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/fused_indexer_q.py", line 349, in fused_indexer_q_rope_quant
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from vllm.models.deepseek_v4.nvidia.ops.fused_indexer_q_cutedsl import (
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/ops/fused_indexer_q_cutedsl.py", line 10, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from quack.compile_utils import make_fake_tensor
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/quack/__init__.py", line 5, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from quack.rmsnorm import rmsnorm
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/quack/rmsnorm.py", line 24, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from cutlass.base_dsl import Arch
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962] ImportError: cannot import name 'Arch' from 'cutlass.base_dsl' (/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/__init__.py)
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]

Fix Action

Fixed

PR fix notes

PR #43146: [Bugfix] Fix DeepSeek V4 ImportError when cutlass/quack versions are incompatible

Description (problem / solution / changelog)

What

has_cutedsl() in vllm/utils/import_utils.py used importlib.util.find_spec("cutlass"), which only checks whether the module can be located by the import system, not whether it imports successfully. When quack is incompatible with the installed cutlass build (e.g. Arch is missing from cutlass.base_dsl), has_cutedsl() still returned True, so the lazy import of fused_indexer_q_cutedsl crashed at runtime with:

  ImportError: cannot import name 'Arch' from 'cutlass.base_dsl'
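The gap between "locatable" and "importable" can be reproduced without cutlass or quack installed. The following is a minimal sketch, not code from the PR: `probe` and the throwaway module `fake_quack_demo` are hypothetical names, and the module merely stands in for a package whose top-level import asks a dependency for a symbol it does not have.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def probe(module_name: str) -> tuple[bool, bool]:
    """Return (found_by_find_spec, imports_cleanly) for module_name."""
    found = importlib.util.find_spec(module_name) is not None
    try:
        importlib.import_module(module_name)
        imports_cleanly = True
    except Exception:
        imports_cleanly = False
    return found, imports_cleanly


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in: the file exists, but importing it fails because
    # it requests a name that its dependency (here: os) does not provide.
    Path(tmp, "fake_quack_demo.py").write_text("from os import no_such_symbol_xyz\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        result = probe("fake_quack_demo")
    finally:
        sys.path.remove(tmp)

print(result)  # (True, False): find_spec succeeds, the real import does not
```

A check based only on `find_spec` would report this module as available, which is the same failure mode the traceback above shows for quack and cutlass.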

Fix

Replace the find_spec check with a real import attempt of both cutlass and quack.compile_utils. If either import raises any exception, has_cutedsl() returns False, letting callers in fused_indexer_q.py and cache_utils.py fall back to the existing Triton path. Added @cache to avoid repeated import overhead across call sites.
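The approach described above might look roughly like the sketch below. This is an illustration under stated assumptions, not the code merged in the PR: `can_import_all`, `has_cutedsl_sketch`, and `pick_backend` are hypothetical names, and the real has_cutedsl() in vllm/utils/import_utils.py may be structured differently.

```python
import importlib
from functools import cache


@cache
def can_import_all(*module_names: str) -> bool:
    """Return True only if every named module actually imports.

    Any exception (missing module, missing symbol, broken native
    extension) is treated as "not available". The result is cached
    per argument tuple, so the import is attempted once per process.
    """
    try:
        for name in module_names:
            importlib.import_module(name)
    except Exception:
        return False
    return True


def has_cutedsl_sketch() -> bool:
    # Hypothetical stand-in for has_cutedsl(): probe both packages.
    return can_import_all("cutlass", "quack.compile_utils")


def pick_backend() -> str:
    # Hypothetical caller: fall back to Triton when the probe fails.
    return "cutedsl" if has_cutedsl_sketch() else "triton"


if __name__ == "__main__":
    print(pick_backend())
```

One consequence of caching worth keeping in mind: a False result is cached too, so installing a compatible package while the process is running will not be noticed until restart.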

Why this is not a duplicate

Searched open PRs for has_cutedsl, cutlass base_dsl Arch, and quack importerror deepseek. No existing PR addresses this.

Tests

The CUDA kernel tests (tests/kernels/test_fused_indexer_q_rope_quant.py) require a GPU and were not run locally. Logic was verified via simulation:

  • cutlass not installed -> False
  • cutlass installed, quack incompatible (ImportError on Arch) -> False
  • Both installed and compatible -> True
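The PR does not show how the simulation was done. One way to exercise the three cases above without a GPU or either package is to patch `importlib.import_module`; the sketch below is an assumption about how such a check could be written, with `probe` and `simulate` as hypothetical helpers.

```python
import importlib
from unittest import mock


def probe(names=("cutlass", "quack.compile_utils")) -> bool:
    """Import-based availability check, mirroring the described fix."""
    try:
        for name in names:
            importlib.import_module(name)
    except Exception:
        return False
    return True


def simulate(failing: dict) -> bool:
    """Run probe() with import_module patched to fail for chosen modules."""

    def fake_import(name, package=None):
        if name in failing:
            raise failing[name]
        return object()  # placeholder for a successfully imported module

    with mock.patch.object(importlib, "import_module", side_effect=fake_import):
        return probe()


cases = {
    "cutlass not installed": simulate(
        {"cutlass": ModuleNotFoundError("No module named 'cutlass'")}
    ),
    "quack incompatible": simulate(
        {"quack.compile_utils": ImportError(
            "cannot import name 'Arch' from 'cutlass.base_dsl'")}
    ),
    "both compatible": simulate({}),
}

for label, available in cases.items():
    print(f"{label} -> {available}")
```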

The existing tests mock fused_indexer_q.has_cutedsl directly and are unaffected by this change.

AI assistance

This fix was developed with Claude Code assistance. All changed lines reviewed and understood by the submitter.

Fixes #43141

Changed files

  • vllm/utils/import_utils.py (modified, +16/-2)


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

Running on top-of-tree main:

(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 1179, in forward
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     q_quant, weights = fused_indexer_q_rope_quant(
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/fused_indexer_q.py", line 349, in fused_indexer_q_rope_quant
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from vllm.models.deepseek_v4.nvidia.ops.fused_indexer_q_cutedsl import (
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/ops/fused_indexer_q_cutedsl.py", line 10, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from quack.compile_utils import make_fake_tensor
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/quack/__init__.py", line 5, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from quack.rmsnorm import rmsnorm
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/quack/rmsnorm.py", line 24, in <module>
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]     from cutlass.base_dsl import Arch
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962] ImportError: cannot import name 'Arch' from 'cutlass.base_dsl' (/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/__init__.py)
(Worker_DP2_EP2 pid=26757) ERROR 05-19 13:32:24 [multiproc_executor.py:962]

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
