vllm - ✅ (Solved) Fix [Bug] Deepseek v4: torch._inductor.exc.InductorError: AssertionError [1 pull request, 9 comments, 5 participants]

GitHub stats
vllm-project/vllm#41106 (fetched 2026-04-29 06:12:22)
Comments: 9
Participants: 5
Timeline: 16
Reactions: 0
Timeline (top): commented ×9, cross-referenced ×2, labeled ×2, renamed ×2

Error Message

WorkerProc hit an exception.
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1474, in forward
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 623, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1194, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 721, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 748, in call_module
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 351, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 372, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 357, in post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ).apply_graph_pass(decompose_triton_kernel_wrapper_functional)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1255, in decompose_triton_kernel_wrapper_functional
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     graph_pass.apply(graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 2063, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     entry.apply(m, graph, node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1132, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.handler(match, *match.args, **match.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1253, in _
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 316, in replace_by_example
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     assert len(graph_with_eager_vals.graph.nodes) == len(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError:
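Every traceback line in the worker log carries the same prefix, which makes the frames hard to scan. A minimal sketch for stripping it, assuming the prefix always has the shape `(Worker_<name> pid=<n>) <LEVEL> <MM-DD> <HH:MM:SS> [<file>:<line>]` seen in this log; the helper name `strip_worker_prefix` is hypothetical and not part of vLLM:

```python
import re

# Prefix shape taken from the log in this issue. Other vLLM versions or
# logging configurations may format it differently (assumption).
_PREFIX = re.compile(
    r"^\(Worker_\S+ pid=\d+\) [A-Z]+ \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)


def strip_worker_prefix(log_text: str) -> str:
    """Remove the per-line worker prefix, keeping the traceback indentation."""
    return "\n".join(_PREFIX.sub("", line) for line in log_text.splitlines())


# Three lines copied from the log above.
sample = (
    "(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] "
    "Traceback (most recent call last):\n"
    "(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   "
    'File "/usr/lib/python3.12/contextlib.py", line 81, in inner\n'
    "(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     "
    "return func(*args, **kwds)"
)

print(strip_worker_prefix(sample))
```

Lines that do not start with the prefix, such as `WorkerProc hit an exception.`, pass through unchanged.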

Fix Action

Fixed

PR fix notes

PR #41135: [Bugfix] fix inductor error for dpsk v4

Description (problem / solution / changelog)

Purpose

Fixes https://github.com/vllm-project/vllm/issues/41106

See also https://github.com/pytorch/pytorch/issues/181735

Test Plan

vllm serve deepseek-ai/DeepSeek-V4-Flash \
  --trust-remote-code \
  --kv-cache-dtype fp8 \
  --block-size 256 \
  --enable-expert-parallel \
  --data-parallel-size 1 \
  --tensor-parallel-size 8 \
  --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}' \
  --max-num-batched-tokens 8192 \
  --max-model-len auto \
  --max-num-seqs 128 \
  --gpu-memory-utilization 0.95 \
  --reasoning-parser deepseek_v4 \
  --port 7888 \
  --no-enable-flashinfer-autotune

lm_eval --model local-completions --model_args "model=deepseek-ai/DeepSeek-V4-Flash,base_url=http://0.0.0.0:7888/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=256,timeout=5000,max_length=40960" --tasks gsm8k --num_fewshot 5
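Before running the full `lm_eval` job, a single completion request is a quick way to confirm that the server started without hitting the compile error. A minimal sketch, assuming the server listens on port 7888 as in the test plan and serves the OpenAI-compatible `/v1/completions` route that the `lm_eval` command targets; `build_completion_request` and `send` are hypothetical helper names, and the `max_tokens` default is arbitrary:

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:7888/v1/completions"  # same URL as the lm_eval command
MODEL = "deepseek-ai/DeepSeek-V4-Flash"


def build_completion_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first completion choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `send(build_completion_request("2 + 2 ="))` should return generated text; a connection error here usually means the workers never finished startup.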

Test Result

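| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr  |
|-------|---------|------------------|--------|-------------|--------|---------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9515 | ±0.0059 |
|       |         | strict-match     | 5      | exact_match | 0.9515 | ±0.0059 |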

aime26: 100
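
The gsm8k standard error reported above is what a binomial estimate over the benchmark's test split would give. A minimal sketch of that check, assuming the split has 1,319 problems (not stated in the PR) and that the harness reports the sample standard error of the mean of per-example 0/1 scores; the 95% interval uses the normal approximation:

```python
import math


def binomial_stderr(accuracy: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 scores with the given mean."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


accuracy = 0.9515   # exact_match reported in the test result
n_examples = 1319   # assumed size of the GSM8K test split

stderr = binomial_stderr(accuracy, n_examples)
low = accuracy - 1.96 * stderr
high = accuracy + 1.96 * stderr
print(f"stderr={stderr:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

Under these assumptions the computed standard error rounds to the reported 0.0059.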



Changed files

  • vllm/v1/attention/ops/deepseek_v4_ops/fused_inv_rope_fp8_quant.py (modified, +106/-36)

Code Example


WorkerProc hit an exception.
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1474, in forward
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 623, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1194, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 721, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 748, in call_module
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 351, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 372, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 357, in post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ).apply_graph_pass(decompose_triton_kernel_wrapper_functional)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1255, in decompose_triton_kernel_wrapper_functional
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     graph_pass.apply(graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 2063, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     entry.apply(m, graph, node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1132, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.handler(match, *match.args, **match.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1253, in _
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 316, in replace_by_example
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     assert len(graph_with_eager_vals.graph.nodes) == len(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError:

---

vllm serve /mnt/model/DeepSeek-V4-Flash   --trust-remote-code   --kv-cache-dtype fp8   --block-size 256   --enable-expert-parallel   --data-parallel-size 1 --tensor-parallel-size 8   --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}'   --max-num-batched-tokens 8192   --max-model-len auto   --max-num-seqs 128   --gpu-memory-utilization 0.95   --reasoning-parser deepseek_v4 --port 7888

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

When launching DeepSeek V4 with TP=8 on H100 GPUs, I encountered the following error:

WorkerProc hit an exception.
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1474, in forward
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 623, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1194, in __call__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 721, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 748, in call_module
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 351, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 372, in compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 357, in post_grad_passes
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     ).apply_graph_pass(decompose_triton_kernel_wrapper_functional)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1255, in decompose_triton_kernel_wrapper_functional
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     graph_pass.apply(graph)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 2063, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     entry.apply(m, graph, node)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 1132, in apply
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     self.handler(match, *match.args, **match.kwargs)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1253, in _
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 316, in replace_by_example
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     assert len(graph_with_eager_vals.graph.nodes) == len(
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError:

launch command:

vllm serve /mnt/model/DeepSeek-V4-Flash   --trust-remote-code   --kv-cache-dtype fp8   --block-size 256   --enable-expert-parallel   --data-parallel-size 1 --tensor-parallel-size 8   --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}'   --max-num-batched-tokens 8192   --max-model-len auto   --max-num-seqs 128   --gpu-memory-utilization 0.95   --reasoning-parser deepseek_v4 --port 7888

image env: vllm/vllm-openai:v0.20.0-cu130

extent analysis

TL;DR

The failure occurs while Inductor compiles the model during vLLM's startup profiling run, so the most likely workarounds are to change the compilation configuration or to try a newer vLLM build.

Guidance

  • Review the launch command, in particular the --compilation-config value (cudagraph_mode set to FULL_AND_PIECEWISE, custom_ops set to ["all"]), and check that these settings are supported for this model on the vLLM version in use.
  • Consider trying a newer vLLM build. The report was made against the vllm/vllm-openai:v0.20.0-cu130 image, and the issue is marked as solved with one linked pull request.
  • Check the vLLM documentation for a recommended configuration for running DeepSeek-V4-Flash.
  • Note that the exception is raised during profile_run, which determine_available_memory calls at startup, but it is an Inductor assertion rather than an out-of-memory error. Lowering --gpu-memory-utilization from 0.95 is therefore not an obvious fix.
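
Since the guidance above depends on which versions are installed, a minimal sketch for recording them is shown here. It uses only the standard library. The package list (vllm, torch, triton) is an assumption based on the modules named in the traceback, not something the report specifies, and the helper name installed_versions is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages=("vllm", "torch", "triton")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is absent from this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container that fails makes it easier to compare the environment against a build where the model starts correctly.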

Example

No code change is proposed here; the issue appears to depend on the launch configuration and the vLLM version in use rather than on user code.

Notes

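The traceback is long, but it does localize the failure. The AssertionError is raised in replace_by_example (torch/_inductor/pattern_matcher.py, line 316), called from the decompose_triton_kernel_wrapper_functional post-grad pass, which vLLM reaches through its piecewise compilation backend during profile_run. The assertion carries no message, so the log alone does not show why the check failed. The raising line in compile_fx.py carries a comment pointing to TORCHDYNAMO_VERBOSE=1 for more detail.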
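
For a traceback this long, it can help to list only the frames closest to the exception. The sketch below is one possible way to do that, assuming the log keeps the standard Python frame header (File "...", line N, in name) after the worker prefix. The function names and the SAMPLE excerpt are chosen for illustration; the sample lines are copied from the report.

```python
import re

# Matches the frame header inside a log line, for example:
#   ... File "/path/to/module.py", line 316, in replace_by_example
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)'
)


def extract_frames(log_text):
    """Return (path, line_number, function) tuples in the order they appear."""
    return [
        (m.group("path"), int(m.group("line")), m.group("func"))
        for m in FRAME_RE.finditer(log_text)
    ]


def innermost_frames(log_text, count=3):
    """Return the last `count` frames, i.e. those closest to the exception."""
    return extract_frames(log_text)[-count:]


SAMPLE = '''
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1253, in _
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     match.replace_by_example(decomp, flat_args, run_functional_passes=False)
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/pattern_matcher.py", line 316, in replace_by_example
(Worker_TP6_EP6 pid=13431) ERROR 04-28 06:25:58 [multiproc_executor.py:962]     assert len(graph_with_eager_vals.graph.nodes) == len(
'''

for path, line_number, func in innermost_frames(SAMPLE, count=2):
    print(f"{func} ({path}:{line_number})")
```

Applied to the full log, the last few frames are the ones in post_grad.py and pattern_matcher.py, which is where the assertion fires.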

Recommendation

As a workaround, change one compilation setting at a time (cudagraph_mode or custom_ops in --compilation-config) and relaunch to see which one triggers the assertion, or move to a newer vLLM build that addresses this DeepSeek-V4-Flash failure.
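
One way to vary the compilation settings one at a time is sketched below. The base configuration is taken from the reported launch command. The alternative values (PIECEWISE for cudagraph_mode, ["none"] for custom_ops) are assumptions to check against the documentation of the vLLM version in use, and nothing in the report confirms that either of them avoids the error. The helper names are hypothetical.

```python
import json
import shlex

# Settings taken from the launch command in the report.
REPORTED_CONFIG = {"cudagraph_mode": "FULL_AND_PIECEWISE", "custom_ops": ["all"]}

# Hypothetical alternatives to try one at a time; confirm each value against
# the documentation of the vLLM version you are running.
CANDIDATE_OVERRIDES = [
    {"cudagraph_mode": "PIECEWISE"},
    {"custom_ops": ["none"]},
]


def config_variants(base, overrides):
    """Yield copies of `base`, each with exactly one override applied."""
    for override in overrides:
        variant = dict(base)
        variant.update(override)
        yield variant


def compilation_config_arg(config):
    """Render a config dict as a shell-quoted --compilation-config argument."""
    return "--compilation-config " + shlex.quote(json.dumps(config))


for variant in config_variants(REPORTED_CONFIG, CANDIDATE_OVERRIDES):
    print(compilation_config_arg(variant))
```

Each printed argument can replace the --compilation-config value in the original command, leaving the other flags unchanged, so that a change in behavior can be attributed to a single setting.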
