vllm - [Bug]: AttributeError: module 'cutlass.cute.arch' has no attribute 'fmin'

Error Message

(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 980, in else_block_10
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     y0 = cute.arch.fmin(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]          ^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962] AttributeError: module 'cutlass.cute.arch' has no attribute 'fmin'
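The traceback ends inside the installed `nvidia_cutlass_dsl` package, so a quick first check is whether the `cutlass.cute.arch` module in the same virtual environment exposes `fmin` at all, and which version of the package is installed. The sketch below is only a diagnostic, not a fix. The helper names `probe_attribute` and `installed_version` are hypothetical, and the distribution name `nvidia-cutlass-dsl` is an assumption inferred from the `site-packages/nvidia_cutlass_dsl` path in the traceback. The issue itself does not state which package version introduced or removed `fmin`.

```python
import importlib
from importlib import metadata


def probe_attribute(module_name: str, attr: str):
    """Return True/False if the module imports, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # the import can fail for reasons other than ImportError
        return None
    return hasattr(module, attr)


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the pip distribution is named "nvidia-cutlass-dsl".
    print("nvidia-cutlass-dsl version:", installed_version("nvidia-cutlass-dsl"))
    # None means the module could not be imported in this environment.
    print("cutlass.cute.arch has fmin:", probe_attribute("cutlass.cute.arch", "fmin"))
```

Run it with the interpreter from the environment that produced the error (here `/vllm/.venv`). If the second line prints `False`, the installed package does not provide the function the vLLM kernel calls, which would point to a version mismatch between vLLM and the CUTLASS DSL package.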

Code Example

(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_worker.py", line 411, in determine_available_memory
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 6311, in profile_cudagraph_memory
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._warmup_and_capture(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 6475, in _warmup_and_capture
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._dummy_run(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 5824, in _dummy_run
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     outputs = self.model(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/model.py", line 1662, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/decorators.py", line 520, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 224, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.fn(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/model.py", line 1347, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     def forward(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/caching.py", line 217, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.optimized_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "<string>", line 259, in execution_fn
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "<string>", line 5, in __vllm_inlined_submods__1
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1269, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._op(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 554, in deepseek_v4_attention
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self.attention_impl(hidden_states, positions, out)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 481, in attention_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     q, _ = maybe_execute_in_parallel(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/utils/multi_stream_utils.py", line 52, in maybe_execute_in_parallel
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result1 = fn1()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]               ^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 483, in <lambda>
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     lambda: compressor(kv_score, positions, self.rotary_emb),
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/compressor.py", line 396, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._norm_rope_store_kernel(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/fused_compress_quant_cache.py", line 57, in _norm_rope_insert_sparse_attn_cutedsl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return _get_sparse_attn_cutedsl_impls()[1](*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 1153, in _norm_rope_insert_sparse_attn_cutedsl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     compiled = SparseAttnNormRopeStoreKernel.compile(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 1073, in compile
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return cute.compile(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/compiler.py", line 582, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._compile(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/compiler.py", line 661, in _compile
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func._dsl_object._func(func, *args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 2214, in _func
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result = self.generate_mlir(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1835, in generate_mlir
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     module, module_hash, result = self.generate_original_ir(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1599, in generate_original_ir
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     module, result = build_ir_module()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1575, in build_ir_module
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result = funcBody(*ir_args, **ir_kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 867, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ).launch(grid=grid, block=(self.tb_size, 1, 1), stream=stream)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass.py", line 1372, in launch
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ret, name = kernel_generator(*self.func_args, **self.func_kwargs, config=config)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 2538, in kernel_wrapper
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     kernel_ret = funcBody(*ir_args, **ir_kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 902, in kernel
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if active:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 270, in ir_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(pred, *write_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 902, in if_region_4
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if active:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 306, in if_executor
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return executor.if_execute(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 155, in if_execute
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._if_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 508, in _if_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return scf_gen.scf_execute_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 197, in scf_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     region_result = builder(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 485, in then_builder
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return then_block(*flat_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 948, in then_block_15
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if warp_id == self.nope_blocks:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 270, in ir_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(pred, *write_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 948, in if_region_9
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if warp_id == self.nope_blocks:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 306, in if_executor
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return executor.if_execute(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 155, in if_execute
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._if_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 508, in _if_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return scf_gen.scf_execute_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 197, in scf_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     region_result = builder(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 504, in else_builder
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return else_block(*flat_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 980, in else_block_10
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     y0 = cute.arch.fmin(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]      ^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962] AttributeError: module 'cutlass.cute.arch' has no attribute 'fmin'
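
The traceback ends inside the vLLM kernel source (`sparse_attn_compress_cutedsl.py`, line 980), which calls `cute.arch.fmin`, while the `cutlass` package installed under `nvidia_cutlass_dsl` does not expose that name. One plausible cause is a version mismatch between vLLM and the installed CUTLASS DSL package, though the issue does not confirm this. The following is a minimal diagnostic sketch, not a fix. The helper names `has_module_attr` and `installed_version` are hypothetical, and the distribution name `nvidia-cutlass-dsl` is an assumption inferred from the `site-packages` directory shown in the traceback.

```python
import importlib
from importlib import metadata
from typing import Optional


def has_module_attr(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module (or one of its parents) is not installed or fails to import.
        return False
    return hasattr(module, attr)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if your environment differs.
    print("nvidia-cutlass-dsl version:", installed_version("nvidia-cutlass-dsl"))
    # The attribute the vLLM kernel tries to call in the traceback above.
    print("cute.arch.fmin available:", has_module_attr("cutlass.cute.arch", "fmin"))
```

Run this in the same virtual environment as the failing worker (`/vllm/.venv` in the traceback). If the second line prints `False`, the installed package lacks `fmin`, and the reported version is worth comparing against whatever version vLLM's own requirements pin.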

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_worker.py", line 411, in determine_available_memory
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 6311, in profile_cudagraph_memory
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._warmup_and_capture(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 6475, in _warmup_and_capture
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._dummy_run(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/v1/worker/gpu_model_runner.py", line 5824, in _dummy_run
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     outputs = self.model(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/model.py", line 1662, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/decorators.py", line 520, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.aot_compiled_fn(self, *args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/_dynamo/aot_compile.py", line 224, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.fn(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/nvidia/model.py", line 1347, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     def forward(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/compilation/caching.py", line 217, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self.optimized_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "<string>", line 259, in execution_fn
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "<string>", line 5, in __vllm_inlined_submods__1
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1269, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._op(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 554, in deepseek_v4_attention
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self.attention_impl(hidden_states, positions, out)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 481, in attention_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     q, _ = maybe_execute_in_parallel(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/utils/multi_stream_utils.py", line 52, in maybe_execute_in_parallel
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result1 = fn1()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]               ^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/attention.py", line 483, in <lambda>
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     lambda: compressor(kv_score, positions, self.rotary_emb),
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/compressor.py", line 396, in forward
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     self._norm_rope_store_kernel(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/fused_compress_quant_cache.py", line 57, in _norm_rope_insert_sparse_attn_cutedsl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return _get_sparse_attn_cutedsl_impls()[1](*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 1153, in _norm_rope_insert_sparse_attn_cutedsl
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     compiled = SparseAttnNormRopeStoreKernel.compile(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 1073, in compile
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return cute.compile(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/compiler.py", line 582, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._compile(*args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/compiler.py", line 661, in _compile
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func._dsl_object._func(func, *args, **kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 2214, in _func
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result = self.generate_mlir(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1835, in generate_mlir
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     module, module_hash, result = self.generate_original_ir(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1599, in generate_original_ir
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     module, result = build_ir_module()
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 1575, in build_ir_module
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     result = funcBody(*ir_args, **ir_kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 867, in __call__
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ).launch(grid=grid, block=(self.tb_size, 1, 1), stream=stream)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass.py", line 1372, in launch
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     ret, name = kernel_generator(*self.func_args, **self.func_kwargs, config=config)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/dsl.py", line 2538, in kernel_wrapper
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     kernel_ret = funcBody(*ir_args, **ir_kwargs)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 902, in kernel
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if active:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 270, in ir_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(pred, *write_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 902, in if_region_4
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if active:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 306, in if_executor
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return executor.if_execute(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 155, in if_execute
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._if_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 508, in _if_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return scf_gen.scf_execute_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 197, in scf_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     region_result = builder(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 485, in then_builder
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return then_block(*flat_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 948, in then_block_15
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if warp_id == self.nope_blocks:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 270, in ir_loop
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return func(pred, *write_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 948, in if_region_9
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     if warp_id == self.nope_blocks:
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 306, in if_executor
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return executor.if_execute(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/base_dsl/ast_helpers.py", line 155, in if_execute
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return self._if_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 508, in _if_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return scf_gen.scf_execute_dynamic(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 197, in scf_execute_dynamic
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     region_result = builder(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]                     ^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/.venv/lib/python3.12/site-packages/nvidia_cutlass_dsl/python_packages/cutlass/cutlass_dsl/cutlass_ast_decorators.py", line 504, in else_builder
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     return else_block(*flat_args)
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]   File "/vllm/vllm/models/deepseek_v4/common/ops/sparse_attn_compress_cutedsl.py", line 980, in else_block_10
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]     y0 = cute.arch.fmin(
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962]      ^^^^^^^^^^^^^^
(Worker_DP4_EP4 pid=2635016) ERROR 05-26 18:29:06 [multiproc_executor.py:962] AttributeError: module 'cutlass.cute.arch' has no attribute 'fmin'
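
The traceback ends inside the vLLM kernel file `sparse_attn_compress_cutedsl.py`, where `cute.arch.fmin` is looked up on the installed `cutlass.cute.arch` module and not found. A quick way to confirm this on the affected machine is to probe the module directly. The following is a minimal diagnostic sketch, not a fix. The helper names `probe_attr` and `installed_version` are hypothetical, and the distribution name `nvidia-cutlass-dsl` is an assumption inferred from the `nvidia_cutlass_dsl` directory in the traceback paths.

```python
import importlib
from importlib import metadata


def probe_attr(module_name: str, attr: str):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


def installed_version(dist_name: str):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name is assumed from the site-packages path in the traceback.
    print("nvidia-cutlass-dsl version:", installed_version("nvidia-cutlass-dsl"))
    # False here would reproduce the AttributeError condition without starting vLLM.
    print("cutlass.cute.arch has fmin:", probe_attr("cutlass.cute.arch", "fmin"))
```

Run it with the same interpreter that vLLM uses (here `/vllm/.venv`). If the second line prints `False`, the installed CUTLASS DSL build does not expose the `fmin` symbol that the kernel expects, which suggests a version mismatch between vLLM and the CUTLASS DSL package.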

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

