vllm - [Bug]: AssertionError: auto_functionalized was not removed


Error Message

torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
[rank0]:[W512 19:34:15.728822567 ProcessGroupNCCL.cpp:1575] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>
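
The environment block above was left unfilled. As a stopgap, the minimal sketch below prints the Python version and the installed versions of the two packages that appear in the traceback. It is not a substitute for `collect_env.py`; the helper names `package_versions` and `environment_summary` are hypothetical, and the default package list is an assumption based on the traceback paths.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution is not installed here
    return versions


def environment_summary(names=("vllm", "torch")):
    """Build a short, paste-ready summary of the interpreter and packages."""
    lines = [f"python: {platform.python_version()}"]
    for name, version in package_versions(names).items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_summary())
```

Pasting this output alongside the traceback at least pins down which vLLM and PyTorch builds were involved.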

🐛 Describe the bug

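Running DeepSeek V4 on several RTX 5090 GPUs fails at engine startup: the profiling run triggers Inductor compilation, which raises the assertion at the end of this traceback.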

Traceback (most recent call last):
   File "/root/.local/share/uv/python/cpython-3.13.13-linux-x86_64-gnu/lib/python3.13/multiprocessing/process.py", line 313, in _bootstrap
     self.run()
     ~~~~~~~~^^
   File "/root/.local/share/uv/python/cpython-3.13.13-linux-x86_64-gnu/lib/python3.13/multiprocessing/process.py", line 108, in run
     self._target(*self._args, **self._kwargs)
     ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/engine/core.py", line 1140, in run_engine_core
     raise e
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/engine/core.py", line 1110, in run_engine_core
     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/engine/core.py", line 876, in __init__
     super().__init__(
     ~~~~~~~~~~~~~~~~^
         vllm_config,
         ^^^^^^^^^^^^
     ...<3 lines>...
         internal_dp_balancing,
         ^^^^^^^^^^^^^^^^^^^^^^
     )
     ^
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/engine/core.py", line 128, in __init__
     kv_cache_config = self._initialize_kv_caches(vllm_config)
   File "/root/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
     available_gpu_memory = self.model_executor.determine_available_memory()
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
     return self.collective_rpc("determine_available_memory")
            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/executor/uniproc_executor.py", line 80, in collective_rpc
     result = run_method(self.driver_worker, method, args, kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/serial_utils.py", line 510, in run_method
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
     self.model_runner.profile_run()
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5848, in profile_run
     hidden_states, last_hidden_states = self._dummy_run(
                                         ~~~~~~~~~~~~~~~^
         self.max_num_tokens, is_profile=True
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     )
     ^
   File "/root/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5537, in _dummy_run
     outputs = self.model(
         input_ids=input_ids,
     ...<3 lines>...
         **model_kwargs,
     )
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
     return self.runnable(*args, **kwargs)
            ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
            ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
     return forward_call(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/model_executor/models/deepseek_v4.py", line 1566, in forward
     hidden_states = self.model(
         input_ids, positions, intermediate_tensors, inputs_embeds
     )
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/decorators.py", line 623, in __call__
     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
                            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/wrapper.py", line 183, in aot_compile
     return self._compiled_callable.aot_compile((args, kwargs))
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
     return aot_compile_fullgraph(
         fn,
     ...<3 lines>...
         dynamic=self._dynamic,
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
     compiled_fn = backend(
         backend_input.graph_module, backend_input.example_inputs
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/__init__.py", line 2535, in __call__
     return self.compiler_fn(model_, inputs_, **self.kwargs)
            ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.local/share/uv/python/cpython-3.13.13-linux-x86_64-gnu/lib/python3.13/contextlib.py", line 85, in inner
     return func(*args, **kwds)
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/backends.py", line 1194, in __call__
     PiecewiseCompileInterpreter(
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         self.split_gm, submod_names_to_compile, self.vllm_config, self
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     ).run(*fake_args)
     ~~~~~^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/backends.py", line 721, in run
     return super().run(*args)
            ~~~~~~~~~~~^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/fx/interpreter.py", line 200, in run
     self.env[node] = self.run_node(node)
                      ~~~~~~~~~~~~~^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/fx/interpreter.py", line 297, in run_node
     return getattr(self, n.op)(n.target, args, kwargs)
            ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/backends.py", line 748, in call_module
     piecewise_backend = PiecewiseBackend(
         submod,
     ...<6 lines>...
         submod_name=target,
     )
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
     self.compile_all_ranges()
     ~~~~~~~~~~~~~~~~~~~~~~~^^
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
         self.graph,
         ^^^^^^^^^^^
     ...<6 lines>...
         is_encoder=self.vllm_backend.is_encoder,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     )
     ^
   File "/root/.venv/lib/python3.13/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
     return func(*args, **kwargs)
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/backends.py", line 351, in compile
     compiled_graph, handle = self.compiler.compile(
                              ~~~~~~~~~~~~~~~~~~~~~^
         graph,
         ^^^^^^
     ...<3 lines>...
         maybe_key,
         ^^^^^^^^^^
     )
     ^
   File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/compiler_interface.py", line 372, in compile
     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
     return standalone_compile(
         gm, example_inputs, dynamic_shapes=dynamic_shapes, options=options, aot=aot
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
     compiled_fn = compile_fx(
         gm, example_inputs, ignore_shape_env=ignore_shape_env, **options
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
     return compile_fx(
         model_,
     ...<4 lines>...
         ignore_shape_env=ignore_shape_env,
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
     return _maybe_wrap_and_compile_fx_main(
         model_,
     ...<3 lines>...
         ignore_shape_env,
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
     return _compile_fx_main(
         model_,
     ...<3 lines>...
         ignore_shape_env,
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
     raise InductorError(e, currentframe()).with_traceback(
         e.__traceback__
     ) from None
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
     mb_compiled_graph = fx_codegen_and_compile(
         gm, example_inputs, inputs_to_check, **graph_kwargs
     )
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
            ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
     _recursive_post_grad_passes(gm, is_inference=is_inference)
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
     post_grad_passes(gm, is_inference)
     ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
         decompose_auto_functionalized
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     )
     ^
   File "/root/.venv/lib/python3.13/site-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
     return pass_fn(self.gm.graph)
   File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
     raise AssertionError("auto_functionalized was not removed")
 torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
[rank0]:[W512 19:34:15.728822567 ProcessGroupNCCL.cpp:1575] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
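
A traceback this long is easier to triage once it is reduced to its frames and final exception line. The sketch below is one way to do that with the standard library only. The names `FRAME_RE` and `summarize_traceback` are hypothetical, and the sample text is the last two frames copied from the traceback above.

```python
import re

# Matches lines such as:  File "/path/to/file.py", line 12, in func
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def summarize_traceback(text):
    """Return (frames, error).

    frames is a list of (package, filename, line, function) tuples in call
    order; error is the last line shaped like "SomeError: message", or None.
    """
    frames = []
    for m in FRAME_RE.finditer(text):
        path = m.group("path")
        # Package = first path component after site-packages, else "other".
        _, sep, tail = path.partition("site-packages/")
        package = tail.split("/", 1)[0] if sep else "other"
        filename = path.rsplit("/", 1)[-1]
        frames.append((package, filename, int(m.group("line")), m.group("func")))

    error = None
    for line in text.splitlines():
        stripped = line.strip()
        if re.match(r"[\w.]+(Error|Exception): ", stripped):
            error = stripped  # keep the last match
    return frames, error


SAMPLE = '''
  File "/root/.venv/lib/python3.13/site-packages/vllm/compilation/compiler_interface.py", line 372, in compile
    compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
  File "/root/.venv/lib/python3.13/site-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
    raise AssertionError("auto_functionalized was not removed")
torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
'''

if __name__ == "__main__":
    frames, error = summarize_traceback(SAMPLE)
    for package, filename, line, func in frames:
        print(f"{package:6} {filename}:{line} in {func}")
    print(error)
```

Applied to the full traceback, this shows that the last vLLM frame is `compiler_interface.py` calling `standalone_compile`, and that the assertion itself is raised inside PyTorch's `decompose_auto_functionalized` post-grad pass.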

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
