vllm - How to fix [Bug]: DeepSeek-V4-Flash for L20, RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause

Error Message

(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 392, in determine_available_memory
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5948, in profile_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] Traceback (most recent call last):
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     super().__init__(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return future if non_block else future.result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return super().result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.__get_result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise self._exception
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     response = self.aggregate(self.get_response())
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise RuntimeError(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(Worker_TP0_EP0 pid=279) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP3_EP3 pid=282) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP2_EP2 pid=281) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP1_EP1 pid=280) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(EngineCore pid=209) ERROR 05-18 08:21:17 [multiproc_executor.py:283] Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
(EngineCore pid=209) Process EngineCore:
(EngineCore pid=209) Traceback (most recent call last):
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=209)     self.run()
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=209)     self._target(*self._args, **self._kwargs)
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1144, in run_engine_core
(EngineCore pid=209)     raise e
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209)     super().__init__(
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209)     return future if non_block else future.result()
(EngineCore pid=209)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209)     return super().result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209)     return self.__get_result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209)     raise self._exception
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209)     response = self.aggregate(self.get_response())
(EngineCore pid=209)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209)     raise RuntimeError(
(EngineCore pid=209) RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 693, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 707, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1128, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1187, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

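The EngineCore and APIServer tracebacks above only relay the worker failure (their messages say to check the stack trace above for the root cause), so the line worth finding is the exception summary logged by a `Worker_*` process. A minimal Python sketch for pulling those summary lines out of a saved log might look like the following. The prefix pattern is inferred from the log lines in this report and may not match other logging configurations; `find_exceptions`, `worker_exceptions`, and `SAMPLE` are hypothetical names used for illustration and are not part of vLLM.

```python
import re

# Prefix layout inferred from the log lines in this report (an assumption);
# other versions or logging configurations may format it differently.
PREFIX = re.compile(
    r"^\((?P<proc>[\w-]+) pid=\d+\)"                 # e.g. "(Worker_TP0_EP0 pid=279)"
    r"(?: [A-Z]+ \d{2}-\d{2} [\d:]{8} \[[^\]]+\])?"  # optional "ERROR 05-18 08:21:14 [file.py:962]"
    r" ?(?P<body>.*)$"
)
# Final line of a traceback: "<ExceptionType>: <message>" (no leading "raise").
EXC_LINE = re.compile(r"^(?P<exc>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$")

# Four lines copied from the log in this report.
SAMPLE = """\
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""


def find_exceptions(log_text):
    """Return (process, exception type, message) for every exception summary line."""
    found = []
    for line in log_text.splitlines():
        prefixed = PREFIX.match(line)
        if prefixed is None:
            continue  # line does not carry a "(Name pid=N)" prefix
        summary = EXC_LINE.match(prefixed.group("body").strip())
        if summary:
            found.append((prefixed.group("proc"), summary.group("exc"), summary.group("msg")))
    return found


def worker_exceptions(log_text):
    """Keep only exceptions reported by processes whose tag starts with 'Worker'."""
    return [item for item in find_exceptions(log_text) if item[0].startswith("Worker")]


for proc, exc, msg in worker_exceptions(SAMPLE):
    print(f"{proc}: {exc}: {msg}")
```

Filtering on the process tag rather than on the exception type keeps the sketch independent of which error the worker raises; source lines such as `raise AssertionError(...)` are skipped because they do not start with an exception type followed by a colon.
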
Root Cause

(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 392, in determine_available_memory
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5948, in profile_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] 
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] Traceback (most recent call last):
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     super().__init__(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return future if non_block else future.result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return super().result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.__get_result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise self._exception
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     response = self.aggregate(self.get_response())
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise RuntimeError(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(Worker_TP0_EP0 pid=279) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP3_EP3 pid=282) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP2_EP2 pid=281) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP1_EP1 pid=280) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(EngineCore pid=209) ERROR 05-18 08:21:17 [multiproc_executor.py:283] Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
(EngineCore pid=209) Process EngineCore:
(EngineCore pid=209) Traceback (most recent call last):
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=209)     self.run()
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=209)     self._target(*self._args, **self._kwargs)
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1144, in run_engine_core
(EngineCore pid=209)     raise e
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209)     super().__init__(
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209)     return future if non_block else future.result()
(EngineCore pid=209)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209)     return super().result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209)     return self.__get_result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209)     raise self._exception
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209)     response = self.aggregate(self.get_response())
(EngineCore pid=209)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209)     raise RuntimeError(
(EngineCore pid=209) RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 693, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 707, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1128, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1187, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
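
The final `RuntimeError` lines only say to "check the stack trace above for the root cause", and three processes (worker, engine core, API server) each print their own traceback. A minimal sketch of how the summary exception per process could be pulled out of such a log is shown below. It assumes every line carries the `(Process pid=N) LEVEL MM-DD HH:MM:SS [file.py:line]` prefix seen in this report. The names `PREFIX`, `EXC_LINE`, `root_causes`, and `SAMPLE` are illustrative and are not part of vLLM.

```python
import re

# Prefix seen on each log line in this report, for example:
# "(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] "
# The level/timestamp/location part is optional because some lines only
# carry the "(Process pid=N)" tag.
PREFIX = re.compile(
    r"^\((?P<proc>[^\s)]+) pid=(?P<pid>\d+)\)"
    r"(?: (?P<level>[A-Z]+) \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\])? ?"
)

# A traceback summary line is unindented and looks like "pkg.mod.SomeError: message".
EXC_LINE = re.compile(r"^(?:[A-Za-z_][\w.]*\.)?[A-Za-z_]\w*(?:Error|Exception): ")


def root_causes(log_text):
    """Return (process, exception line) pairs in order of first appearance."""
    seen = []
    for raw in log_text.splitlines():
        m = PREFIX.match(raw)
        if not m:
            continue
        body = raw[m.end():]
        # Indented lines are stack frames or source snippets, not summaries.
        if EXC_LINE.match(body):
            entry = (m.group("proc"), body.strip())
            if entry not in seen:
                seen.append(entry)
    return seen


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""

if __name__ == "__main__":
    for proc, exc in root_causes(SAMPLE):
        print(f"{proc}: {exc}")
```

The first entry returned is the worker's `InductorError`, raised from the `decompose_auto_functionalized` post-grad pass. The `EngineCore` and `APIServer` errors that follow only relay that failure upward.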

Fix Action

Fix / Workaround

(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 392, in determine_available_memory
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5948, in profile_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] 
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] Traceback (most recent call last):
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     super().__init__(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return future if non_block else future.result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return super().result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.__get_result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise self._exception
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     response = self.aggregate(self.get_response())
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise RuntimeError(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(Worker_TP0_EP0 pid=279) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP3_EP3 pid=282) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP2_EP2 pid=281) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP1_EP1 pid=280) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(EngineCore pid=209) ERROR 05-18 08:21:17 [multiproc_executor.py:283] Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
(EngineCore pid=209) Process EngineCore:
(EngineCore pid=209) Traceback (most recent call last):
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=209)     self.run()
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=209)     self._target(*self._args, **self._kwargs)
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1144, in run_engine_core
(EngineCore pid=209)     raise e
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209)     super().__init__(
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209)     return future if non_block else future.result()
(EngineCore pid=209)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209)     return super().result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209)     return self.__get_result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209)     raise self._exception
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209)     response = self.aggregate(self.get_response())
(EngineCore pid=209)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209)     raise RuntimeError(
(EngineCore pid=209) RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 693, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 707, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1128, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1187, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
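
The `RuntimeError` lines from `EngineCore` and `APIServer` only point back at the worker ("please check the stack trace above for the root cause"), so in a long log the useful line is the worker's own exception summary. A minimal sketch of pulling those summary lines out of a saved log follows. It assumes the prefix layout seen in the lines on this page; `find_exceptions`, `pick_root_cause`, and both regular expressions are hypothetical helpers written for illustration, not part of vLLM or PyTorch.

```python
import re

# Hypothetical helper, not part of vLLM. The prefix layout is assumed from the
# log lines on this page: "(<process> pid=<n>)", optionally followed by
# "ERROR <date> <time> [<file>:<line>]".
PREFIX = re.compile(
    r"^\((?P<proc>[^)]*?) pid=\d+\)\s*(?:ERROR \S+ \S+ \[[^\]]+\]\s?)?"
)
# An exception summary: a (possibly dotted) class name ending in "Error" or
# "Exception", followed by ": " and the message.
EXCEPTION = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$"
)


def find_exceptions(log_text):
    """Return (process, exception_type, message) for each summary line."""
    found = []
    for raw in log_text.splitlines():
        prefix = PREFIX.match(raw)
        if prefix is None:
            continue  # line has no "(<process> pid=<n>)" prefix
        hit = EXCEPTION.match(raw[prefix.end():])
        if hit:
            found.append(
                (prefix.group("proc"), hit.group("type"), hit.group("msg"))
            )
    return found


def pick_root_cause(found):
    """Prefer the first exception reported by a worker process.

    Assumption: worker processes are labelled with a "Worker" prefix,
    as in "Worker_TP0_EP0" in this log.
    """
    for entry in found:
        if entry[0].startswith("Worker"):
            return entry
    return found[0] if found else None


# Four lines copied from the log on this page.
SAMPLE = "\n".join([
    "(EngineCore pid=209) RuntimeError: Worker failed with error "
    "'AssertionError: auto_functionalized was not removed', "
    "please check the stack trace above for the root cause",
    "(APIServer pid=1) RuntimeError: Engine core initialization failed. "
    "See root cause above. Failed core proc(s): {}",
    '(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]'
    '     raise AssertionError("auto_functionalized was not removed")',
    "(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] "
    "torch._inductor.exc.InductorError: AssertionError: "
    "auto_functionalized was not removed",
])

if __name__ == "__main__":
    print(pick_root_cause(find_exceptions(SAMPLE)))
```

Run against the sample lines, the sketch skips the indented `raise` frame and selects the worker's `torch._inductor.exc.InductorError` line rather than the two wrapper `RuntimeError` lines. That matches where the traceback on this page actually ends: the `decompose_auto_functionalized` pass in `torch/_inductor/fx_passes/post_grad.py`.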

Code Example

(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 392, in determine_available_memory
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5948, in profile_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] 
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] Traceback (most recent call last):
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     super().__init__(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return future if non_block else future.result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return super().result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.__get_result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise self._exception
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     response = self.aggregate(self.get_response())
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise RuntimeError(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(Worker_TP0_EP0 pid=279) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP3_EP3 pid=282) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP2_EP2 pid=281) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP1_EP1 pid=280) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(EngineCore pid=209) ERROR 05-18 08:21:17 [multiproc_executor.py:283] Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
(EngineCore pid=209) Process EngineCore:
(EngineCore pid=209) Traceback (most recent call last):
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=209)     self.run()
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=209)     self._target(*self._args, **self._kwargs)
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1144, in run_engine_core
(EngineCore pid=209)     raise e
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209)     super().__init__(
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209)     return future if non_block else future.result()
(EngineCore pid=209)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209)     return super().result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209)     return self.__get_result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209)     raise self._exception
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209)     response = self.aggregate(self.get_response())
(EngineCore pid=209)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209)     raise RuntimeError(
(EngineCore pid=209) RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 693, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 707, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1128, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1187, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
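
Every line of the trace above carries the same process and logger prefix, and three processes (worker, EngineCore, APIServer) each print their own traceback, so the originating exception is easy to miss. The following is a minimal sketch of a log-reading helper, not part of vLLM: the names `PREFIX`, `EXC_LINE`, `strip_prefix`, and `first_exception` are hypothetical, and the regular expressions assume only the prefix layout visible in this log.

```python
import re

# Assumed prefix layout, taken from the log lines in this issue, e.g.
# "(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] "
# The level/timestamp/source part is optional because some lines only carry
# "(EngineCore pid=209) ".
PREFIX = re.compile(
    r"^\((?P<proc>[\w-]+) pid=(?P<pid>\d+)\)\s?"
    r"(?:(?P<level>[A-Z]+) \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\]\s?)?"
)

# The final line of a Python traceback: "pkg.mod.SomeError: message".
# Indented frame lines ("  File ...", "    raise ...") do not match.
EXC_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$"
)


def strip_prefix(line):
    """Return (process_name, text) with the log prefix removed."""
    m = PREFIX.match(line)
    if not m:
        return None, line.rstrip("\n")
    return m.group("proc"), line[m.end():].rstrip("\n")


def first_exception(lines):
    """Return (process, exception_type, message) for the earliest exception line."""
    for line in lines:
        proc, text = strip_prefix(line)
        m = EXC_LINE.match(text)
        if m:
            return proc, m.group("type"), m.group("msg")
    return None


# Three lines copied from the log above.
sample = [
    '(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")',
    "(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed",
    "(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause",
]
result = first_exception(sample)
print(result)
```

On the sample lines this reports the worker's `InductorError` rather than the later `RuntimeError` wrappers, which matches the log's own hint to "check the stack trace above for the root cause". To scan a saved log instead, pass an open file object to `first_exception`.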

Your current environment

root@node37:/disk1/DeepSeek-V4-Flash# nvidia-smi
Mon May 18 16:25:07 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     Off |   00000000:02:00.0 Off |                    0 |
| N/A   29C    P0             75W /  350W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     Off |   00000000:03:00.0 Off |                    0 |
| N/A   28C    P0             77W /  350W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     Off |   00000000:82:00.0 Off |                    0 |
| N/A   27C    P0             73W /  350W |       0MiB /  46068MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     Off |   00000000:83:00.0 Off |                    0 |
| N/A   28C    P0             74W /  350W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@node37:/disk1/DeepSeek-V4-Flash# cat docker-compose.yml
services:
  # vllm
  vllm-openai:
    image: vllm/vllm-openai:v0.21.0
    container_name: DeepSeek-V4-Flash
    restart: always
    runtime: nvidia
    ports:
      - 8027:8000
    volumes:
      - /disk1/:/models
    command: >
      --model /models/DeepSeek-V4-Flash
      --trust-remote-code
      --kv-cache-dtype fp8
      --max-model-len 262144
      --tensor-parallel-size 4
      --gpu-memory-utilization 0.9
      --block-size 256
      --enable-expert-parallel
      --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}'
      --tokenizer-mode deepseek_v4
      --reasoning-parser deepseek_v4
      --enable-auto-tool-choice
      --tool-call-parser deepseek_v4
      --speculative_config '{"method":"mtp","num_speculative_tokens":1}'
      --default-chat-template-kwargs '{"enable_thinking":false}'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: [ "0,1,2,3" ]
    ipc: host
networks:
  vllm:

🐛 Describe the bug

(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 392, in determine_available_memory
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.model_runner.profile_run()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5948, in profile_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states, last_hidden_states = self._dummy_run(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                                         ^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5616, in _dummy_run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     outputs = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]               ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 254, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.runnable(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/deepseek_v4.py", line 1669, in forward
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.aot_compiled_fn = self.aot_compile(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 167, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self._compiled_callable.aot_compile((args, kwargs))
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return aot_compile_fullgraph(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 368, in aot_compile_fullgraph
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = backend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 2535, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return self.compiler_fn(model_, inputs_, **self.kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwds)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 1214, in __call__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     PiecewiseCompileInterpreter(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 723, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return super().run(*args)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 200, in run
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.env[node] = self.run_node(node)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/interpreter.py", line 297, in run_node
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return getattr(self, n.op)(n.target, args, kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 750, in call_module
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     piecewise_backend = PiecewiseBackend(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 190, in __init__
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     self.compile_all_ranges()
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 266, in compile_all_ranges
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     range_entry.runnable = self.vllm_backend.compiler_manager.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 353, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph, handle = self.compiler.compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                              ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 376, in compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_graph = standalone_compile(graph, example_inputs, **compile_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return standalone_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 444, in standalone_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     compiled_fn = compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                   ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2527, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return compile_fx(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2578, in compile_fx
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _maybe_wrap_and_compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2655, in _maybe_wrap_and_compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return _compile_fx_main(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2864, in _compile_fx_main
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1053, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise InductorError(e, currentframe()).with_traceback(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1037, in _compile_fx_inner
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     mb_compiled_graph = fx_codegen_and_compile(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]                         ^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1798, in fx_codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1344, in codegen_and_compile
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     _recursive_post_grad_passes(gm, is_inference=is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 583, in _recursive_post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     post_grad_passes(gm, is_inference)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 358, in post_grad_passes
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     GraphTransformObserver(gm, "decompose_auto_functionalized").apply_graph_pass(
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/fx/passes/graph_transform_observer.py", line 103, in apply_graph_pass
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     return pass_fn(self.gm.graph)
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]   File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/fx_passes/post_grad.py", line 1392, in decompose_auto_functionalized
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962]     raise AssertionError("auto_functionalized was not removed")
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] torch._inductor.exc.InductorError: AssertionError: auto_functionalized was not removed
(Worker_TP0_EP0 pid=279) ERROR 05-18 08:21:14 [multiproc_executor.py:962] 
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] EngineCore failed to start.
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] Traceback (most recent call last):
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     super().__init__(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return func(*args, **kwargs)
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return future if non_block else future.result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return super().result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     return self.__get_result()
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise self._exception
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     response = self.aggregate(self.get_response())
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140]     raise RuntimeError(
(EngineCore pid=209) ERROR 05-18 08:21:14 [core.py:1140] RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(Worker_TP0_EP0 pid=279) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP3_EP3 pid=282) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP2_EP2 pid=281) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(Worker_TP1_EP1 pid=280) WARNING 05-18 08:21:14 [multiproc_executor.py:884] WorkerProc was terminated
(EngineCore pid=209) ERROR 05-18 08:21:17 [multiproc_executor.py:283] Worker proc VllmWorker-2 died unexpectedly, shutting down executor.
(EngineCore pid=209) Process EngineCore:
(EngineCore pid=209) Traceback (most recent call last):
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=209)     self.run()
(EngineCore pid=209)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=209)     self._target(*self._args, **self._kwargs)
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1144, in run_engine_core
(EngineCore pid=209)     raise e
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1114, in run_engine_core
(EngineCore pid=209)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=209)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 880, in __init__
(EngineCore pid=209)     super().__init__(
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
(EngineCore pid=209)     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=209)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=209)     return func(*args, **kwargs)
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
(EngineCore pid=209)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=209)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
(EngineCore pid=209)     return self.collective_rpc("determine_available_memory")
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
(EngineCore pid=209)     return future if non_block else future.result()
(EngineCore pid=209)                                     ^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore pid=209)     return super().result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore pid=209)     return self.__get_result()
(EngineCore pid=209)            ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore pid=209)     raise self._exception
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore pid=209)     response = self.aggregate(self.get_response())
(EngineCore pid=209)                               ^^^^^^^^^^^^^^^^^^^
(EngineCore pid=209)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore pid=209)     raise RuntimeError(
(EngineCore pid=209) RuntimeError: Worker failed with error 'AssertionError: auto_functionalized was not removed', please check the stack trace above for the root cause
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1)     sys.exit(main())
(APIServer pid=1)              ^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=1)     args.dispatch_function(args)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 693, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 707, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 217, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 146, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1)     return func(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 900, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1)     with launch_core_engines(
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1128, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1187, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 4 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
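
Both `RuntimeError` messages above point back to "the stack trace above" for the root cause, and the log interleaves several processes, each tagged with a `(Name pid=N)` prefix. As a rough aid for reading such a log, the sketch below groups the final `SomeError: message` lines by process using only the Python standard library. It is not part of vLLM; the function name `final_errors`, the regexes, and the assumed shape of the optional `ERROR MM-DD HH:MM:SS [file:line]` header are illustrative guesses based on the lines shown in this report.

```python
import re

# "(EngineCore pid=209) ..." style prefix, as seen in the log above.
PREFIX = re.compile(r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s*(?P<rest>.*)$")
# Optional logger header such as "ERROR 05-18 08:21:14 [multiproc_executor.py:962] ".
# Assumption: this mirrors the Worker lines quoted in the report.
LOG_HEADER = re.compile(r"^[A-Z]+ \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\]\s*")
# A line that starts with an exception class name followed by ": ".
EXCEPTION = re.compile(r"^\w+(?:\.\w+)*(?:Error|Exception): ")


def final_errors(log_text):
    """Map (process name, pid) to the exception lines found for that process."""
    found = {}
    for line in log_text.splitlines():
        match = PREFIX.match(line)
        if match is None:
            continue  # untagged line, e.g. a resource_tracker warning
        rest = LOG_HEADER.sub("", match.group("rest"))
        if EXCEPTION.match(rest):
            key = (match.group("proc"), int(match.group("pid")))
            found.setdefault(key, []).append(rest)
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "(EngineCore pid=209)     raise RuntimeError(",
        "(EngineCore pid=209) RuntimeError: Worker failed with error "
        "'AssertionError: auto_functionalized was not removed', "
        "please check the stack trace above for the root cause",
        "(APIServer pid=1) RuntimeError: Engine core initialization failed. "
        "See root cause above. Failed core proc(s): {}",
    ])
    for (proc, pid), errors in final_errors(sample).items():
        print(proc, pid, errors[-1])
```

Run against the full log, the entries for the `Worker_*` processes are the ones to read first, since the `EngineCore` and `APIServer` errors only relay the worker failure.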

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
