vllm - ✅ (Solved) Fix [Bug]: Qwen3.5-35B-A3B vllm v0.17.0 ERROR 03-10 00:52:24 [multiproc_executor.py:261] Worker proc VllmWorker-0 died unexpectedly, shutting down executor. [1 pull request, 5 comments, 3 participants]

vllm • 2026-03-10 01:05:06

vllm-project/vllm#36566•Fetched 2026-04-08 00:36:15

Timeline (top): commented ×5, subscribed ×2, cross-referenced ×1, labeled ×1

Error Message

(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] WorkerProc hit an exception.
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] output = func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 522, in compile_or_warm_up_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] cuda_graph_memory_bytes = self.model_runner.capture_model()
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5337, in capture_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] self._capture_cudagraphs(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5430, in _capture_cudagraphs
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] dummy_run(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4976, in _dummy_run
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] outputs = self.model(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.runnable(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
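
Every record in this log repeats the same worker prefix, which makes the Python traceback hard to follow. A minimal sketch for stripping it, assuming the prefix shape seen in this issue (process tags, level, timestamp, source location). The `PREFIX` pattern and the `split_records` helper are illustrative names, not part of vLLM:

```python
import re

# Prefix shape observed in this issue's log, e.g.
# "(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] "
# ASSUMPTION: other vLLM versions may format the prefix differently.
PREFIX = re.compile(
    r"(?:\((?:Worker|Worker_TP\d+|EngineCore_DP\d+|APIServer) pid=\d+\) )+"
    r"(?:ERROR|WARNING|INFO) \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[\w.]+:\d+\] ?"
)


def split_records(flat: str) -> list[str]:
    """Split a flattened log on the worker prefix and return the bare messages."""
    return [part.strip() for part in PREFIX.split(flat) if part.strip()]


# Four records copied from this issue's log, joined the way the page shows them.
P = "(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] "
sample = (
    P + "WorkerProc hit an exception. "
    + P + "Traceback (most recent call last): "
    + P + "assert num_cache_lines >= batch "
    + P + "AssertionError"
)

records = split_records(sample)
for record in records:
    print(record)
```

The last printed record is the exception type, and the record before it is the failing source line, which is usually the quickest way to find the actual error in a long worker log.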

Root Cause
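
A possible reading of the log in this section, offered as a hypothesis and not confirmed by the issue text: the failing check is `assert num_cache_lines >= batch` in `causal_conv1d_update`, hit while capturing FULL decode CUDA graphs. The sketch below only does arithmetic on values the log prints. It assumes, without confirmation from the source, that `num_cache_lines` scales with the number of KV cache blocks.

```python
# Values copied from the log in this section.
kv_cache_tokens = 32_736    # "GPU KV cache size: 32,736 tokens"
block_size_tokens = 1_056   # "Setting attention block size to 1056 tokens"

# 'cudagraph_capture_sizes' from the engine config: 1, 2, 4, then 8..256 in
# steps of 8, then 272..512 in steps of 16.
capture_sizes = [1, 2, 4] + list(range(8, 257, 8)) + list(range(272, 513, 16))

# ASSUMPTION (not stated in the issue): one state cache line per KV cache block.
num_blocks = kv_cache_tokens // block_size_tokens

# Capture sizes that would exceed the assumed number of cache lines.
too_large = [size for size in capture_sizes if size > num_blocks]

print(f"blocks={num_blocks}, capture sizes={len(capture_sizes)}, "
      f"sizes above block count={len(too_large)}")
```

Under that assumption only 31 blocks exist while most capture sizes are larger, which would be consistent with the assertion firing during graph capture. The log itself reports only 1.25 GiB of memory available for the KV cache.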

root@node37:/disk1/Qwen3.5-35B-A3B# more docker-compose.yml
version: '3.3'
services:
  # vllm
  vllm-openai:
    image: vllm/vllm-openai:v0.17.0
    container_name: Qwen3.5-35B-A3B
    restart: always
    runtime: nvidia
    ports:
      - 8042:8000
    volumes:
      - /disk1/:/models
    command: >
      --model /models/Qwen3.5-35B-A3B
      --enable-auto-tool-choice
      --tool-call-parser hermes
      --tokenizer_mode="auto"
      --dtype=bfloat16
      --tensor_parallel_size=2
      --gpu-memory-utilization=0.8
      --max-model-len=32768
      --served-model-name=Qwen3.5-35B-A3B
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: [ "1,2" ]
    ipc: host
networks:
  vllm:

root@node37:/disk1/Qwen3.5-35B-A3B# docker logs -f Qwen3.5-35B-A3B
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/chat_completion/protocol.py:346: SyntaxWarning: invalid escape sequence '\e'
"(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/completion/protocol.py:176: SyntaxWarning: invalid escape sequence '\e'
"(e.g. 'abcdabcdabcd...' or '\emoji \emoji \emoji ...'). This feature "
WARNING 03-10 00:49:33 [argparse_utils.py:193] With vllm serve, you should provide the model as a positional argument or in a config file instead of via the --model option. The --model option will be removed in v0.13.
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302]
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302] █ █ █▄ ▄█
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.17.0
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302] █▄█▀ █ █ █ █ model /models/Qwen3.5-35B-A3B
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:302]
(APIServer pid=1) INFO 03-10 00:49:33 [utils.py:238] non-default args: {'model_tag': '/models/Qwen3.5-35B-A3B', 'enable_auto_tool_choice': True, 'tool_call_parser': 'hermes', 'model': '/models/Qwen3.5-35B-A3B', 'dtype': 'bfloat16', 'max_model_len': 32768, 'served_model_name': ['Qwen3.5-35B-A3B'], 'tensor_parallel_size': 2, 'gpu_memory_utilization': 0.8}
(APIServer pid=1) INFO 03-10 00:49:43 [model.py:531] Resolved architecture: Qwen3_5MoeForConditionalGeneration
(APIServer pid=1) INFO 03-10 00:49:43 [model.py:1554] Using max model len 32768
(APIServer pid=1) INFO 03-10 00:49:43
[scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=1) INFO 03-10 00:49:43 [config.py:544] Setting attention block size to 1056 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=1) INFO 03-10 00:49:43 [config.py:575] Padding mamba page size by 0.76% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=1) INFO 03-10 00:49:43 [vllm.py:747] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=157) INFO 03-10 00:50:02 [core.py:101] Initializing a V1 LLM engine (v0.17.0) with config: model='/models/Qwen3.5-35B-A3B', speculative_config=None, tokenizer='/models/Qwen3.5-35B-A3B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Qwen3.5-35B-A3B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore_DP0 pid=157) WARNING 03-10 00:50:02 [multiproc_executor.py:945] Reducing Torch parallelism from 16 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore_DP0 pid=157) INFO 03-10 00:50:02 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=172.20.0.2 (local), world_size=2, local_world_size=2
(Worker pid=228) INFO 03-10 00:50:12 [parallel_state.py:1393] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:41903 backend=nccl
(Worker pid=227) INFO 03-10 00:50:12 [parallel_state.py:1393] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:41903 backend=nccl
(Worker pid=227) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=227) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=227) INFO 03-10 00:50:13 [pynccl.py:111] vLLM is using nccl==2.27.5
(Worker pid=228) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=228) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=227) WARNING 03-10 00:50:14 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
(Worker pid=228) WARNING 03-10 00:50:14 [symm_mem.py:67] SymmMemCommunicator: Device capability 8.9 not supported, communicator is not available.
(Worker pid=228) WARNING 03-10 00:50:14 [custom_all_reduce.py:165] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(Worker pid=227) WARNING 03-10 00:50:14 [custom_all_reduce.py:165] Custom allreduce is disabled because your platform lacks GPU P2P capability or P2P test failed. To silence this warning, specify disable_custom_all_reduce=True explicitly.
(Worker pid=228) INFO 03-10 00:50:14 [parallel_state.py:1715] rank 1 in world size 2 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 1, EP rank 1, EPLB rank N/A
(Worker pid=227) INFO 03-10 00:50:14 [parallel_state.py:1715] rank 0 in world size 2 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0, EPLB rank N/A
(Worker pid=227) INFO 03-10 00:50:23 [base.py:106] Offloader set to NoopOffloader
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [gpu_model_runner.py:4255] Starting to load model /models/Qwen3.5-35B-A3B...
(Worker pid=228) INFO 03-10 00:50:23 [base.py:106] Offloader set to NoopOffloader
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:50:23 [cuda.py:453] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:50:23 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [cuda.py:453] Using backend AttentionBackendEnum.FLASH_ATTN for vit attention
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [mm_encoder_attention.py:215] Using AttentionBackendEnum.FLASH_ATTN for MMEncoderAttention.
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [unquantized.py:186] Using TRITON backend for Unquantized MoE
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [cuda.py:405] Using FLASH_ATTN attention backend out of potential backends: ['FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION'].
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:23 [flash_attn.py:587] Using FlashAttention version 2
Loading safetensors checkpoint shards: 0% Completed | 0/14 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 7% Completed | 1/14 [00:00<00:12, 1.00it/s]
Loading safetensors checkpoint shards: 14% Completed | 2/14 [00:02<00:12, 1.06s/it]
Loading safetensors checkpoint shards: 21% Completed | 3/14 [00:03<00:11, 1.08s/it]
Loading safetensors checkpoint shards: 29% Completed | 4/14 [00:04<00:10, 1.08s/it]
Loading safetensors checkpoint shards: 36% Completed | 5/14 [00:05<00:09, 1.08s/it]
Loading safetensors checkpoint shards: 43% Completed | 6/14 [00:06<00:08, 1.09s/it]
Loading safetensors checkpoint shards: 50% Completed | 7/14 [00:07<00:07, 1.11s/it]
Loading safetensors checkpoint shards: 57% Completed | 8/14 [00:08<00:06, 1.11s/it]
Loading safetensors checkpoint shards: 64% Completed | 9/14 [00:10<00:05, 1.16s/it]
Loading safetensors checkpoint shards: 71% Completed | 10/14 [00:11<00:05, 1.38s/it]
Loading safetensors checkpoint shards: 79% Completed | 11/14 [00:13<00:04, 1.53s/it]
Loading safetensors checkpoint shards: 86% Completed | 12/14 [00:15<00:03, 1.65s/it]
Loading safetensors checkpoint shards: 93% Completed | 13/14 [00:17<00:01, 1.58s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:17<00:00, 1.25s/it]
Loading safetensors checkpoint shards: 100% Completed | 14/14 [00:17<00:00, 1.25s/it]
(Worker pid=227) (Worker_TP0 pid=227)
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:41 [default_loader.py:293] Loading weights took 17.58 seconds
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:42 [gpu_model_runner.py:4338] Model loading took 32.86 GiB memory and 18.260981 seconds
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:50:42 [gpu_model_runner.py:5254] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:50:42 [gpu_model_runner.py:5254] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:03 [backends.py:916] Using cache directory: /root/.cache/vllm/torch_compile_cache/6a6fa79304/rank_0_0/backbone for vLLM's torch.compile
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:03 [backends.py:976] Dynamo bytecode transform time: 11.96 s
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:10 [backends.py:350] Cache the graph of compile range (1, 2048) for later use
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:51:10 [backends.py:350] Cache the graph of compile range (1, 2048) for later use
(Worker pid=227) (Worker_TP0 pid=227) WARNING 03-10 00:51:18 [fused_moe.py:1093] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=NVIDIA_L20.json
(EngineCore_DP0 pid=157) INFO 03-10 00:51:42 [shm_broadcast.py:548] No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:51:45 [decorators.py:580] saving AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/f9e66cd2e010a63206291a2c03b5f811e754bd48f2b6c8e748333c65ef6fd282/rank_1_0/model
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:45 [backends.py:366] Compiling a graph for compile range (1, 2048) takes 38.64 s
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:45 [monitor.py:35] torch.compile takes 53.57 s in total
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:45 [decorators.py:580] saving AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/f9e66cd2e010a63206291a2c03b5f811e754bd48f2b6c8e748333c65ef6fd282/rank_0_0/model
(Worker pid=228) (Worker_TP1 pid=228) INFO 03-10 00:51:47 [decorators.py:588] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/f9e66cd2e010a63206291a2c03b5f811e754bd48f2b6c8e748333c65ef6fd282/rank_1_0/model
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:51:47 [decorators.py:588] saved AOT compiled function to /root/.cache/vllm/torch_compile_cache/torch_aot_compile/f9e66cd2e010a63206291a2c03b5f811e754bd48f2b6c8e748333c65ef6fd282/rank_0_0/model
(Worker pid=227) (Worker_TP0 pid=227) INFO 03-10 00:52:00 [gpu_worker.py:424] Available KV cache memory: 1.25 GiB
(EngineCore_DP0 pid=157) INFO 03-10 00:52:00 [kv_cache_utils.py:1314] GPU KV cache size: 32,736 tokens
(EngineCore_DP0 pid=157) INFO 03-10 00:52:00 [kv_cache_utils.py:1319] Maximum concurrency for 32,768 tokens per request: 3.54x
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:19<00:00, 2.56it/s]
Capturing CUDA graphs (decode, FULL): 0%| | 0/35 [00:00<?, ?it/s]
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] WorkerProc hit an exception.
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] output = func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 522, in compile_or_warm_up_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] cuda_graph_memory_bytes = self.model_runner.capture_model()
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5337, in capture_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] self._capture_cudagraphs(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5430, in _capture_cudagraphs
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] dummy_run(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4976, in _dummy_run
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] outputs = self.model(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 223, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.runnable(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 738, in forward
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] hidden_states = self.language_model.model(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 402, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.aot_compiled_fn(self, *args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.fn(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1132, in forward
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] def forward(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 198, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.optimized_call(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._wrapped_call(self, *args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] raise e
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "<eval_with_key>.83", line 256, in forward
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] submod_1 = self.submod_1(getitem, s59, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._wrapped_call(self, *args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] raise e
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "<eval_with_key>.85", line 5, in forward
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] gdn_attention_core = torch.ops.vllm.gdn_attention_core(mixed_qkv, b_1, a_1, core_attn_out, 'language_model.model.layers.0.linear_attn'); mixed_qkv = b_1 = a_1 = core_attn_out = gdn_attention_core = None
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in __call__
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._op(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1451, in gdn_attention_core
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] self._forward_core(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 715, in _forward_core
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] mixed_qkv_non_spec = causal_conv1d_update(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/mamba/ops/causal_conv1d.py", line 1162, in causal_conv1d_update
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] assert num_cache_lines >= batch
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] AssertionError
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] Traceback (most recent call last):
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 875, in worker_busy_loop
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] output = func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 522, in compile_or_warm_up_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] cuda_graph_memory_bytes = self.model_runner.capture_model()
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs)
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5337, in capture_model
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] self._capture_cudagraphs(
(Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File
"/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5430, in _capture_cudagraphs (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] dummy_run( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 124, in decorate_context (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return func(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4976, in _dummy_run (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] outputs = self.model( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 223, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.runnable(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File 
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 738, in forward (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] hidden_states = self.language_model.model( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 402, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.aot_compiled_fn(self, *args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/aot_compile.py", line 124, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.fn(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1132, in forward (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] def forward( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 
[multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 198, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self.optimized_call(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._wrapped_call(self, *args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] raise e (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return super(self.cls, obj).call(*args, **kwargs) # type: ignore[misc] (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "<eval_with_key>.83", line 256, in forward (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] submod_1 = self.submod_1(getitem, s59, getitem_1, getitem_2, getitem_3); getitem = getitem_1 = getitem_2 = submod_1 = None (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 936, in call_wrapped (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._wrapped_call(self, *args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 455, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] raise e (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 442, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return super(self.cls, obj).call(*args, **kwargs) # 
type: ignore[misc] (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._call_impl(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return forward_call(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "<eval_with_key>.85", line 5, in forward (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] gdn_attention_core = torch.ops.vllm.gdn_attention_core(mixed_qkv, b_1, a_1, core_attn_out, 'language_model.model.layers.0.linear_attn'); mixed_qkv = b_1 = a_1 = core_attn_out = gdn_attention_core = None (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1209, in call (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] return self._op(*args, **kwargs) (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 
[multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 1451, in gdn_attention_core (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] self._forward_core( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_next.py", line 715, in _forward_core (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] mixed_qkv_non_spec = causal_conv1d_update( (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/mamba/ops/causal_conv1d.py", line 1162, in causal_conv1d_update (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] assert num_cache_lines >= batch (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] ^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] AssertionError (Worker pid=227) (Worker_TP0 pid=227) ERROR 03-10 00:52:21 [multiproc_executor.py:880] (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] EngineCore failed to start. 
(EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] Traceback (most recent call last): (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in init (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] super().init( (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in init (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] 
self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 389, in collective_rpc (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] return aggregate(get_response()) (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] ^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 372, in get_response (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] raise RuntimeError( (EngineCore_DP0 pid=157) ERROR 03-10 00:52:21 [core.py:1100] RuntimeError: Worker failed with error '', please check the stack trace above for the root cause (Worker pid=227) (Worker_TP0 pid=227) WARNING 03-10 00:52:21 [multiproc_executor.py:814] WorkerProc was terminated (Worker pid=228) (Worker_TP1 pid=228) WARNING 03-10 00:52:21 [multiproc_executor.py:814] WorkerProc was terminated (EngineCore_DP0 pid=157) ERROR 03-10 00:52:24 [multiproc_executor.py:261] Worker proc VllmWorker-0 died unexpectedly, shutting down executor. 
(EngineCore_DP0 pid=157) Process EngineCore_DP0: (EngineCore_DP0 pid=157) Traceback (most recent call last): (EngineCore_DP0 pid=157) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=157) self.run() (EngineCore_DP0 pid=157) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=157) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core (EngineCore_DP0 pid=157) raise e (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=157) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=157) return func(*args, **kwargs) (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 834, in init (EngineCore_DP0 pid=157) super().init( (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 120, in init (EngineCore_DP0 pid=157) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches( (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=157) return func(*args, **kwargs) (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches (EngineCore_DP0 pid=157) self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=157) File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config (EngineCore_DP0 pid=157) compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 389, in collective_rpc (EngineCore_DP0 pid=157) return aggregate(get_response()) (EngineCore_DP0 pid=157) ^^^^^^^^^^^^^^ (EngineCore_DP0 pid=157) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 372, in get_response (EngineCore_DP0 pid=157) raise RuntimeError( (EngineCore_DP0 pid=157) RuntimeError: Worker failed with error '', please check the stack trace above for the root cause (APIServer pid=1) Traceback (most recent call last): (APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=1) sys.exit(main()) (APIServer pid=1) ^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=1) args.dispatch_function(args) (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=1) uvloop.run(run_server(args)) (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run (APIServer pid=1) return __asyncio.run( (APIServer pid=1) ^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=1) return runner.run(main) (APIServer pid=1) ^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=1) return self._loop.run_until_complete(task) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=1) File 
"/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=1) return await main (APIServer pid=1) ^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=1) async with build_async_engine_client( (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1) return await anext(self.gen) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=1) async with build_async_engine_client_from_engine_args( (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1) return await anext(self.gen) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=1) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=1) return cls( (APIServer pid=1) ^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in 
sync_wrapper (APIServer pid=1) return func(*args, **kwargs) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=1) return AsyncMPClient(*client_args) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=1) return func(*args, **kwargs) (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 911, in init (APIServer pid=1) super().init( (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 569, in init (APIServer pid=1) with launch_core_engines( (APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=1) next(self.gen) (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=1) wait_for_engine_startup( (APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=1) raise RuntimeError( (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Fix Action


PR fix notes

PR #36639: [Bugfix] Fix causal_conv1d assertion crash during CUDA graph capture (#36566)

Repository: vllm-project/vllm
Author: haosdent
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36639

Description (problem / solution / changelog)

Purpose

Fix #36566. During CUDA graph capture (decode, FULL mode) for hybrid models like Qwen3.5-35B-A3B, the worker crashes with AssertionError at causal_conv1d.py due to the assertion assert num_cache_lines >= batch.

This assertion is incorrect when conv_state_indices is provided (indirect indexing with PAD_SLOT_ID padding). During CUDA graph capture, the batch is padded to decode_cudagraph_max_bs (= max_num_seqs), which can exceed the number of allocated mamba cache blocks when GPU memory is limited. The Triton kernel already handles this correctly via:

Early-return for PAD_SLOT_ID entries
Bounds-check masks (conv_states_input_coord < num_cache_lines) on all conv_state loads

The assertion is also redundant when conv_state_indices is None, as it duplicates the check at line 1156.

This PR removes the overly strict assertion and adds a regression test that exercises the num_cache_lines < padded_batch_size scenario with validate_data=True.
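The indirect-indexing argument above can be illustrated with a small pure-Python sketch. It is not the Triton kernel and not vLLM code: the function name update_conv_state, the list-based cache, and the sentinel value -1 for PAD_SLOT_ID are assumptions made for illustration. It only shows why a padded batch larger than the number of cache lines is harmless when padded entries are skipped and every index is bounds-checked.

```python
PAD_SLOT_ID = -1  # assumed sentinel for padded entries (illustrative value)


def update_conv_state(conv_state, new_values, conv_state_indices):
    """Write new_values[i] into conv_state[conv_state_indices[i]].

    Padded entries are skipped and out-of-range slots are masked out,
    mirroring the two safeguards described in the PR notes.
    """
    num_cache_lines = len(conv_state)
    touched = []
    for i, slot in enumerate(conv_state_indices):
        if slot == PAD_SLOT_ID:
            continue  # early return for padded entries
        if not (0 <= slot < num_cache_lines):
            continue  # bounds-check mask on the cache access
        conv_state[slot] = new_values[i]
        touched.append(slot)
    return touched


# Five cache lines, but a batch padded to eight entries (as in the PR's test).
conv_state = [0.0] * 5
padded_batch = 8
indices = [3, 0, 4] + [PAD_SLOT_ID] * 5
new_values = [float(i + 1) for i in range(padded_batch)]

touched = update_conv_state(conv_state, new_values, indices)
print(touched, conv_state)
```

Here the batch dimension (8) exceeds the number of cache lines (5), yet only the three real slots are written, which is the situation the removed assertion rejected.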

Test Plan

Added test_causal_conv1d_update_cache_lines_lt_batch (16 parametrized cases) that reproduces the CUDA graph capture scenario where total_entries (5) < padded_batch_size (8) with validate_data=True.

Test Result

tests/kernels/mamba/test_causal_conv1d.py::test_causal_conv1d_update_cache_lines_lt_batch — 16 passed
tests/kernels/mamba/test_causal_conv1d.py::test_causal_conv1d_update_with_batch_gather — 128 passed (no regression)

Changed files

tests/kernels/mamba/test_causal_conv1d.py (modified, +79/-0)
vllm/model_executor/layers/mamba/ops/causal_conv1d.py (modified, +0/-1)


Your current environment

root@node37:/disk1/Qwen3.5-35B-A3B# more docker-compose.yml version: '3.3' services:

vllm


extent analysis

Fix Plan

The traceback ends in an AssertionError raised by assert num_cache_lines >= batch in causal_conv1d_update, which means the batch passed to the kernel during CUDA graph capture is larger than num_cache_lines. Until a fixed build is available, you can try the following steps:

Increase the num_cache_lines: num_cache_lines is not set directly; it follows from how many cache blocks vLLM can allocate, which depends on the GPU memory available to it. Giving vLLM more memory may raise it, for example by increasing --gpu-memory-utilization in the command section of your docker-compose.yml file (0.9 is an illustrative value):

command: >
  --model /models/Qwen3.5-35B-A3B
  --enable-auto-tool-choice
  --tool-call-parser hermes
  --tokenizer_mode="auto"
  --dtype=bfloat16
  --tensor_parallel_size=2
  --gpu-memory-utilization=0.9
  --max-model-len=32768
  --served-model-name=Qwen3.5-35B-A3B

Reduce the batch size: During CUDA graph capture the decode batch is padded to max_num_seqs (see the PR notes), so lowering --max-num-seqs reduces the padded batch. The value 32 below is illustrative; it has to stay at or below the number of cache lines actually allocated on your GPUs, so a smaller value may be needed:

command: >
  --model /models/Qwen3.5-35B-A3B
  --enable-auto-tool-choice
  --tool-call-parser hermes
  --tokenizer_mode="auto"
  --dtype=bfloat16
  --tensor_parallel_size=2
  --gpu-memory-utilization=0.8
  --max-model-len=32768
  --served-model-name=Qwen3.5-35B-A3B
  --max-num-seqs=32

Upgrade vLLM: PR #36639 removes the overly strict assertion. It was still open and unmerged when this page was fetched, so check its status and move to a build that includes it once one is available.

Verification

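To verify that the fix worked, restart the server and confirm that startup completes CUDA graph capture without the AssertionError from assert num_cache_lines >= batch and without the "EngineCore failed to start" message. Then send a sample request to the model to confirm that it serves responses correctly.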
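As a rough aid for the log check, the sketch below scans captured log text for the three failure messages that appear in the error output on this page. The function name find_crash_markers and the sample log lines are illustrative assumptions, not part of vLLM; how you capture the log text (for example from your container's output) is up to your setup.

```python
# Failure messages copied from the error output shown on this page.
CRASH_MARKERS = (
    "assert num_cache_lines >= batch",
    "EngineCore failed to start",
    "died unexpectedly, shutting down executor",
)


def find_crash_markers(log_text: str) -> list[str]:
    """Return the known crash markers found in log_text, in declaration order."""
    return [marker for marker in CRASH_MARKERS if marker in log_text]


# Hypothetical excerpt standing in for a captured server log.
sample_log = (
    "ERROR 03-10 00:52:21 [multiproc_executor.py:880] assert num_cache_lines >= batch\n"
    "ERROR 03-10 00:52:21 [core.py:1100] EngineCore failed to start.\n"
)

found = find_crash_markers(sample_log)
print(found)
```

An empty result on a fresh startup log suggests this particular crash did not recur; it does not prove the server is otherwise healthy, so a sample request is still worth sending.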

Extra Tips

Follow issue #36566 and PR #36639 in vllm-project/vllm for the current status of the fix.
Because the crash happens inside capture_model() during CUDA graph capture, starting the server with --enforce-eager, which disables CUDA graph capture, may avoid it at the cost of decode performance.
If the problem persists after these changes, report your configuration and the full traceback on the issue so the vLLM maintainers can take a look.
