vllm: [Bug] Inference of qwen3.5 with tensor-parallel-size>1 fails with RuntimeError: NCCL error: unhandled system error

GitHub stats
vllm-project/vllm#39774 (fetched 2026-04-16 06:36:42)
Comments: 0, Participants: 1, Timeline: 1 (labeled ×1), Reactions: 0
Error Message

I also tried qwen3-32B with --tensor-parallel-size 4 and got the same error. Switching to vllm 0.16 for qwen3.5 inference gave the same error.

(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)

The worker fails inside init_model_parallel_group while the tensor-parallel group is being created; the full traceback is in the log output further down.
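The NCCL error text reported in this issue asks for a rerun with NCCL_DEBUG=INFO. The following is a minimal sketch of how that rerun could be prepared, not a confirmed fix. The serve arguments are copied from the "non-default args" line of the reported log; the helper names (`SERVE_ARGS`, `build_debug_env`, `shell_command`) and the log file name pattern are illustrative assumptions. `NCCL_DEBUG` and `NCCL_DEBUG_FILE` are NCCL environment variables; `%h` and `%p` expand to the host name and process id, which keeps the output of the four workers apart.

```python
import os
import shlex

# Arguments copied from the "non-default args" line of the reported log.
SERVE_ARGS = [
    "vllm", "serve", "Qwen3.5-27B",
    "--host", "0.0.0.0",
    "--port", "8010",
    "--dtype", "half",
    "--max-model-len", "20000",
    "--tensor-parallel-size", "4",
    "--gpu-memory-utilization", "0.95",
    "--reasoning-parser", "qwen3",
    "--enable-auto-tool-choice",
    "--tool-call-parser", "qwen3_coder",
]


def build_debug_env(base_env=None):
    """Return a copy of the environment with NCCL debug logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Requested by the error message itself.
    env["NCCL_DEBUG"] = "INFO"
    # Assumption: one log file per host and process; the file name is illustrative.
    env["NCCL_DEBUG_FILE"] = "nccl_debug.%h.%p.log"
    return env


def shell_command():
    """Render the rerun as a single shell command line for copy and paste."""
    env = build_debug_env({})
    assignments = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{assignments} {shlex.join(SERVE_ARGS)}"


if __name__ == "__main__":
    # Only prints the command; it does not start vLLM.
    print(shell_command())
```

To launch directly from Python instead of printing, the same pieces can be passed to `subprocess.run(SERVE_ARGS, env=build_debug_env())`. The per-process NCCL log files should then show which system call failed behind "unhandled system error".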

Root Cause

The output is as follows:

(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] version 0.19.0
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] model Qwen3.5-27B
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:233] non-default args: {'model_tag': 'Qwen3.5-27B', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'port': 8010, 'model': 'Qwen3.5-27B', 'dtype': 'half', 'max_model_len': 20000, 'reasoning_parser': 'qwen3', 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95}
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:549] Resolved architecture: Qwen3_5ForConditionalGeneration
(APIServer pid=1540417) WARNING 04-14 13:46:36 [model.py:2016] Casting torch.bfloat16 to torch.float16.
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:1678] Using max model len 20000
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:281] Setting attention block size to 784 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:312] Padding mamba page size by 0.13% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=1540417) INFO 04-14 13:46:36 [vllm.py:790] Asynchronous scheduling is enabled.
(EngineCore pid=1540790) INFO 04-14 13:46:50 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='Qwen3.5-27B', speculative_config=None, tokenizer='Qwen3.5-27B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=20000, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Qwen3.5-27B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 
'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore pid=1540790) WARNING 04-14 13:46:50 [multiproc_executor.py:1014] Reducing Torch parallelism from 32 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. 
(EngineCore pid=1540790) INFO 04-14 13:46:50 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.21.177.199 (local), world_size=4, local_world_size=4 (Worker pid=1541058) INFO 04-14 13:46:58 [parallel_state.py:1400] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl (Worker pid=1541060) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl (Worker pid=1541061) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl (Worker pid=1541059) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl (Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. (Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. (Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. 
(Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. (Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead. (Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead. (Worker pid=1541058) INFO 04-14 13:46:59 [pynccl.py:111] vLLM is using nccl==2.27.5 (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start. (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last): (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs) (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in init (Worker pid=1541060) ERROR 04-14 13:46:59 
[multiproc_executor.py:857] self.worker.init_device() (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() # type: ignore (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] init_worker_distributed_environment( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ensure_model_parallel_initialized( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] initialize_model_parallel( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 
1575, in initialize_model_parallel (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] _TP = init_model_parallel_group( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return GroupCoordinator( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in init (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.device_communicator = device_comm_cls( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in init (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.pynccl_comm = PyNcclCommunicator( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^ (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in init (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.all_reduce(data) (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce (Worker 
pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.nccl.ncclAllReduce( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.NCCL_CHECK( (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] raise RuntimeError(f"NCCL error: {error_str}") (Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start. 
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last): (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() # type: ignore (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ 
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] init_worker_distributed_environment( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ensure_model_parallel_initialized( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] initialize_model_parallel( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1575, in initialize_model_parallel (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] _TP = init_model_parallel_group( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return GroupCoordinator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in init (Worker 
pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.device_communicator = device_comm_cls( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.pynccl_comm = PyNcclCommunicator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.all_reduce(data) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.nccl.ncclAllReduce( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.NCCL_CHECK( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] raise RuntimeError(f"NCCL error: {error_str}") (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled 
system error (run with NCCL_DEBUG=INFO for details) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] EngineCore failed to start. (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Traceback (most recent call last): (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init( (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 
[core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self._init_executor() (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] raise e from None (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(EngineCore pid=1540790) Process EngineCore: (EngineCore pid=1540790) Traceback (most recent call last): (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=1540790) self.run() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=1540790) self._target(*self._args, **self._kwargs) (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core (EngineCore pid=1540790) raise e (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) super().init( (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) super().init(vllm_config) (EngineCore pid=1540790) File 
"/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) self._init_executor() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) raise e from None (EngineCore pid=1540790) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(APIServer pid=1540417) Traceback (most recent call last): (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/bin/vllm", line 6, in <module> (APIServer pid=1540417) sys.exit(main()) (APIServer pid=1540417) ^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 75, in main (APIServer pid=1540417) args.dispatch_function(args) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd (APIServer pid=1540417) uvloop.run(run_server(args)) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 96, in run (APIServer pid=1540417) return __asyncio.run( (APIServer pid=1540417) ^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=1540417) return runner.run(main) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=1540417) return self._loop.run_until_complete(task) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=1540417) return await main (APIServer pid=1540417) ^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server (APIServer pid=1540417) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=1540417) File 
"/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker (APIServer pid=1540417) async with build_async_engine_client( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1540417) return await anext(self.gen) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client (APIServer pid=1540417) async with build_async_engine_client_from_engine_args( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1540417) return await anext(self.gen) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args (APIServer pid=1540417) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=1540417) return cls( (APIServer pid=1540417) ^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=1540417) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer 
pid=1540417) return func(*args, **kwargs) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client (APIServer pid=1540417) return AsyncMPClient(*client_args) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=1540417) return func(*args, **kwargs) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 887, in init (APIServer pid=1540417) super().init( (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 535, in init (APIServer pid=1540417) with launch_core_engines( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=1540417) next(self.gen) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 998, in launch_core_engines (APIServer pid=1540417) wait_for_engine_startup( (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1057, in wait_for_engine_startup (APIServer pid=1540417) raise RuntimeError( (APIServer pid=1540417) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
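The final `RuntimeError: Engine core initialization failed. See root cause above.` from the APIServer process is only the outermost symptom; the exception that matters is the one raised inside the Worker processes. Below is a minimal sketch, assuming the log record prefixes seen in this issue (`(Worker pid=…)`, `(EngineCore pid=…)`, `(APIServer pid=…)`), of a helper that groups the exception lines by process so the worker-level error is easy to spot. The names `find_errors`, `RECORD` and `ERROR_LINE` are hypothetical and not part of vLLM.

```python
import re
from collections import defaultdict

# Record prefix as it appears in the log excerpt of this issue (an assumption
# about the format, not a vLLM API).
PREFIX = r"\((?:Worker|EngineCore|APIServer) pid=\d+\)"

# One record = prefix + everything up to the next prefix (or end of text).
# This also works when the log was flattened into a single long line.
RECORD = re.compile(
    r"\((?P<proc>Worker|EngineCore|APIServer) pid=(?P<pid>\d+)\)\s+"
    r"(?P<body>.*?)(?=\s*" + PREFIX + r"|\Z)",
    re.DOTALL,
)

# An exception summary line such as "RuntimeError: ..." or "Exception: ...".
# Case-sensitive, so the log level "ERROR" and "NCCL error:" inside source
# snippets are not matched.
ERROR_LINE = re.compile(r"\b(?:\w+\.)*\w*(?:Error|Exception): .+", re.DOTALL)


def find_errors(log_text):
    """Return {(process_name, pid): [unique exception messages in order]}."""
    errors = defaultdict(list)
    for match in RECORD.finditer(log_text):
        hit = ERROR_LINE.search(match.group("body"))
        if hit is None:
            continue
        message = " ".join(hit.group(0).split())  # normalise whitespace
        key = (match.group("proc"), int(match.group("pid")))
        if message not in errors[key]:
            errors[key].append(message)
    return dict(errors)
```

Applied to the log in this issue, the `Worker` entries should carry the NCCL error, while the `EngineCore` and `APIServer` entries only report that startup failed.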

Fix Action

Fix / Workaround

The output is as follows:

(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] version 0.19.0
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] model Qwen3.5-27B
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:233] non-default args: {'model_tag': 'Qwen3.5-27B', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'port': 8010, 'model': 'Qwen3.5-27B', 'dtype': 'half', 'max_model_len': 20000, 'reasoning_parser': 'qwen3', 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95}
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:549] Resolved architecture: Qwen3_5ForConditionalGeneration
(APIServer pid=1540417) WARNING 04-14 13:46:36 [model.py:2016] Casting torch.bfloat16 to torch.float16.
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:1678] Using max model len 20000
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:281] Setting attention block size to 784 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:312] Padding mamba page size by 0.13% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=1540417) INFO 04-14 13:46:36 [vllm.py:790] Asynchronous scheduling is enabled.
(EngineCore pid=1540790) INFO 04-14 13:46:50 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='Qwen3.5-27B', speculative_config=None, tokenizer='Qwen3.5-27B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=20000, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Qwen3.5-27B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 
'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore pid=1540790) WARNING 04-14 13:46:50 [multiproc_executor.py:1014] Reducing Torch parallelism from 32 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. 
(EngineCore pid=1540790) INFO 04-14 13:46:50 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.21.177.199 (local), world_size=4, local_world_size=4
(Worker pid=1541058) INFO 04-14 13:46:58 [parallel_state.py:1400] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541060) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541061) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541059) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541058) INFO 04-14 13:46:59 [pynccl.py:111] vLLM is using nccl==2.27.5
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last):
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     worker = WorkerProc(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     return func(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.worker.init_device()
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.worker.init_device()  # type: ignore
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     return func(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]            ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     init_worker_distributed_environment(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     ensure_model_parallel_initialized(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     initialize_model_parallel(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1575, in initialize_model_parallel
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     _TP = init_model_parallel_group(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]           ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     return GroupCoordinator(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]            ^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.device_communicator = device_comm_cls(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]                                ^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.pynccl_comm = PyNcclCommunicator(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]                        ^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.all_reduce(data)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.nccl.ncclAllReduce(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     self.NCCL_CHECK(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]   File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857]     raise RuntimeError(f"NCCL error: {error_str}")
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last): (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() # type: ignore (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ 
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] init_worker_distributed_environment( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ensure_model_parallel_initialized( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] initialize_model_parallel( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1575, in initialize_model_parallel (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] _TP = init_model_parallel_group( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return GroupCoordinator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in init (Worker 
pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.device_communicator = device_comm_cls( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.pynccl_comm = PyNcclCommunicator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.all_reduce(data) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.nccl.ncclAllReduce( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.NCCL_CHECK( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] raise RuntimeError(f"NCCL error: {error_str}") (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled 
system error (run with NCCL_DEBUG=INFO for details) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] EngineCore failed to start. (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Traceback (most recent call last): (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init( (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 
[core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self._init_executor() (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] raise e from None (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(EngineCore pid=1540790) Process EngineCore: (EngineCore pid=1540790) Traceback (most recent call last): (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=1540790) self.run() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=1540790) self._target(*self._args, **self._kwargs) (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core (EngineCore pid=1540790) raise e (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) super().init( (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) super().init(vllm_config) (EngineCore pid=1540790) File 
"/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) self._init_executor() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) raise e from None (EngineCore pid=1540790) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(APIServer pid=1540417) Traceback (most recent call last): (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/bin/vllm", line 6, in <module> (APIServer pid=1540417) sys.exit(main()) (APIServer pid=1540417) ^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 75, in main (APIServer pid=1540417) args.dispatch_function(args) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd (APIServer pid=1540417) uvloop.run(run_server(args)) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 96, in run (APIServer pid=1540417) return __asyncio.run( (APIServer pid=1540417) ^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=1540417) return runner.run(main) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=1540417) return self._loop.run_until_complete(task) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=1540417) return await main (APIServer pid=1540417) ^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server (APIServer pid=1540417) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=1540417) File 
"/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker (APIServer pid=1540417) async with build_async_engine_client( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1540417) return await anext(self.gen) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client (APIServer pid=1540417) async with build_async_engine_client_from_engine_args( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=1540417) return await anext(self.gen) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args (APIServer pid=1540417) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=1540417) return cls( (APIServer pid=1540417) ^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=1540417) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer 
pid=1540417) return func(*args, **kwargs) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client (APIServer pid=1540417) return AsyncMPClient(*client_args) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=1540417) return func(*args, **kwargs) (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 887, in init (APIServer pid=1540417) super().init( (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 535, in init (APIServer pid=1540417) with launch_core_engines( (APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=1540417) next(self.gen) (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 998, in launch_core_engines (APIServer pid=1540417) wait_for_engine_startup( (APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1057, in wait_for_engine_startup (APIServer pid=1540417) raise RuntimeError( (APIServer pid=1540417) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

I am using vLLM 0.19 to run inference with Qwen3.5 on 4×V100 GPUs. The command is as follows: `CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen3.5-27B --max-model-len 20000 --gpu-memory-utilization 0.95 --host 0.0.0.0 --port 8010 --dtype half --reasoning-parser qwen3 --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser qwen3_coder`
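The error message in the log below itself suggests rerunning with `NCCL_DEBUG=INFO`. A minimal sketch of how that rerun could be prepared from Python, assuming the same command as above; the helper names `debug_env` and `debug_argv` are hypothetical and exist only for this illustration.

```python
import os
import shlex
from typing import Dict, List, Optional

# Command copied from the report above (without the environment prefix).
SERVE_COMMAND = (
    "vllm serve Qwen3.5-27B --max-model-len 20000 "
    "--gpu-memory-utilization 0.95 --host 0.0.0.0 --port 8010 --dtype half "
    "--reasoning-parser qwen3 --tensor-parallel-size 4 "
    "--enable-auto-tool-choice --tool-call-parser qwen3_coder"
)


def debug_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with NCCL debug logging enabled."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # same GPUs as in the report
    env["NCCL_DEBUG"] = "INFO"  # the setting the NCCL error message asks for
    return env


def debug_argv() -> List[str]:
    """Split the serve command into an argument list for subprocess."""
    return shlex.split(SERVE_COMMAND)
```

Passing these to `subprocess.run(debug_argv(), env=debug_env())` should reproduce the failure with NCCL's own diagnostics in the output, which is usually more specific than "unhandled system error".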

I also tried Qwen3-32B with `--tensor-parallel-size 4` and got the same error. Qwen3-8B with `--tensor-parallel-size 1` worked. Switching to vLLM 0.16 for Qwen3.5 inference gave the same error.
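Because the failure shows up only when `--tensor-parallel-size` is greater than 1, across two models and two vLLM versions, one way to narrow it down is to exercise NCCL without vLLM at all. The following is a minimal sketch, assuming PyTorch with CUDA support is installed and four GPUs are visible; the names `expected_sum`, `worker`, and `main`, and the port 29500, are illustrative choices and not part of vLLM.

```python
import os


def expected_sum(world_size: int) -> float:
    """Each rank contributes 1.0, so a working all-reduce yields world_size."""
    return float(world_size)


def worker(rank: int, world_size: int) -> None:
    """Join an NCCL process group and run a single all-reduce on this rank."""
    # Imported lazily so the module can be loaded on machines without PyTorch.
    import torch
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # assumed free port
    torch.cuda.set_device(rank)
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    data = torch.ones(1, device=f"cuda:{rank}")
    dist.all_reduce(data)  # default reduction is SUM
    torch.cuda.synchronize()
    assert data.item() == expected_sum(world_size)
    print(f"rank {rank}: all_reduce ok")
    dist.destroy_process_group()


def main(world_size: int = 4) -> None:
    """Spawn one process per GPU; each process runs worker(rank, world_size)."""
    import torch.multiprocessing as mp

    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

Calling `main()` from a script launches the check. If it fails with the same NCCL error, the problem likely sits in the driver, GPU topology, or NCCL setup rather than in vLLM or the model.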

The output is as follows:
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299]
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.0
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] █▄█▀ █ █ █ █ model Qwen3.5-27B
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:299]
(APIServer pid=1540417) INFO 04-14 13:46:36 [utils.py:233] non-default args: {'model_tag': 'Qwen3.5-27B', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'port': 8010, 'model': 'Qwen3.5-27B', 'dtype': 'half', 'max_model_len': 20000, 'reasoning_parser': 'qwen3', 'tensor_parallel_size': 4, 'gpu_memory_utilization': 0.95}
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:549] Resolved architecture: Qwen3_5ForConditionalGeneration
(APIServer pid=1540417) WARNING 04-14 13:46:36 [model.py:2016] Casting torch.bfloat16 to torch.float16.
(APIServer pid=1540417) INFO 04-14 13:46:36 [model.py:1678] Using max model len 20000
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:281] Setting attention block size to 784 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=1540417) INFO 04-14 13:46:36 [config.py:312] Padding mamba page size by 0.13% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=1540417) INFO 04-14 13:46:36 [vllm.py:790] Asynchronous scheduling is enabled.
(EngineCore pid=1540790) INFO 04-14 13:46:50 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='Qwen3.5-27B', speculative_config=None, tokenizer='Qwen3.5-27B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=20000, download_dir=None, load_format=auto, tensor_parallel_size=4, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=Qwen3.5-27B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 
'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore pid=1540790) WARNING 04-14 13:46:50 [multiproc_executor.py:1014] Reducing Torch parallelism from 32 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. 
(EngineCore pid=1540790) INFO 04-14 13:46:50 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.21.177.199 (local), world_size=4, local_world_size=4
(Worker pid=1541058) INFO 04-14 13:46:58 [parallel_state.py:1400] world_size=4 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541060) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541061) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541059) INFO 04-14 13:46:59 [parallel_state.py:1400] world_size=4 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:54121 backend=nccl
(Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541060) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541061) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541058) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=1541059) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=1541058) INFO 04-14 13:46:59 [pynccl.py:111] vLLM is using nccl==2.27.5
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last):
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device()
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() # type: ignore
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] init_worker_distributed_environment(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ensure_model_parallel_initialized(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] initialize_model_parallel(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1575, in initialize_model_parallel
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] _TP = init_model_parallel_group(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return GroupCoordinator(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.device_communicator = device_comm_cls(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.pynccl_comm = PyNcclCommunicator(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in __init__
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.all_reduce(data)
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.nccl.ncclAllReduce(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.NCCL_CHECK(
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] raise RuntimeError(f"NCCL error: {error_str}")
(Worker pid=1541060) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled system error (run with NCCL_DEBUG=INFO for details)
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] WorkerProc failed to start.
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] Traceback (most recent call last): (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 826, in worker_main (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] worker = WorkerProc(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 605, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/worker_base.py", line 312, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.worker.init_device() # type: ignore (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return func(*args, **kwargs) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^ 
(Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in init_device (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] init_worker_distributed_environment( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 1043, in init_worker_distributed_environment (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ensure_model_parallel_initialized( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1748, in ensure_model_parallel_initialized (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] initialize_model_parallel( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1575, in initialize_model_parallel (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] _TP = init_model_parallel_group( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 1157, in init_model_parallel_group (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] return GroupCoordinator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 376, in init (Worker 
pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.device_communicator = device_comm_cls( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 75, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.pynccl_comm = PyNcclCommunicator( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] ^^^^^^^^^^^^^^^^^^^ (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 144, in init (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.all_reduce(data) (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 177, in all_reduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.nccl.ncclAllReduce( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 429, in ncclAllReduce (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] self.NCCL_CHECK( (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 373, in NCCL_CHECK (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] raise RuntimeError(f"NCCL error: {error_str}") (Worker pid=1541061) ERROR 04-14 13:46:59 [multiproc_executor.py:857] RuntimeError: NCCL error: unhandled 
system error (run with NCCL_DEBUG=INFO for details) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] EngineCore failed to start. (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Traceback (most recent call last): (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init( (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] super().init(vllm_config) (EngineCore pid=1540790) ERROR 04-14 13:47:07 
[core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] return func(*args, **kwargs) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self._init_executor() (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] raise e from None (EngineCore pid=1540790) ERROR 04-14 13:47:07 [core.py:1108] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(EngineCore pid=1540790) Process EngineCore: (EngineCore pid=1540790) Traceback (most recent call last): (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=1540790) self.run() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=1540790) self._target(*self._args, **self._kwargs) (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core (EngineCore pid=1540790) raise e (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core (EngineCore pid=1540790) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 848, in init (EngineCore pid=1540790) super().init( (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 114, in init (EngineCore pid=1540790) self.model_executor = executor_class(vllm_config) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=1540790) super().init(vllm_config) (EngineCore pid=1540790) File 
"/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=1540790) return func(*args, **kwargs) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=1540790) self._init_executor() (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=1540790) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=1540790) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=1540790) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 736, in wait_for_ready (EngineCore pid=1540790) raise e from None (EngineCore pid=1540790) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(APIServer pid=1540417) Traceback (most recent call last):
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/bin/vllm", line 6, in <module>
(APIServer pid=1540417) sys.exit(main())
(APIServer pid=1540417) ^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=1540417) args.dispatch_function(args)
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1540417) uvloop.run(run_server(args))
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 96, in run
(APIServer pid=1540417) return __asyncio.run(
(APIServer pid=1540417) ^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1540417) return runner.run(main)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1540417) return self._loop.run_until_complete(task)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/uvloop/init.py", line 48, in wrapper
(APIServer pid=1540417) return await main
(APIServer pid=1540417) ^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
(APIServer pid=1540417) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
(APIServer pid=1540417) async with build_async_engine_client(
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=1540417) return await anext(self.gen)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1540417) async with build_async_engine_client_from_engine_args(
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 210, in aenter
(APIServer pid=1540417) return await anext(self.gen)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1540417) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=1540417) return cls(
(APIServer pid=1540417) ^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 154, in init
(APIServer pid=1540417) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1540417) return func(*args, **kwargs)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1540417) return AsyncMPClient(*client_args)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1540417) return func(*args, **kwargs)
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 887, in init
(APIServer pid=1540417) super().init(
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 535, in init
(APIServer pid=1540417) with launch_core_engines(
(APIServer pid=1540417) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/contextlib.py", line 144, in exit
(APIServer pid=1540417) next(self.gen)
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 998, in launch_core_engines
(APIServer pid=1540417) wait_for_engine_startup(
(APIServer pid=1540417) File "/home/ubuntu/miniconda3/envs/vllm-0.19/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1057, in wait_for_engine_startup
(APIServer pid=1540417) raise RuntimeError(
(APIServer pid=1540417) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}


extent analysis

TL;DR

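With --tensor-parallel-size > 1, worker startup fails in init_device with "RuntimeError: NCCL error: unhandled system error". The message is generic: it says that a system-level call inside NCCL failed, not which one. The reporter saw the same error with qwen3-32B and with vllm 0.16, which points toward the host environment (GPU visibility, NCCL/CUDA versions, communication between GPUs) rather than the model. Setting NCCL_DEBUG=INFO does not fix anything by itself, but it makes NCCL log the failing step so the root cause can be identified.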

Guidance

  1. Set NCCL_DEBUG to INFO: Prefix the launch command with NCCL_DEBUG=INFO. NCCL then logs its initialization steps and the operation that failed; look for lines containing "NCCL WARN".
  2. Check GPU configuration: Confirm that nvidia-smi lists all GPUs and that CUDA_VISIBLE_DEVICES, if set, exposes at least as many devices as --tensor-parallel-size.
  3. Verify NCCL version: Check that the NCCL build used by the installed PyTorch is compatible with the CUDA version and GPU driver on the machine.
  4. Check for environment conflicts: Look for NCCL_* or CUDA_* variables already set in the shell or conda environment that could override the settings passed on the command line.
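Step 2 can be partly automated. The following is a minimal sketch, not part of vLLM: the helper names visible_devices and check_tensor_parallel are hypothetical, and the check only inspects the CUDA_VISIBLE_DEVICES string, so it cannot tell whether the listed GPUs actually exist or are healthy.

```python
from typing import List, Mapping, Optional


def visible_devices(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return the entries of CUDA_VISIBLE_DEVICES, or None when it is unset."""
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: no restriction is applied through this variable
    # Entries may be indices ("0") or UUIDs ("GPU-..."); keep them as strings.
    return [item.strip() for item in raw.split(",") if item.strip()]


def check_tensor_parallel(tp_size: int, env: Mapping[str, str]) -> List[str]:
    """Collect obvious mismatches; an empty list means none were found."""
    problems: List[str] = []
    devices = visible_devices(env)
    if devices is None:
        return problems  # nothing to compare against
    if len(devices) < tp_size:
        problems.append(
            f"CUDA_VISIBLE_DEVICES exposes {len(devices)} device(s), "
            f"but tensor-parallel-size is {tp_size}"
        )
    if len(set(devices)) != len(devices):
        problems.append("CUDA_VISIBLE_DEVICES lists the same device twice")
    return problems
```

Calling check_tensor_parallel(4, os.environ) in the shell that launches vLLM (after import os) returns a list of messages to review before starting the server.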

Example

Prefix the launch command with NCCL_DEBUG=INFO:

NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve Qwen3.5-27B --max-model-len 20000 --gpu-memory-utilization 0.95 --host 0.0.0.0 --port 8010 --dtype half --reasoning-parser qwen3 --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser qwen3_coder
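When the server is started from a script, the same setting can be applied to the child process environment. This is a sketch under assumptions: nccl_debug_env and launch are hypothetical helper names, and the optional NCCL_DEBUG_FILE value (an NCCL variable where %h and %p expand to host name and process ID) is an addition that the original command does not use.

```python
import subprocess
from typing import Dict, List, Mapping


def nccl_debug_env(base: Mapping[str, str], log_file: str = "") -> Dict[str, str]:
    """Copy the environment and turn on NCCL's INFO-level logging."""
    env = dict(base)  # copy so the caller's mapping is left untouched
    env["NCCL_DEBUG"] = "INFO"
    if log_file:
        # Assumption: send NCCL output to a file instead of stdout,
        # e.g. "nccl.%h.%p.log" to get one file per worker process.
        env["NCCL_DEBUG_FILE"] = log_file
    return env


def launch(cmd: List[str], env: Mapping[str, str]) -> int:
    """Run the server command with the given environment; return its exit code."""
    return subprocess.run(cmd, env=dict(env), check=False).returncode


# Usage (not executed here): pass the full argument list of the command above, e.g.
# launch(["vllm", "serve", "Qwen3.5-27B", "--tensor-parallel-size", "4"],
#        nccl_debug_env(os.environ, "nccl.%h.%p.log"))
```

Writing one log file per process keeps the output of the tensor-parallel workers from interleaving, which can make the failing worker easier to spot.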

Notes

NCCL_DEBUG=INFO is a diagnostic setting, not a fix. The traceback alone shows only that the worker failed in init_device with an NCCL error; the NCCL log is needed to tell which underlying operation failed.

Recommendation

Rerun with NCCL_DEBUG=INFO, then work through the guidance above based on what NCCL reports. If the cause is still unclear, add the NCCL log output to the issue so that others can help narrow it down.
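INFO-level output is verbose, so it can help to pull out the warning lines first. The sketch below assumes that NCCL tags them with the text "NCCL WARN"; nccl_warnings is a hypothetical helper name.

```python
from typing import Iterable, List


def nccl_warnings(lines: Iterable[str]) -> List[str]:
    """Return lines that contain "NCCL WARN", in order, without repeats."""
    seen = set()
    result: List[str] = []
    for line in lines:
        if "NCCL WARN" not in line:
            continue
        text = line.strip()
        if text not in seen:  # several workers often report the same warning
            seen.add(text)
            result.append(text)
    return result
```

For example, nccl_warnings(open("server.log")) lists the distinct warnings from a captured log, where server.log is a placeholder for wherever the output was saved.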
