vllm - 💡(How to fix) Fix [Bug]: On vLLM 0.18.0, when the tensor-parallel size is greater than 1, an error is reported: [AMP ERROR][CudaFrontend.cpp:94] failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT [1 participant]

GitHub stats
vllm-project/vllm#38228 (fetched 2026-04-08 01:37:14)
Comments: 0
Participants: 1
Timeline: 1 (labeled ×1)
Reactions: 0

Error Message

root@notebook-tianhangyao-ythvllmtest-prd-pre:/mnt/workspace# vllm serve /mnt/workspace/qwen35_35B_A3B --port 8000 --host 0.0.0.0 --tensor-parallel-size 2
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] █ █ █▄ ▄█
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.18.0
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] █▄█▀ █ █ █ █ model /mnt/workspace/qwen35_35B_A3B
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:233] non-default args: {'model_tag': '/mnt/workspace/qwen35_35B_A3B', 'host': '0.0.0.0', 'model': '/mnt/workspace/qwen35_35B_A3B', 'tensor_parallel_size': 2}
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:533] Resolved architecture: Qwen3_5MoeForConditionalGeneration
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:1582] Using max model len 262144
(APIServer pid=3842) INFO 03-26 11:55:56 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:212] Setting attention block size to 1056 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:243] Padding mamba page size by 0.76% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=3842) INFO 03-26 11:55:57 [vllm.py:754] Asynchronous scheduling is enabled.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(APIServer pid=3842) INFO 03-26 11:55:57 [compilation.py:289] Enabled custom fusions: allreduce_rms h h h h h h h h h h h h h h h h h h hhhhhhhhhhhhhh(EngineCore pid=4128) INFO 03-26 11:56:10 [core.py:103] Initializing a V1 LLM engine (v0.18.0) with config: model='/mnt/workspace/qwen35_35B_A3B', speculative_config=None, tokenizer='/mnt/workspace/qwen35_35B_A3B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/mnt/workspace/qwen35_35B_A3B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 
'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': True}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} (EngineCore pid=4128) WARNING 03-26 11:56:10 [multiproc_executor.py:997] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. 
(EngineCore pid=4128) INFO 03-26 11:56:10 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.128.136.58 (local), world_size=2, local_world_size=2
(Worker pid=4349) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4350) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4349) INFO 03-26 11:56:17 [pynccl.py:111] vLLM is using nccl==2.27.5
[AMP ERROR][CudaFrontend.cpp:94][1774526177:941819]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

=============================================== Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fbdce03dfa2]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x29a) [0x7fbdce42c40a]
/lib/x86_64-linux-gnu/libcuda.so.1(ker: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
[rank1]:[W326 11:56:22.464163523 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Connection reset by peer
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:679 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a332d2 (0x7fee3ce332d2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x273 (0x7fee3ce311f3 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:22.468942400 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Connection reset by peer [rank1]:[W326 11:56:23.469148865 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:23.473906819 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:24.474022838 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:24.477768700 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:25.477949466 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:25.482151939 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:26.482306727 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:26.485868166 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:27.486056402 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:27.489967359 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:28.490093249 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:28.493817989 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe [rank1]:[W326 11:56:29.494007480 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first): frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so) frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so) frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so) frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6) frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6) frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:29.498513163 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start. (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Traceback (most recent call last): (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] return func(*args, **kwargs) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in init (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] super().init( (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in init (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] self.model_executor = executor_class(vllm_config) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] super().init(vllm_config) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File 
"/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] return func(*args, **kwargs) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] self._init_executor() (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] raise e from None (EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(EngineCore pid=4128) Process EngineCore: (EngineCore pid=4128) Traceback (most recent call last): (EngineCore pid=4128) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore pid=4128) self.run() (EngineCore pid=4128) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore pid=4128) self._target(*self._args, **self._kwargs) (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core (EngineCore pid=4128) raise e (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core (EngineCore pid=4128) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=4128) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=4128) return func(*args, **kwargs) (EngineCore pid=4128) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in init (EngineCore pid=4128) super().init( (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in init (EngineCore pid=4128) self.model_executor = executor_class(vllm_config) (EngineCore pid=4128) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in init (EngineCore pid=4128) super().init(vllm_config) (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=4128) return func(*args, **kwargs) (EngineCore pid=4128) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in init (EngineCore pid=4128) self._init_executor() (EngineCore pid=4128) File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor (EngineCore pid=4128) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore pid=4128) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=4128) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready (EngineCore pid=4128) raise e from None (EngineCore pid=4128) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. (APIServer pid=3842) Traceback (most recent call last): (APIServer pid=3842) File "/usr/local/bin/vllm", line 10, in <module> (APIServer pid=3842) sys.exit(main()) (APIServer pid=3842) ^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main (APIServer pid=3842) args.dispatch_function(args) (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd (APIServer pid=3842) uvloop.run(run_server(args)) (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 96, in run (APIServer pid=3842) return __asyncio.run( (APIServer pid=3842) ^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=3842) return runner.run(main) (APIServer pid=3842) ^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=3842) return self._loop.run_until_complete(task) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 48, in wrapper (APIServer pid=3842) return await main (APIServer pid=3842) ^^^^^^^^^^ (APIServer pid=3842) File 
"/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server (APIServer pid=3842) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker (APIServer pid=3842) async with build_async_engine_client( (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=3842) return await anext(self.gen) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client (APIServer pid=3842) async with build_async_engine_client_from_engine_args( (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/lib/python3.12/contextlib.py", line 210, in aenter (APIServer pid=3842) return await anext(self.gen) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args (APIServer pid=3842) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=3842) return cls( (APIServer pid=3842) ^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in init (APIServer pid=3842) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=3842) return func(*args, **kwargs) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^ 
(APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client (APIServer pid=3842) return AsyncMPClient(*client_args) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=3842) return func(*args, **kwargs) (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in init (APIServer pid=3842) super().init( (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in init (APIServer pid=3842) with launch_core_engines( (APIServer pid=3842) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=3842) File "/usr/lib/python3.12/contextlib.py", line 144, in exit (APIServer pid=3842) next(self.gen) (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines (APIServer pid=3842) wait_for_engine_startup( (APIServer pid=3842) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup (APIServer pid=3842) raise RuntimeError( (APIServer pid=3842) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
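
When triaging a log like the one above, it helps to separate the first native failure from the Python tracebacks that follow it. The following is a minimal sketch, not part of vLLM: the function names, the marker list, and the shortened `SAMPLE` text are assumptions made for illustration, and the patterns are taken only from strings that appear in this log.

```python
import re

# Patterns that, in the log above, come from native code rather than from
# Python tracebacks. This list is an assumption; extend it for other setups.
NATIVE_ERROR_MARKERS = (
    re.compile(r"\[AMP ERROR\]"),
    re.compile(r"Assertion [`'].*' failed"),
    re.compile(r"CUDA_ERROR_[A-Z_]+"),
)

# Shared-object paths as they appear at the start of a native backtrace frame,
# for example "/lib/x86_64-linux-gnu/libcuda.so.1(".
LIBRARY_PATH = re.compile(r"(/[\w./+-]+\.so(?:\.\d+)*)\(")


def first_native_error(log_text):
    """Return (line_number, line) for the earliest line matching a marker."""
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker.search(line) for marker in NATIVE_ERROR_MARKERS):
            return number, line.strip()
    return None


def libraries_in_backtrace(log_text):
    """Return the distinct .so paths named in backtrace frames, in order."""
    seen = []
    for path in LIBRARY_PATH.findall(log_text):
        if path not in seen:
            seen.append(path)
    return seen


# A shortened excerpt of the log above, one record per line.
SAMPLE = """\
(Worker pid=4349) INFO 03-26 11:56:17 [pynccl.py:111] vLLM is using nccl==2.27.5
[AMP ERROR][CudaFrontend.cpp:94][1774526177:941819]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fbdce03dfa2]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x29a) [0x7fbdce42c40a]
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start.
"""

if __name__ == "__main__":
    print(first_native_error(SAMPLE))
    print(libraries_in_backtrace(SAMPLE))
```

Run against the full log (for instance one saved with `vllm serve ... 2>&1 | tee serve.log`), this points at the `[AMP ERROR]` line and the two libraries in the native backtrace, which is a more useful starting point than the final `RuntimeError`.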

Root Cause

[rank1]:[W326 11:56:29.498513163 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start.
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self._init_executor()
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     raise e from None
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore pid=4128) Process EngineCore:
(EngineCore pid=4128) Traceback (most recent call last):
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=4128)     self.run()
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=4128)     self._target(*self._args, **self._kwargs)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=4128)     raise e
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128)     super().__init__(
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128)     super().__init__(vllm_config)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128)     self._init_executor()
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128)     raise e from None
(EngineCore pid=4128) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=3842) Traceback (most recent call last):
(APIServer pid=3842)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=3842)     sys.exit(main())
(APIServer pid=3842)              ^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=3842)     args.dispatch_function(args)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=3842)     uvloop.run(run_server(args))
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=3842)     return __asyncio.run(
(APIServer pid=3842)            ^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=3842)     return runner.run(main)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=3842)     return self._loop.run_until_complete(task)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3842)     return await main
(APIServer pid=3842)            ^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=3842)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=3842)     async with build_async_engine_client(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=3842)     async with build_async_engine_client_from_engine_args(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=3842)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=3842)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=3842)     return cls(
(APIServer pid=3842)            ^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=3842)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=3842)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=3842)     return AsyncMPClient(*client_args)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=3842)     super().__init__(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=3842)     with launch_core_engines(
(APIServer pid=3842)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=3842)     next(self.gen)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=3842)     wait_for_engine_startup(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=3842)     raise RuntimeError(
(APIServer pid=3842) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
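
The Python tracebacks in this excerpt only report that a worker process failed and point to an earlier message ("See root cause above"). In the full startup log the earlier message is the `[AMP ERROR]` line that reports `cuCtxGetDevice` failing with `CUDA_ERROR_INVALID_CONTEXT`. A minimal sketch for pulling the first such native-level line out of a saved log follows. The helper name `first_native_error` and the pattern list are assumptions made for illustration and are not part of vLLM.

```python
import re
from typing import Optional

# Patterns for lines that tend to carry the underlying failure rather than
# the Python-level fallout. This list is an assumption made for the sketch.
NATIVE_ERROR_PATTERNS = [
    re.compile(r"\[AMP ERROR\]"),
    re.compile(r"CUDA_ERROR_[A-Z_]+"),
    re.compile(r"Assertion `.*' failed"),
]


def first_native_error(log_text: str) -> Optional[str]:
    """Return the first log line that matches any pattern, or None."""
    for line in log_text.splitlines():
        if any(p.search(line) for p in NATIVE_ERROR_PATTERNS):
            return line.strip()
    return None


# Three lines copied from the log in this report.
sample = """\
(Worker pid=4349) INFO 03-26 11:56:17 [pynccl.py:111] vLLM is using nccl==2.27.5
[AMP ERROR][CudaFrontend.cpp:94][1774526177:941819]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start.
"""

print(first_native_error(sample))
```

Scanning for the first match rather than the last matters here, because the later TCPStore and EngineCore messages are consequences of the worker having already exited.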

Fix Action

Fix / Workaround
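
A minimal diagnostic sketch rather than a confirmed fix. `CUDA_ERROR_INVALID_CONTEXT` from `cuCtxGetDevice` usually means that no CUDA context is bound to the calling thread, so one way to narrow the problem down is to bind each visible device's primary context in turn and repeat the call outside vLLM. The sketch assumes a Linux host where the driver library loads as `libcuda.so.1`. The function name `probe_cuda_contexts` is hypothetical, and whether the virtualized driver frontend seen in the backtrace behaves the same way under this probe is untested.

```python
import ctypes
from typing import List, Optional, Tuple

CUDA_SUCCESS = 0


def probe_cuda_contexts() -> Optional[List[Tuple[int, int]]]:
    """Bind each device's primary context and call cuCtxGetDevice.

    Returns (device ordinal, CUresult) pairs, or None when the driver
    library cannot be loaded or initialised on this machine.
    """
    try:
        cuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no CUDA driver library available

    if cuda.cuInit(0) != CUDA_SUCCESS:
        return None

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != CUDA_SUCCESS:
        return None

    results = []
    for ordinal in range(count.value):
        dev = ctypes.c_int(0)      # CUdevice is a plain int handle
        ctx = ctypes.c_void_p()    # CUcontext is an opaque pointer
        rc = cuda.cuDeviceGet(ctypes.byref(dev), ordinal)
        if rc == CUDA_SUCCESS:
            rc = cuda.cuDevicePrimaryCtxRetain(ctypes.byref(ctx), dev)
        if rc == CUDA_SUCCESS:
            rc = cuda.cuCtxSetCurrent(ctx)
        if rc == CUDA_SUCCESS:
            current = ctypes.c_int(-1)
            # The call that fails in the log above.
            rc = cuda.cuCtxGetDevice(ctypes.byref(current))
            cuda.cuCtxSetCurrent(None)  # unbind before the next device
        # Retained primary contexts are left for process exit to clean up.
        results.append((ordinal, rc))
    return results


if __name__ == "__main__":
    print(probe_cuda_contexts())
```

A result code of 0 for every device would suggest the driver itself can serve the call once a context is current, which points back at how the worker process sets its device. If the driver frontend hits the same assertion as in the log, the process aborts instead of returning a code.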


Code Example

root@notebook-tianhangyao-ythvllmtest-prd-pre:/mnt/workspace# vllm serve  /mnt/workspace/qwen35_35B_A3B --port 8000 --host 0.0.0.0  --tensor-parallel-size 2 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]        █     █     █▄   ▄█
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.18.0
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]   █▄█▀ █     █     █     █  model   /mnt/workspace/qwen35_35B_A3B
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:233] non-default args: {'model_tag': '/mnt/workspace/qwen35_35B_A3B', 'host': '0.0.0.0', 'model': '/mnt/workspace/qwen35_35B_A3B', 'tensor_parallel_size': 2}
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:533] Resolved architecture: Qwen3_5MoeForConditionalGeneration
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:1582] Using max model len 262144
(APIServer pid=3842) INFO 03-26 11:55:56 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:212] Setting attention block size to 1056 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:243] Padding mamba page size by 0.76% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=3842) INFO 03-26 11:55:57 [vllm.py:754] Asynchronous scheduling is enabled.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(APIServer pid=3842) INFO 03-26 11:55:57 [compilation.py:289] Enabled custom fusions: allreduce_rms
h h h h h h h h h h h h h h h h h h hhhhhhhhhhhhhh(EngineCore pid=4128) INFO 03-26 11:56:10 [core.py:103] Initializing a V1 LLM engine (v0.18.0) with config: model='/mnt/workspace/qwen35_35B_A3B', speculative_config=None, tokenizer='/mnt/workspace/qwen35_35B_A3B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/mnt/workspace/qwen35_35B_A3B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 
'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': True}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=4128) WARNING 03-26 11:56:10 [multiproc_executor.py:997] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore pid=4128) INFO 03-26 11:56:10 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.128.136.58 (local), world_size=2, local_world_size=2
(Worker pid=4349) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4350) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4349) INFO 03-26 11:56:17 [pynccl.py:111] vLLM is using nccl==2.27.5
[AMP ERROR][CudaFrontend.cpp:94][1774526177:941819]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fbdce03dfa2]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x29a) [0x7fbdce42c40a]
/lib/x86_64-linux-gnu/libcuda.so.1(ker: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
[rank1]:[W326 11:56:22.464163523 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Connection reset by peer
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:679 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a332d2 (0x7fee3ce332d2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x273 (0x7fee3ce311f3 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:22.468942400 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Connection reset by peer
[rank1]:[W326 11:56:23.469148865 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:23.473906819 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:24.474022838 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:24.477768700 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:25.477949466 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:25.482151939 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:26.482306727 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:26.485868166 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:27.486056402 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:27.489967359 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:28.490093249 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:28.493817989 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:29.494007480 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:29.498513163 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start.
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self._init_executor()
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     raise e from None
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore pid=4128) Process EngineCore:
(EngineCore pid=4128) Traceback (most recent call last):
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=4128)     self.run()
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=4128)     self._target(*self._args, **self._kwargs)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=4128)     raise e
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128)     super().__init__(
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128)     super().__init__(vllm_config)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128)     self._init_executor()
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128)     raise e from None
(EngineCore pid=4128) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=3842) Traceback (most recent call last):
(APIServer pid=3842)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=3842)     sys.exit(main())
(APIServer pid=3842)              ^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=3842)     args.dispatch_function(args)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=3842)     uvloop.run(run_server(args))
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=3842)     return __asyncio.run(
(APIServer pid=3842)            ^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=3842)     return runner.run(main)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=3842)     return self._loop.run_until_complete(task)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3842)     return await main
(APIServer pid=3842)            ^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=3842)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=3842)     async with build_async_engine_client(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=3842)     async with build_async_engine_client_from_engine_args(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=3842)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=3842)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=3842)     return cls(
(APIServer pid=3842)            ^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=3842)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=3842)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=3842)     return AsyncMPClient(*client_args)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=3842)     super().__init__(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=3842)     with launch_core_engines(
(APIServer pid=3842)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=3842)     next(self.gen)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=3842)     wait_for_engine_startup(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=3842)     raise RuntimeError(
(APIServer pid=3842) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Your current environment

My ENV:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.20             Driver Version: 570.133.20     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H20-3e                  On  |   00000000:08:00.0 Off |                    0 |
| N/A   37C    P0             77W /  500W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H20-3e                  On  |   00000001:7F:00.0 Off |                    0 |
| N/A   36C    P0             79W /  500W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

🐛 Describe the bug

root@notebook-tianhangyao-ythvllmtest-prd-pre:/mnt/workspace# vllm serve  /mnt/workspace/qwen35_35B_A3B --port 8000 --host 0.0.0.0  --tensor-parallel-size 2 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]        █     █     █▄   ▄█
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.18.0
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]   █▄█▀ █     █     █     █  model   /mnt/workspace/qwen35_35B_A3B
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:297] 
(APIServer pid=3842) INFO 03-26 11:55:55 [utils.py:233] non-default args: {'model_tag': '/mnt/workspace/qwen35_35B_A3B', 'host': '0.0.0.0', 'model': '/mnt/workspace/qwen35_35B_A3B', 'tensor_parallel_size': 2}
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:533] Resolved architecture: Qwen3_5MoeForConditionalGeneration
(APIServer pid=3842) INFO 03-26 11:55:56 [model.py:1582] Using max model len 262144
(APIServer pid=3842) INFO 03-26 11:55:56 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:212] Setting attention block size to 1056 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=3842) INFO 03-26 11:55:57 [config.py:243] Padding mamba page size by 0.76% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=3842) INFO 03-26 11:55:57 [vllm.py:754] Asynchronous scheduling is enabled.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(APIServer pid=3842) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(APIServer pid=3842) INFO 03-26 11:55:57 [compilation.py:289] Enabled custom fusions: allreduce_rms
(EngineCore pid=4128) INFO 03-26 11:56:10 [core.py:103] Initializing a V1 LLM engine (v0.18.0) with config: model='/mnt/workspace/qwen35_35B_A3B', speculative_config=None, tokenizer='/mnt/workspace/qwen35_35B_A3B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=2, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/mnt/workspace/qwen35_35B_A3B, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': True}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=4128) WARNING 03-26 11:56:10 [multiproc_executor.py:997] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
(EngineCore pid=4128) INFO 03-26 11:56:10 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.128.136.58 (local), world_size=2, local_world_size=2
(Worker pid=4349) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4350) INFO 03-26 11:56:17 [parallel_state.py:1395] world_size=2 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:58241 backend=nccl
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4349) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
(Worker pid=4350) <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.
(Worker pid=4349) INFO 03-26 11:56:17 [pynccl.py:111] vLLM is using nccl==2.27.5
[AMP ERROR][CudaFrontend.cpp:94][1774526177:941819]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================
Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fbdce03dfa2]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x29a) [0x7fbdce42c40a]
/lib/x86_64-linux-gnu/libcuda.so.1(ker: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
[rank1]:[W326 11:56:22.464163523 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Connection reset by peer
Exception raised from recvBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:679 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a332d2 (0x7fee3ce332d2 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x273 (0x7fee3ce311f3 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:22.468942400 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Connection reset by peer
[rank1]:[W326 11:56:23.469148865 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:23.473906819 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:24.474022838 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:24.477768700 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:25.477949466 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:25.482151939 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:26.482306727 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:26.485868166 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:27.486056402 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[::ffff:127.0.0.1]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:27.489967359 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:28.490093249 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:28.493817989 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
[rank1]:[W326 11:56:29.494007480 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=58, addr=[localhost]:55700, remote=[::ffff:127.0.0.1]:58241): Broken pipe
Exception raised from sendBytes at /pytorch/torch/csrc/distributed/c10d/Utils.hpp:653 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x9d (0x7fef22b72fdd in /usr/local/lib/python3.12/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x6a326d1 (0x7fee3ce326d1 in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) + 0x24d (0x7fee3ce311cd in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: c10d::ProcessGroupNCCL::HeartbeatMonitor::runLoop() + 0x44c (0x7fedde5849cc in /usr/local/lib/python3.12/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0xdc253 (0x7feee96b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x94ac3 (0x7fef23a94ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #6: <unknown function> + 0x1268d0 (0x7fef23b268d0 in /lib/x86_64-linux-gnu/libc.so.6)

[rank1]:[W326 11:56:29.498513163 ProcessGroupNCCL.cpp:1802] [PG ID 0 PG GUID 0 Rank 1] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] EngineCore failed to start.
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Traceback (most recent call last):
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     super().__init__(vllm_config)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     return func(*args, **kwargs)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self._init_executor()
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099]     raise e from None
(EngineCore pid=4128) ERROR 03-26 11:56:30 [core.py:1099] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore pid=4128) Process EngineCore:
(EngineCore pid=4128) Traceback (most recent call last):
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=4128)     self.run()
(EngineCore pid=4128)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=4128)     self._target(*self._args, **self._kwargs)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1103, in run_engine_core
(EngineCore pid=4128)     raise e
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1073, in run_engine_core
(EngineCore pid=4128)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=4128)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 839, in __init__
(EngineCore pid=4128)     super().__init__(
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 112, in __init__
(EngineCore pid=4128)     self.model_executor = executor_class(vllm_config)
(EngineCore pid=4128)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 101, in __init__
(EngineCore pid=4128)     super().__init__(vllm_config)
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=4128)     return func(*args, **kwargs)
(EngineCore pid=4128)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore pid=4128)     self._init_executor()
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 190, in _init_executor
(EngineCore pid=4128)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore pid=4128)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=4128)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 731, in wait_for_ready
(EngineCore pid=4128)     raise e from None
(EngineCore pid=4128) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=3842) Traceback (most recent call last):
(APIServer pid=3842)   File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=3842)     sys.exit(main())
(APIServer pid=3842)              ^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=3842)     args.dispatch_function(args)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 118, in cmd
(APIServer pid=3842)     uvloop.run(run_server(args))
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=3842)     return __asyncio.run(
(APIServer pid=3842)            ^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=3842)     return runner.run(main)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=3842)     return self._loop.run_until_complete(task)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3842)     return await main
(APIServer pid=3842)            ^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 656, in run_server
(APIServer pid=3842)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server_worker
(APIServer pid=3842)     async with build_async_engine_client(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 103, in build_async_engine_client
(APIServer pid=3842)     async with build_async_engine_client_from_engine_args(
(APIServer pid=3842)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=3842)     return await anext(self.gen)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 144, in build_async_engine_client_from_engine_args
(APIServer pid=3842)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=3842)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=3842)     return cls(
(APIServer pid=3842)            ^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=3842)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=3842)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 128, in make_async_mp_client
(APIServer pid=3842)     return AsyncMPClient(*client_args)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=3842)     return func(*args, **kwargs)
(APIServer pid=3842)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 924, in __init__
(APIServer pid=3842)     super().__init__(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 583, in __init__
(APIServer pid=3842)     with launch_core_engines(
(APIServer pid=3842)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=3842)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=3842)     next(self.gen)
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 972, in launch_core_engines
(APIServer pid=3842)     wait_for_engine_startup(
(APIServer pid=3842)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1031, in wait_for_engine_startup
(APIServer pid=3842)     raise RuntimeError(
(APIServer pid=3842) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}


extent analysis

Fix Plan

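The traceback shows that EngineCore failed to start because a worker process failed during initialization ("WorkerProc initialization failed"). Since the failure only appears when the tensor parallel size is greater than 1, it is likely related to the multi-GPU (distributed inference) setup. To narrow it down, try the following steps: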

  • Check the CUDA version: Ensure that the CUDA version matches the one the installed PyTorch build targets. In this case, the CUDA version is 12.8, which recent PyTorch builds support.
  • Check the NCCL version: The log mentions ProcessGroupNCCL; NCCL is the library PyTorch uses for multi-GPU communication. Ensure that the NCCL version is compatible with the PyTorch version being used. In this case, the NCCL version is 2.27.5.
  • Disable the CUDA cache: Set the CUDA_CACHE_DISABLE environment variable to 1 before starting the server. This turns off the CUDA JIT compilation cache and helps rule out a corrupted cache.
  • Check the GPU memory: Ensure that each GPU has sufficient free memory to hold its shard of the model. The report shows GPU memory usage of 0MiB, which is unusual and may mean the workers failed before allocating anything.
  • Try a different distributed backend: If possible, try a different backend, such as gloo or mpi, to see if the issue persists. These are PyTorch process-group backends, so this is mainly useful for isolating whether the failure is specific to NCCL.
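The version checks in the first two steps can be gathered in one place. The following is a minimal sketch, assuming it runs in the same Python environment as vllm serve. The helper name collect_versions is hypothetical, and the function falls back to placeholder strings when PyTorch or NCCL is not available.

```python
import importlib


def collect_versions() -> dict:
    """Return the PyTorch, CUDA and NCCL versions visible to this interpreter."""
    info = {"torch": "not installed", "cuda": "unknown", "nccl": "unknown"}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing or its shared libraries could not be loaded.
        return info
    info["torch"] = str(torch.__version__)
    # torch.version.cuda is None on CPU-only builds.
    info["cuda"] = str(torch.version.cuda)
    try:
        # Recent PyTorch releases return a tuple such as (major, minor, patch).
        info["nccl"] = ".".join(str(part) for part in torch.cuda.nccl.version())
    except Exception:
        pass  # NCCL is not available in this build; keep "unknown".
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Comparing this output with the versions listed in the issue (CUDA 12.8, NCCL 2.27.5) shows whether the interpreter that launches the server sees the same libraries.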

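Example code to disable the CUDA cache (the variable must be set before any CUDA library is loaded; for vllm serve, export it in the shell that launches the server):

import os
# Must run before torch or any other CUDA-using module is imported.
os.environ['CUDA_CACHE_DISABLE'] = '1'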

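Example code to check GPU memory:

import torch
# mem_get_info() returns device-wide (free, total) bytes; memory_allocated() only counts this process's tensors.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**20:.0f} MiB, total: {total / 2**20:.0f} MiB")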

Verification

To verify that the fix worked, run the vllm serve command again with the same arguments, including --tensor-parallel-size 2. The fix worked if the server starts without the "EngineCore failed to start" error. If the issue persists, re-check GPU memory usage and the CUDA cache setting to rule them out.
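If the issue persists, the captured log is easier to read once the repeated follow-on messages are skipped. The sketch below is a heuristic only: the marker list is an assumption based on the messages in the log above (the TCPStore "Broken pipe" warnings and the "see root cause" pointers), not an official vLLM list, and first_primary_error is a hypothetical helper name.

```python
from typing import Optional

# Messages that, in the log above, point at another failure rather than
# describing one themselves. This list is an assumption, not a vLLM API.
SECONDARY_MARKERS = (
    "Broken pipe",
    "See stack trace for root cause",
    "See root cause above",
    "EngineCore failed to start",
)


def first_primary_error(log_text: str) -> Optional[str]:
    """Return the first error-looking line that is not a known follow-on message."""
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("frame #"):
            continue  # C++ stack frames mention c10::Error but carry no message.
        if "error" not in stripped.lower():
            continue
        if any(marker in stripped for marker in SECONDARY_MARKERS):
            continue
        return stripped
    return None
```

Saving the server output to a file and passing its contents to first_primary_error gives a starting point for reading the worker-side failure. The result still needs to be checked by hand against the surrounding lines.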

Extra Tips

  • Make sure to check the PyTorch and CUDA versions to ensure they are compatible.
  • If using a virtual environment, ensure that the CUDA and NCCL libraries are properly installed and configured.
  • Try running the model on a single GPU (tensor parallel size 1) to see if the issue persists, which can help determine whether the problem is specific to the multi-GPU setup.
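The single-GPU comparison in the last tip can be set up by generating both command lines from the same arguments, so that only the tensor parallel size differs. This is a minimal sketch: build_serve_command is a hypothetical helper, and the flags are the ones used in the issue's own vllm serve invocation.

```python
import shlex


def build_serve_command(model_path: str, tensor_parallel_size: int,
                        host: str = "0.0.0.0", port: int = 8000) -> list:
    """Build the argv for `vllm serve` using the flags shown in the issue."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return [
        "vllm", "serve", model_path,
        "--host", host,
        "--port", str(port),
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


if __name__ == "__main__":
    model = "/mnt/workspace/qwen35_35B_A3B"  # model path taken from the issue
    for tp in (1, 2):
        # Print instead of executing, so each run can be started by hand.
        print(shlex.join(build_serve_command(model, tp)))
```

If the model does not fit on one GPU, the single-GPU run will fail for memory reasons instead, which is a different failure from the one reported here.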
