vllm - ✅(Solved) Fix [Bug]: GPU failure during repeated model loading when using --enable-prefix-caching with KV transfer (LMCacheConnectorV1) [1 pull request, 4 comments, 2 participants]

GitHub stats
vllm-project/vllm#36852 (fetched 2026-04-08 00:34:15)
Comments: 4
Participants: 2
Timeline: 17
Reactions: 0
Timeline (top): mentioned ×5, subscribed ×5, commented ×4, cross-referenced ×1

Error Message

The GPUs enter an error state during the second model load when vLLM runs with prefix caching and KV transfer enabled. After a system reboot, the model loads and runs without any issues the first time. However, if I load it a second time, the GPUs go into an error state during initialization. Below is the error I am getting:

2026-03-05T12:23:14.561Z - WARN: vLLM Server stderr (PID 84695): [rank1]:[E305 17:53:14.167809819 ProcessGroupNCCL.cpp:2057] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: unspecified launch failure
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)
2026-03-05T12:23:14.562Z - WARN: vLLM Server stderr (PID 84695): terminate called after throwing an instance of 'c10::DistBackendError'
2026-03-05T12:23:14.564Z - WARN: vLLM Server stderr (PID 84695): what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: unspecified launch failure
Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)
Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2063 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)

Fix Action


PR fix notes

PR #36905: [WIP][BugFix] Add missing shutdown() to LMCacheConnectorV1 to fix GPU failure on repeated model loading

Description (problem / solution / changelog)

Purpose

Fixes #36852.

LMCacheConnectorV1 did not override shutdown(), inheriting the base class no-op from KVConnectorBase_V1. When GPUWorker.shutdown() calls ensure_kv_transfer_shutdown(), the connector's shutdown() was a no-op, so none of the LMCache resources were cleaned up:

  • LMCacheEngine (GPU buffers, CUDA state) — created via LMCacheEngineBuilder.get_or_create() singleton
  • GPU connectors (CUDA memory buffers)
  • ZMQOffloadServer, InternalAPIServer, RuntimePluginLauncher
  • Lookup server/client
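A simplified, hypothetical model of the call chain described above. The class and function names are stand-ins that mirror the PR description, not vLLM's actual code. Because the subclass never overrides `shutdown()`, the worker-side hook calls the inherited no-op and every resource the subclass created stays allocated.

```python
class KVConnectorBaseSketch:
    """Stand-in for a connector base class whose shutdown() is a no-op."""

    def shutdown(self) -> None:
        # Default implementation: nothing to release.
        pass


class LeakyConnectorSketch(KVConnectorBaseSketch):
    """Stand-in for a connector that owns resources but never overrides shutdown()."""

    def __init__(self) -> None:
        # Placeholders for the engine, GPU connectors, servers and lookup client.
        self.resources = ["engine", "gpu_connector", "offload_server", "lookup_client"]


def ensure_kv_transfer_shutdown_sketch(connector: KVConnectorBaseSketch) -> None:
    """Hypothetical stand-in for the worker-side shutdown hook."""
    connector.shutdown()


connector = LeakyConnectorSketch()
ensure_kv_transfer_shutdown_sketch(connector)
# Only the inherited no-op ran, so every resource is still held.
print(connector.resources)
```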

This caused cudaErrorLaunchFailure on the second model load because stale GPU resources from the first run corrupted GPU state.
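The PR attributes the failure to LMCache state that outlives the first load. One way this can happen with a process-wide `get_or_create()` singleton is sketched below; `EngineBuilderSketch` and `FakeEngine` are hypothetical stand-ins, not LMCache's real classes. Without a matching `destroy()`, a second load in the same process receives the engine built for the first one.

```python
from typing import Dict


class FakeEngine:
    """Hypothetical engine holding state tied to one model load."""

    def __init__(self, load_id: int) -> None:
        self.load_id = load_id
        self.closed = False

    def close(self) -> None:
        self.closed = True


class EngineBuilderSketch:
    """Process-wide registry: one engine per instance id, reused until destroyed."""

    _instances: Dict[str, FakeEngine] = {}

    @classmethod
    def get_or_create(cls, instance_id: str, load_id: int) -> FakeEngine:
        # Returns the cached engine if one exists, ignoring the new load_id.
        if instance_id not in cls._instances:
            cls._instances[instance_id] = FakeEngine(load_id)
        return cls._instances[instance_id]

    @classmethod
    def destroy(cls, instance_id: str) -> None:
        # Release the engine and forget it so the next load builds a fresh one.
        engine = cls._instances.pop(instance_id, None)
        if engine is not None:
            engine.close()


# Without destroy(): the second load silently reuses state from the first.
first = EngineBuilderSketch.get_or_create("vllm-instance", load_id=1)
second = EngineBuilderSketch.get_or_create("vllm-instance", load_id=2)
assert second is first and second.load_id == 1

# With destroy() between loads: the next load gets a fresh engine.
EngineBuilderSketch.destroy("vllm-instance")
third = EngineBuilderSketch.get_or_create("vllm-instance", load_id=3)
assert third is not first and first.closed and third.load_id == 3
```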

This PR adds shutdown() to both LMCacheConnectorV1 (wrapper) and LMCacheConnectorV1Impl (native adapter) to properly tear down all resources in the correct order:

  1. Stop auxiliary services (API server, plugin launcher, offload server)
  2. Close network resources (lookup server/client)
  3. Destroy singleton builders via LMCacheEngineBuilder.destroy() and LMCBlenderBuilder.destroy() to free GPU buffers

Each cleanup step is wrapped in try/except (following the MultiConnector.shutdown() pattern) to ensure all resources are cleaned up even if one step fails.
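The teardown order and the per-step `try/except` can be sketched as follows. This is a minimal illustration assuming each resource exposes a `close()`-style method; `ConnectorWithShutdownSketch`, `Resource`, the abbreviated resource list and the logger name are hypothetical, not the code in PR #36905. A failing step is logged and the remaining steps still run.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("connector_shutdown_sketch")


class Resource:
    """Hypothetical resource that records whether it was released."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.released = False

    def close(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed to close")
        self.released = True


class ConnectorWithShutdownSketch:
    def __init__(self, api_server: Resource, offload_server: Resource,
                 lookup_client: Resource, engine: Resource) -> None:
        self.api_server = api_server
        self.offload_server = offload_server
        self.lookup_client = lookup_client
        self.engine = engine
        self.shutdown_order: List[str] = []

    def shutdown(self) -> None:
        # Order mirrors the PR description: auxiliary services first,
        # then network resources, then the engine that owns GPU buffers.
        steps: List[Tuple[str, Callable[[], None]]] = [
            ("api_server", self.api_server.close),
            ("offload_server", self.offload_server.close),
            ("lookup_client", self.lookup_client.close),
            ("engine", self.engine.close),
        ]
        for name, step in steps:
            self.shutdown_order.append(name)
            try:
                step()
            except Exception:
                # Log and keep going so one failure does not leak the rest.
                logger.exception("Error shutting down %s", name)


conn = ConnectorWithShutdownSketch(
    api_server=Resource("api_server"),
    offload_server=Resource("offload_server", fail=True),
    lookup_client=Resource("lookup_client"),
    engine=Resource("engine"),
)
conn.shutdown()
```

Here the offload server's failure is logged, yet the lookup client and the engine are still released, which is the property the per-step `try/except` is meant to guarantee.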

Note: LMCacheMPConnector already had a proper shutdown() — this bug only affected LMCacheConnectorV1.

Test Plan

Waiting for @lavanyabollepalli's input.

Test Result

Changed files

  • vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py (modified, +7/-0)
  • vllm/distributed/kv_transfer/kv_connector/v1/lmcache_integration/vllm_v1_adapter.py (modified, +35/-0)

Code Example

==============================
SYSTEM INFORMATION
==============================
Thursday 12 March 2026 10:23:11 AM IST
Linux MW83-RP0-000 6.8.0-101-generic #101~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 11 13:19:54 UTC  x86_64 x86_64 x86_64 GNU/Linux

OS RELEASE
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

CPU INFO
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           46 bits physical, 57 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  48
On-line CPU(s) list:                     0-47
Vendor ID:                               GenuineIntel
Model name:                              Intel(R) Xeon(R) Silver 4310 CPU @ 2.10GHz
CPU family:                              6
Model:                                   106
Thread(s) per core:                      2
Core(s) per socket:                      12
Socket(s):                               2
Stepping:                                6
CPU max MHz:                             3300.0000
CPU min MHz:                             800.0000
BogoMIPS:                                4200.00
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid fsrm md_clear pconfig flush_l1d arch_capabilities
Virtualization:                          VT-x
L1d cache:                               1.1 MiB (24 instances)
L1i cache:                               768 KiB (24 instances)
L2 cache:                                30 MiB (24 instances)
L3 cache:                                36 MiB (2 instances)
NUMA node(s):                            2
NUMA node0 CPU(s):                       0-11,24-35
NUMA node1 CPU(s):                       12-23,36-47
Vulnerability Gather data sampling:      Mitigation; Microcode
Vulnerability Indirect target selection: Vulnerable
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
GPU INFORMATION
==============================

nvidia-smi
Thu Mar 12 10:23:11 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:31:00.0 Off |                  Off |
| 30%   48C    P8              4W /  130W |      15MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:4B:00.0 Off |                  Off |
| 30%   49C    P8              4W /  130W |      15MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1819      G   /usr/lib/xorg/Xorg                        4MiB |
|    1   N/A  N/A            1819      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+

GPU LIST
GPU 0: NVIDIA RTX 4000 Ada Generation (UUID: GPU-edbe6c9c-a6f6-ccc9-e5ea-bb588bac21d7)
GPU 1: NVIDIA RTX 4000 Ada Generation (UUID: GPU-f9af3eac-ac48-a861-c5d5-9d3ed87954b6)

GPU TOPOLOGY
	GPU0	GPU1	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NODE	0-11,24-35	0		N/A
GPU1	NODE	 X 	0-11,24-35	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks
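The matrix above can also be read programmatically. A minimal sketch, assuming the tab-separated layout of `nvidia-smi topo -m` shown here (the layout may differ between driver versions); `LINK_LEGEND` restates the legend above, and `parse_topology` and `link_between` are hypothetical helpers. For this machine it reports that GPU0 and GPU1 are connected via `NODE`, that is, over PCIe with no NVLink between them.

```python
from typing import Dict

LINK_LEGEND: Dict[str, str] = {
    "X": "self",
    "SYS": "PCIe plus the SMP interconnect between NUMA nodes",
    "NODE": "PCIe plus the interconnect between PCIe host bridges within a NUMA node",
    "PHB": "PCIe plus a PCIe host bridge",
    "PXB": "multiple PCIe bridges, without the PCIe host bridge",
    "PIX": "at most a single PCIe bridge",
}


def parse_topology(text: str) -> Dict[str, Dict[str, str]]:
    """Return {row_gpu: {col_gpu: link}} from a tab-separated topology matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("\t") if h.strip()]
    # Keep only "GPU<n>" columns; skip "CPU Affinity", "GPU NUMA ID", etc.
    gpu_cols = [h for h in header if h.startswith("GPU") and h[3:].isdigit()]
    matrix: Dict[str, Dict[str, str]] = {}
    for line in lines[1:]:
        cells = [c.strip() for c in line.split("\t")]
        if not cells[0].startswith("GPU"):
            continue
        matrix[cells[0]] = dict(zip(gpu_cols, cells[1:1 + len(gpu_cols)]))
    return matrix


def link_between(matrix: Dict[str, Dict[str, str]], a: str, b: str) -> str:
    link = matrix[a][b]
    if link.startswith("NV"):
        return f"{link}: bonded set of {link[2:]} NVLinks"
    return f"{link}: {LINK_LEGEND.get(link, 'unknown link type')}"


TOPO = (
    "\tGPU0\tGPU1\tCPU Affinity\tNUMA Affinity\tGPU NUMA ID\n"
    "GPU0\t X \tNODE\t0-11,24-35\t0\t\tN/A\n"
    "GPU1\tNODE\t X \t0-11,24-35\t0\t\tN/A\n"
)
matrix = parse_topology(TOPO)
print(link_between(matrix, "GPU0", "GPU1"))
```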

GPU PCI BUS INFO
name, driver_version, pci.bus_id
NVIDIA RTX 4000 Ada Generation, 580.126.09, 00000000:31:00.0
NVIDIA RTX 4000 Ada Generation, 580.126.09, 00000000:4B:00.0

==============================
PCIE TOPOLOGY
==============================
-+-[0000:ff]-+-00.0  Intel Corporation Device 344c
 |           +-00.1  Intel Corporation Device 344c
 |           +-00.2  Intel Corporation Device 344c
 |           +-00.3  Intel Corporation Device 344c
 |           +-00.4  Intel Corporation Device 344c
 |           +-00.5  Intel Corporation Device 344c
 |           +-00.6  Intel Corporation Device 344c
 |           +-00.7  Intel Corporation Device 344c
 |           +-01.0  Intel Corporation Device 344c
 |           +-01.1  Intel Corporation Device 344c
 |           +-01.2  Intel Corporation Device 344c
 |           +-01.3  Intel Corporation Device 344c
 |           +-01.4  Intel Corporation Device 344c
 |           +-01.5  Intel Corporation Device 344c
 |           +-01.6  Intel Corporation Device 344c
 |           +-01.7  Intel Corporation Device 344c
 |           +-02.0  Intel Corporation Device 344c
 |           +-02.1  Intel Corporation Device 344c
 |           +-02.2  Intel Corporation Device 344c
 |           +-02.3  Intel Corporation Device 344c
 |           +-02.4  Intel Corporation Device 344c
 |           +-02.5  Intel Corporation Device 344c
 |           +-02.6  Intel Corporation Device 344c
 |           +-02.7  Intel Corporation Device 344c
 |           +-03.0  Intel Corporation Device 344c
 |           +-03.1  Intel Corporation Device 344c
 |           +-03.2  Intel Corporation Device 344c
 |           +-03.3  Intel Corporation Device 344c
 |           +-0a.0  Intel Corporation Device 344d
 |           +-0a.1  Intel Corporation Device 344d
 |           +-0a.2  Intel Corporation Device 344d
 |           +-0a.3  Intel Corporation Device 344d
 |           +-0a.4  Intel Corporation Device 344d
 |           +-0a.5  Intel Corporation Device 344d
 |           +-0a.6  Intel Corporation Device 344d
 |           +-0a.7  Intel Corporation Device 344d
 |           +-0b.0  Intel Corporation Device 344d
 |           +-0b.1  Intel Corporation Device 344d
 |           +-0b.2  Intel Corporation Device 344d
 |           +-0b.3  Intel Corporation Device 344d
 |           +-0b.4  Intel Corporation Device 344d
 |           +-0b.5  Intel Corporation Device 344d
 |           +-0b.6  Intel Corporation Device 344d
 |           +-0b.7  Intel Corporation Device 344d
 |           +-0c.0  Intel Corporation Device 344d
 |           +-0c.1  Intel Corporation Device 344d
 |           +-0c.2  Intel Corporation Device 344d
 |           +-0c.3  Intel Corporation Device 344d
 |           +-0c.4  Intel Corporation Device 344d
 |           +-0c.5  Intel Corporation Device 344d
 |           +-0c.6  Intel Corporation Device 344d
 |           +-0c.7  Intel Corporation Device 344d
 |           +-0d.0  Intel Corporation Device 344d
 |           +-0d.1  Intel Corporation Device 344d
 |           +-0d.2  Intel Corporation Device 344d
 |           +-0d.3  Intel Corporation Device 344d
 |           +-1d.0  Intel Corporation Device 344f
 |           +-1d.1  Intel Corporation Device 3457
 |           +-1e.0  Intel Corporation Device 3458
 |           +-1e.1  Intel Corporation Device 3459
 |           +-1e.2  Intel Corporation Device 345a
 |           +-1e.3  Intel Corporation Device 345b
 |           +-1e.4  Intel Corporation Device 345c
 |           +-1e.5  Intel Corporation Device 345d
 |           +-1e.6  Intel Corporation Device 345e
 |           \-1e.7  Intel Corporation Device 345f
 +-[0000:fe]-+-00.0  Intel Corporation Device 3450
 |           +-00.1  Intel Corporation Device 3451
 |           +-00.2  Intel Corporation Device 3452
 |           +-00.3  Intel Corporation Device 0998
 |           +-00.5  Intel Corporation Device 3455
 |           +-02.0  Intel Corporation Device 3440
 |           +-02.1  Intel Corporation Device 3441
 |           +-02.2  Intel Corporation Device 3442
 |           +-04.0  Intel Corporation Device 3440
 |           +-04.1  Intel Corporation Device 3441
 |           +-04.2  Intel Corporation Device 3442
 |           +-04.3  Intel Corporation Device 3443
 |           +-05.0  Intel Corporation Device 3445
 |           +-05.1  Intel Corporation Device 3446
 |           +-05.2  Intel Corporation Device 3447
 |           +-06.0  Intel Corporation Device 3445
 |           +-06.1  Intel Corporation Device 3446
 |           +-06.2  Intel Corporation Device 3447
 |           +-07.0  Intel Corporation Device 3445
 |           +-07.1  Intel Corporation Device 3446
 |           +-07.2  Intel Corporation Device 3447
 |           +-0b.0  Intel Corporation Device 3448
 |           +-0b.1  Intel Corporation Device 3448
 |           +-0b.2  Intel Corporation Device 344b
 |           +-0c.0  Intel Corporation Device 344a
 |           +-0d.0  Intel Corporation Device 344a
 |           +-0e.0  Intel Corporation Device 344a
 |           +-0f.0  Intel Corporation Device 344a
 |           +-1a.0  Intel Corporation Device 2880
 |           +-1b.0  Intel Corporation Device 2880
 |           +-1c.0  Intel Corporation Device 2880
 |           \-1d.0  Intel Corporation Device 2880
 +-[0000:e2]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[e3]--
 |           +-03.0-[e4]--
 |           +-04.0-[e5]--
 |           \-05.0-[e6]--
 +-[0000:c9]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           \-00.4  Intel Corporation Device 0998
 +-[0000:b0]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           \-00.4  Intel Corporation Device 0998
 +-[0000:97]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[98]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           +-03.0-[99]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           +-04.0-[9a]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           \-05.0-[9b]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 +-[0000:80]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-01.0  Intel Corporation Device 0b00
 |           +-01.1  Intel Corporation Device 0b00
 |           +-01.2  Intel Corporation Device 0b00
 |           +-01.3  Intel Corporation Device 0b00
 |           +-01.4  Intel Corporation Device 0b00
 |           +-01.5  Intel Corporation Device 0b00
 |           +-01.6  Intel Corporation Device 0b00
 |           +-01.7  Intel Corporation Device 0b00
 |           +-02.0  Intel Corporation Device 09a6
 |           +-02.1  Intel Corporation Device 09a7
 |           \-02.4  Intel Corporation Device 3456
 +-[0000:7f]-+-00.0  Intel Corporation Device 344c
 |           +-00.1  Intel Corporation Device 344c
 |           +-00.2  Intel Corporation Device 344c
 |           +-00.3  Intel Corporation Device 344c
 |           +-00.4  Intel Corporation Device 344c
 |           +-00.5  Intel Corporation Device 344c
 |           +-00.6  Intel Corporation Device 344c
 |           +-00.7  Intel Corporation Device 344c
 |           +-01.0  Intel Corporation Device 344c
 |           +-01.1  Intel Corporation Device 344c
 |           +-01.2  Intel Corporation Device 344c
 |           +-01.3  Intel Corporation Device 344c
 |           +-01.4  Intel Corporation Device 344c
 |           +-01.5  Intel Corporation Device 344c
 |           +-01.6  Intel Corporation Device 344c
 |           +-01.7  Intel Corporation Device 344c
 |           +-02.0  Intel Corporation Device 344c
 |           +-02.1  Intel Corporation Device 344c
 |           +-02.2  Intel Corporation Device 344c
 |           +-02.3  Intel Corporation Device 344c
 |           +-02.4  Intel Corporation Device 344c
 |           +-02.5  Intel Corporation Device 344c
 |           +-02.6  Intel Corporation Device 344c
 |           +-02.7  Intel Corporation Device 344c
 |           +-03.0  Intel Corporation Device 344c
 |           +-03.1  Intel Corporation Device 344c
 |           +-03.2  Intel Corporation Device 344c
 |           +-03.3  Intel Corporation Device 344c
 |           +-0a.0  Intel Corporation Device 344d
 |           +-0a.1  Intel Corporation Device 344d
 |           +-0a.2  Intel Corporation Device 344d
 |           +-0a.3  Intel Corporation Device 344d
 |           +-0a.4  Intel Corporation Device 344d
 |           +-0a.5  Intel Corporation Device 344d
 |           +-0a.6  Intel Corporation Device 344d
 |           +-0a.7  Intel Corporation Device 344d
 |           +-0b.0  Intel Corporation Device 344d
 |           +-0b.1  Intel Corporation Device 344d
 |           +-0b.2  Intel Corporation Device 344d
 |           +-0b.3  Intel Corporation Device 344d
 |           +-0b.4  Intel Corporation Device 344d
 |           +-0b.5  Intel Corporation Device 344d
 |           +-0b.6  Intel Corporation Device 344d
 |           +-0b.7  Intel Corporation Device 344d
 |           +-0c.0  Intel Corporation Device 344d
 |           +-0c.1  Intel Corporation Device 344d
 |           +-0c.2  Intel Corporation Device 344d
 |           +-0c.3  Intel Corporation Device 344d
 |           +-0c.4  Intel Corporation Device 344d
 |           +-0c.5  Intel Corporation Device 344d
 |           +-0c.6  Intel Corporation Device 344d
 |           +-0c.7  Intel Corporation Device 344d
 |           +-0d.0  Intel Corporation Device 344d
 |           +-0d.1  Intel Corporation Device 344d
 |           +-0d.2  Intel Corporation Device 344d
 |           +-0d.3  Intel Corporation Device 344d
 |           +-1d.0  Intel Corporation Device 344f
 |           +-1d.1  Intel Corporation Device 3457
 |           +-1e.0  Intel Corporation Device 3458
 |           +-1e.1  Intel Corporation Device 3459
 |           +-1e.2  Intel Corporation Device 345a
 |           +-1e.3  Intel Corporation Device 345b
 |           +-1e.4  Intel Corporation Device 345c
 |           +-1e.5  Intel Corporation Device 345d
 |           +-1e.6  Intel Corporation Device 345e
 |           \-1e.7  Intel Corporation Device 345f
 +-[0000:7e]-+-00.0  Intel Corporation Device 3450
 |           +-00.1  Intel Corporation Device 3451
 |           +-00.2  Intel Corporation Device 3452
 |           +-00.3  Intel Corporation Device 0998
 |           +-00.5  Intel Corporation Device 3455
 |           +-02.0  Intel Corporation Device 3440
 |           +-02.1  Intel Corporation Device 3441
 |           +-02.2  Intel Corporation Device 3442
 |           +-04.0  Intel Corporation Device 3440
 |           +-04.1  Intel Corporation Device 3441
 |           +-04.2  Intel Corporation Device 3442
 |           +-04.3  Intel Corporation Device 3443
 |           +-05.0  Intel Corporation Device 3445
 |           +-05.1  Intel Corporation Device 3446
 |           +-05.2  Intel Corporation Device 3447
 |           +-06.0  Intel Corporation Device 3445
 |           +-06.1  Intel Corporation Device 3446
 |           +-06.2  Intel Corporation Device 3447
 |           +-07.0  Intel Corporation Device 3445
 |           +-07.1  Intel Corporation Device 3446
 |           +-07.2  Intel Corporation Device 3447
 |           +-0b.0  Intel Corporation Device 3448
 |           +-0b.1  Intel Corporation Device 3448
 |           +-0b.2  Intel Corporation Device 344b
 |           +-0c.0  Intel Corporation Device 344a
 |           +-0d.0  Intel Corporation Device 344a
 |           +-0e.0  Intel Corporation Device 344a
 |           +-0f.0  Intel Corporation Device 344a
 |           +-1a.0  Intel Corporation Device 2880
 |           +-1b.0  Intel Corporation Device 2880
 |           +-1c.0  Intel Corporation Device 2880
 |           \-1d.0  Intel Corporation Device 2880
 +-[0000:64]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[65-6a]--
 |           +-03.0-[6b-70]--
 |           +-04.0-[71-76]--
 |           \-05.0-[77-7c]--
 +-[0000:4a]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[4b]--+-00.0  NVIDIA Corporation Device 27b2
 |                        \-00.1  NVIDIA Corporation Device 22bc
 +-[0000:30]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[31]--+-00.0  NVIDIA Corporation Device 27b2
 |                        \-00.1  NVIDIA Corporation Device 22bc
 +-[0000:16]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-04.0-[17-1c]--
 \-[0000:00]-+-00.0  Intel Corporation Device 09a2
             +-00.1  Intel Corporation Device 09a4
             +-00.2  Intel Corporation Device 09a3
             +-00.4  Intel Corporation Device 0998
             +-01.0  Intel Corporation Device 0b00
             +-01.1  Intel Corporation Device 0b00
             +-01.2  Intel Corporation Device 0b00
             +-01.3  Intel Corporation Device 0b00
             +-01.4  Intel Corporation Device 0b00
             +-01.5  Intel Corporation Device 0b00
             +-01.6  Intel Corporation Device 0b00
             +-01.7  Intel Corporation Device 0b00
             +-02.0  Intel Corporation Device 09a6
             +-02.1  Intel Corporation Device 09a7
             +-02.4  Intel Corporation Device 3456
             +-11.0  Intel Corporation C620 Series Chipset Family MROM 0
             +-11.1  Intel Corporation C620 Series Chipset Family MROM 1
             +-11.5  Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode]
             +-14.0  Intel Corporation C620 Series Chipset Family USB 3.0 xHCI Controller
             +-14.2  Intel Corporation C620 Series Chipset Family Thermal Subsystem
             +-16.0  Intel Corporation C620 Series Chipset Family MEI Controller #1
             +-16.1  Intel Corporation C620 Series Chipset Family MEI Controller #2
             +-16.4  Intel Corporation C620 Series Chipset Family MEI Controller #3
             +-17.0  Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode]
             +-1c.0-[01]--+-00.0  Intel Corporation I350 Gigabit Network Connection
             |            \-00.1  Intel Corporation I350 Gigabit Network Connection
             +-1c.4-[02-03]----00.0-[03]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
             +-1c.5-[04]--
             +-1f.0  Intel Corporation Device a1cb
             +-1f.2  Intel Corporation C620 Series Chipset Family Power Management Controller
             +-1f.4  Intel Corporation C620 Series Chipset Family SMBus
             \-1f.5  Intel Corporation C620 Series Chipset Family SPI Controller

NVIDIA DEVICES
31:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1)
31:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)
4b:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1)
4b:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)


==============================
PYTHON ENVIRONMENT
==============================
Python 3.10.12

Installed packages (torch, vllm, transformers)
torch                                    2.9.0
torch_c_dlpack_ext                       0.1.5
torchaudio                               2.9.0
torchvision                              0.24.0
transformers                             4.57.6
vllm                                     0.11.1
lmcache version 
0.3.10
Torch version
2.9.0+cu128

vLLM version
0.11.1

NCCL version
(2, 27, 5)

---

export LMCACHE_CONFIG_FILE=/path/to/config.yaml

python -m vllm.entrypoints.openai.api_server \
--model /usr/local/models/phi-4 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9 \
--max-model-len 16384 \
--enable-chunked-prefill \
--kv-cache-dtype auto \
--host 0.0.0.0 \
--port 8000 \
--trust-remote-code \
--enable-prefix-caching \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'
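For scripted repeated launches (the scenario in which the failure appears), the same command can be assembled as an argument list, which avoids shell-quoting the JSON passed to `--kv-transfer-config`. A minimal sketch: the flags are copied from the command above, the model path and `LMCACHE_CONFIG_FILE` value are the same placeholders, and `build_vllm_argv` / `build_env` are hypothetical helpers. Nothing is started here; the result could be passed to `subprocess.Popen(argv, env=env)`.

```python
import json
import os
import sys
from typing import Dict, List


def build_vllm_argv(model: str, tp_size: int = 2, port: int = 8000) -> List[str]:
    # Serialize the connector config with json.dumps instead of hand-quoting it.
    kv_transfer_config = json.dumps(
        {"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}
    )
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--tensor-parallel-size", str(tp_size),
        "--gpu-memory-utilization", "0.9",
        "--max-model-len", "16384",
        "--enable-chunked-prefill",
        "--kv-cache-dtype", "auto",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--trust-remote-code",
        "--enable-prefix-caching",
        "--kv-transfer-config", kv_transfer_config,
    ]


def build_env(config_path: str) -> Dict[str, str]:
    # Copy the current environment and point LMCache at its config file.
    env = dict(os.environ)
    env["LMCACHE_CONFIG_FILE"] = config_path
    return env


argv = build_vllm_argv("/usr/local/models/phi-4")
env = build_env("/path/to/config.yaml")
```

Using `sys.executable` instead of a bare `python` keeps the launch inside the same virtual environment as the calling script.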

Environment Information

<details> <summary>The output of <code>python collect_env.py</code></summary>
Virtualization:                          VT-x
L1d cache:                               1.1 MiB (24 instances)
L1i cache:                               768 KiB (24 instances)
L2 cache:                                30 MiB (24 instances)
L3 cache:                                36 MiB (2 instances)
NUMA node(s):                            2
NUMA node0 CPU(s):                       0-11,24-35
NUMA node1 CPU(s):                       12-23,36-47
Vulnerability Gather data sampling:      Mitigation; Microcode
Vulnerability Indirect target selection: Vulnerable
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
GPU INFORMATION
==============================

nvidia-smi
Thu Mar 12 10:23:11 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:31:00.0 Off |                  Off |
| 30%   48C    P8              4W /  130W |      15MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX 4000 Ada Gene...    Off |   00000000:4B:00.0 Off |                  Off |
| 30%   49C    P8              4W /  130W |      15MiB /  20475MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1819      G   /usr/lib/xorg/Xorg                        4MiB |
|    1   N/A  N/A            1819      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+

GPU LIST
GPU 0: NVIDIA RTX 4000 Ada Generation (UUID: GPU-edbe6c9c-a6f6-ccc9-e5ea-bb588bac21d7)
GPU 1: NVIDIA RTX 4000 Ada Generation (UUID: GPU-f9af3eac-ac48-a861-c5d5-9d3ed87954b6)

GPU TOPOLOGY
	GPU0	GPU1	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NODE	0-11,24-35	0		N/A
GPU1	NODE	 X 	0-11,24-35	0		N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

GPU PCI BUS INFO
name, driver_version, pci.bus_id
NVIDIA RTX 4000 Ada Generation, 580.126.09, 00000000:31:00.0
NVIDIA RTX 4000 Ada Generation, 580.126.09, 00000000:4B:00.0

==============================
PCIE TOPOLOGY
==============================
-+-[0000:ff]-+-00.0  Intel Corporation Device 344c
 |           +-00.1  Intel Corporation Device 344c
 |           +-00.2  Intel Corporation Device 344c
 |           +-00.3  Intel Corporation Device 344c
 |           +-00.4  Intel Corporation Device 344c
 |           +-00.5  Intel Corporation Device 344c
 |           +-00.6  Intel Corporation Device 344c
 |           +-00.7  Intel Corporation Device 344c
 |           +-01.0  Intel Corporation Device 344c
 |           +-01.1  Intel Corporation Device 344c
 |           +-01.2  Intel Corporation Device 344c
 |           +-01.3  Intel Corporation Device 344c
 |           +-01.4  Intel Corporation Device 344c
 |           +-01.5  Intel Corporation Device 344c
 |           +-01.6  Intel Corporation Device 344c
 |           +-01.7  Intel Corporation Device 344c
 |           +-02.0  Intel Corporation Device 344c
 |           +-02.1  Intel Corporation Device 344c
 |           +-02.2  Intel Corporation Device 344c
 |           +-02.3  Intel Corporation Device 344c
 |           +-02.4  Intel Corporation Device 344c
 |           +-02.5  Intel Corporation Device 344c
 |           +-02.6  Intel Corporation Device 344c
 |           +-02.7  Intel Corporation Device 344c
 |           +-03.0  Intel Corporation Device 344c
 |           +-03.1  Intel Corporation Device 344c
 |           +-03.2  Intel Corporation Device 344c
 |           +-03.3  Intel Corporation Device 344c
 |           +-0a.0  Intel Corporation Device 344d
 |           +-0a.1  Intel Corporation Device 344d
 |           +-0a.2  Intel Corporation Device 344d
 |           +-0a.3  Intel Corporation Device 344d
 |           +-0a.4  Intel Corporation Device 344d
 |           +-0a.5  Intel Corporation Device 344d
 |           +-0a.6  Intel Corporation Device 344d
 |           +-0a.7  Intel Corporation Device 344d
 |           +-0b.0  Intel Corporation Device 344d
 |           +-0b.1  Intel Corporation Device 344d
 |           +-0b.2  Intel Corporation Device 344d
 |           +-0b.3  Intel Corporation Device 344d
 |           +-0b.4  Intel Corporation Device 344d
 |           +-0b.5  Intel Corporation Device 344d
 |           +-0b.6  Intel Corporation Device 344d
 |           +-0b.7  Intel Corporation Device 344d
 |           +-0c.0  Intel Corporation Device 344d
 |           +-0c.1  Intel Corporation Device 344d
 |           +-0c.2  Intel Corporation Device 344d
 |           +-0c.3  Intel Corporation Device 344d
 |           +-0c.4  Intel Corporation Device 344d
 |           +-0c.5  Intel Corporation Device 344d
 |           +-0c.6  Intel Corporation Device 344d
 |           +-0c.7  Intel Corporation Device 344d
 |           +-0d.0  Intel Corporation Device 344d
 |           +-0d.1  Intel Corporation Device 344d
 |           +-0d.2  Intel Corporation Device 344d
 |           +-0d.3  Intel Corporation Device 344d
 |           +-1d.0  Intel Corporation Device 344f
 |           +-1d.1  Intel Corporation Device 3457
 |           +-1e.0  Intel Corporation Device 3458
 |           +-1e.1  Intel Corporation Device 3459
 |           +-1e.2  Intel Corporation Device 345a
 |           +-1e.3  Intel Corporation Device 345b
 |           +-1e.4  Intel Corporation Device 345c
 |           +-1e.5  Intel Corporation Device 345d
 |           +-1e.6  Intel Corporation Device 345e
 |           \-1e.7  Intel Corporation Device 345f
 +-[0000:fe]-+-00.0  Intel Corporation Device 3450
 |           +-00.1  Intel Corporation Device 3451
 |           +-00.2  Intel Corporation Device 3452
 |           +-00.3  Intel Corporation Device 0998
 |           +-00.5  Intel Corporation Device 3455
 |           +-02.0  Intel Corporation Device 3440
 |           +-02.1  Intel Corporation Device 3441
 |           +-02.2  Intel Corporation Device 3442
 |           +-04.0  Intel Corporation Device 3440
 |           +-04.1  Intel Corporation Device 3441
 |           +-04.2  Intel Corporation Device 3442
 |           +-04.3  Intel Corporation Device 3443
 |           +-05.0  Intel Corporation Device 3445
 |           +-05.1  Intel Corporation Device 3446
 |           +-05.2  Intel Corporation Device 3447
 |           +-06.0  Intel Corporation Device 3445
 |           +-06.1  Intel Corporation Device 3446
 |           +-06.2  Intel Corporation Device 3447
 |           +-07.0  Intel Corporation Device 3445
 |           +-07.1  Intel Corporation Device 3446
 |           +-07.2  Intel Corporation Device 3447
 |           +-0b.0  Intel Corporation Device 3448
 |           +-0b.1  Intel Corporation Device 3448
 |           +-0b.2  Intel Corporation Device 344b
 |           +-0c.0  Intel Corporation Device 344a
 |           +-0d.0  Intel Corporation Device 344a
 |           +-0e.0  Intel Corporation Device 344a
 |           +-0f.0  Intel Corporation Device 344a
 |           +-1a.0  Intel Corporation Device 2880
 |           +-1b.0  Intel Corporation Device 2880
 |           +-1c.0  Intel Corporation Device 2880
 |           \-1d.0  Intel Corporation Device 2880
 +-[0000:e2]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[e3]--
 |           +-03.0-[e4]--
 |           +-04.0-[e5]--
 |           \-05.0-[e6]--
 +-[0000:c9]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           \-00.4  Intel Corporation Device 0998
 +-[0000:b0]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           \-00.4  Intel Corporation Device 0998
 +-[0000:97]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[98]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           +-03.0-[99]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           +-04.0-[9a]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 |           \-05.0-[9b]----00.0  Phison Electronics Corporation E18 PCIe4 NVMe Controller
 +-[0000:80]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-01.0  Intel Corporation Device 0b00
 |           +-01.1  Intel Corporation Device 0b00
 |           +-01.2  Intel Corporation Device 0b00
 |           +-01.3  Intel Corporation Device 0b00
 |           +-01.4  Intel Corporation Device 0b00
 |           +-01.5  Intel Corporation Device 0b00
 |           +-01.6  Intel Corporation Device 0b00
 |           +-01.7  Intel Corporation Device 0b00
 |           +-02.0  Intel Corporation Device 09a6
 |           +-02.1  Intel Corporation Device 09a7
 |           \-02.4  Intel Corporation Device 3456
 +-[0000:7f]-+-00.0  Intel Corporation Device 344c
 |           +-00.1  Intel Corporation Device 344c
 |           +-00.2  Intel Corporation Device 344c
 |           +-00.3  Intel Corporation Device 344c
 |           +-00.4  Intel Corporation Device 344c
 |           +-00.5  Intel Corporation Device 344c
 |           +-00.6  Intel Corporation Device 344c
 |           +-00.7  Intel Corporation Device 344c
 |           +-01.0  Intel Corporation Device 344c
 |           +-01.1  Intel Corporation Device 344c
 |           +-01.2  Intel Corporation Device 344c
 |           +-01.3  Intel Corporation Device 344c
 |           +-01.4  Intel Corporation Device 344c
 |           +-01.5  Intel Corporation Device 344c
 |           +-01.6  Intel Corporation Device 344c
 |           +-01.7  Intel Corporation Device 344c
 |           +-02.0  Intel Corporation Device 344c
 |           +-02.1  Intel Corporation Device 344c
 |           +-02.2  Intel Corporation Device 344c
 |           +-02.3  Intel Corporation Device 344c
 |           +-02.4  Intel Corporation Device 344c
 |           +-02.5  Intel Corporation Device 344c
 |           +-02.6  Intel Corporation Device 344c
 |           +-02.7  Intel Corporation Device 344c
 |           +-03.0  Intel Corporation Device 344c
 |           +-03.1  Intel Corporation Device 344c
 |           +-03.2  Intel Corporation Device 344c
 |           +-03.3  Intel Corporation Device 344c
 |           +-0a.0  Intel Corporation Device 344d
 |           +-0a.1  Intel Corporation Device 344d
 |           +-0a.2  Intel Corporation Device 344d
 |           +-0a.3  Intel Corporation Device 344d
 |           +-0a.4  Intel Corporation Device 344d
 |           +-0a.5  Intel Corporation Device 344d
 |           +-0a.6  Intel Corporation Device 344d
 |           +-0a.7  Intel Corporation Device 344d
 |           +-0b.0  Intel Corporation Device 344d
 |           +-0b.1  Intel Corporation Device 344d
 |           +-0b.2  Intel Corporation Device 344d
 |           +-0b.3  Intel Corporation Device 344d
 |           +-0b.4  Intel Corporation Device 344d
 |           +-0b.5  Intel Corporation Device 344d
 |           +-0b.6  Intel Corporation Device 344d
 |           +-0b.7  Intel Corporation Device 344d
 |           +-0c.0  Intel Corporation Device 344d
 |           +-0c.1  Intel Corporation Device 344d
 |           +-0c.2  Intel Corporation Device 344d
 |           +-0c.3  Intel Corporation Device 344d
 |           +-0c.4  Intel Corporation Device 344d
 |           +-0c.5  Intel Corporation Device 344d
 |           +-0c.6  Intel Corporation Device 344d
 |           +-0c.7  Intel Corporation Device 344d
 |           +-0d.0  Intel Corporation Device 344d
 |           +-0d.1  Intel Corporation Device 344d
 |           +-0d.2  Intel Corporation Device 344d
 |           +-0d.3  Intel Corporation Device 344d
 |           +-1d.0  Intel Corporation Device 344f
 |           +-1d.1  Intel Corporation Device 3457
 |           +-1e.0  Intel Corporation Device 3458
 |           +-1e.1  Intel Corporation Device 3459
 |           +-1e.2  Intel Corporation Device 345a
 |           +-1e.3  Intel Corporation Device 345b
 |           +-1e.4  Intel Corporation Device 345c
 |           +-1e.5  Intel Corporation Device 345d
 |           +-1e.6  Intel Corporation Device 345e
 |           \-1e.7  Intel Corporation Device 345f
 +-[0000:7e]-+-00.0  Intel Corporation Device 3450
 |           +-00.1  Intel Corporation Device 3451
 |           +-00.2  Intel Corporation Device 3452
 |           +-00.3  Intel Corporation Device 0998
 |           +-00.5  Intel Corporation Device 3455
 |           +-02.0  Intel Corporation Device 3440
 |           +-02.1  Intel Corporation Device 3441
 |           +-02.2  Intel Corporation Device 3442
 |           +-04.0  Intel Corporation Device 3440
 |           +-04.1  Intel Corporation Device 3441
 |           +-04.2  Intel Corporation Device 3442
 |           +-04.3  Intel Corporation Device 3443
 |           +-05.0  Intel Corporation Device 3445
 |           +-05.1  Intel Corporation Device 3446
 |           +-05.2  Intel Corporation Device 3447
 |           +-06.0  Intel Corporation Device 3445
 |           +-06.1  Intel Corporation Device 3446
 |           +-06.2  Intel Corporation Device 3447
 |           +-07.0  Intel Corporation Device 3445
 |           +-07.1  Intel Corporation Device 3446
 |           +-07.2  Intel Corporation Device 3447
 |           +-0b.0  Intel Corporation Device 3448
 |           +-0b.1  Intel Corporation Device 3448
 |           +-0b.2  Intel Corporation Device 344b
 |           +-0c.0  Intel Corporation Device 344a
 |           +-0d.0  Intel Corporation Device 344a
 |           +-0e.0  Intel Corporation Device 344a
 |           +-0f.0  Intel Corporation Device 344a
 |           +-1a.0  Intel Corporation Device 2880
 |           +-1b.0  Intel Corporation Device 2880
 |           +-1c.0  Intel Corporation Device 2880
 |           \-1d.0  Intel Corporation Device 2880
 +-[0000:64]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           +-02.0-[65-6a]--
 |           +-03.0-[6b-70]--
 |           +-04.0-[71-76]--
 |           \-05.0-[77-7c]--
 +-[0000:4a]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[4b]--+-00.0  NVIDIA Corporation Device 27b2
 |                        \-00.1  NVIDIA Corporation Device 22bc
 +-[0000:30]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-02.0-[31]--+-00.0  NVIDIA Corporation Device 27b2
 |                        \-00.1  NVIDIA Corporation Device 22bc
 +-[0000:16]-+-00.0  Intel Corporation Device 09a2
 |           +-00.1  Intel Corporation Device 09a4
 |           +-00.2  Intel Corporation Device 09a3
 |           +-00.4  Intel Corporation Device 0998
 |           \-04.0-[17-1c]--
 \-[0000:00]-+-00.0  Intel Corporation Device 09a2
             +-00.1  Intel Corporation Device 09a4
             +-00.2  Intel Corporation Device 09a3
             +-00.4  Intel Corporation Device 0998
             +-01.0  Intel Corporation Device 0b00
             +-01.1  Intel Corporation Device 0b00
             +-01.2  Intel Corporation Device 0b00
             +-01.3  Intel Corporation Device 0b00
             +-01.4  Intel Corporation Device 0b00
             +-01.5  Intel Corporation Device 0b00
             +-01.6  Intel Corporation Device 0b00
             +-01.7  Intel Corporation Device 0b00
             +-02.0  Intel Corporation Device 09a6
             +-02.1  Intel Corporation Device 09a7
             +-02.4  Intel Corporation Device 3456
             +-11.0  Intel Corporation C620 Series Chipset Family MROM 0
             +-11.1  Intel Corporation C620 Series Chipset Family MROM 1
             +-11.5  Intel Corporation C620 Series Chipset Family SSATA Controller [AHCI mode]
             +-14.0  Intel Corporation C620 Series Chipset Family USB 3.0 xHCI Controller
             +-14.2  Intel Corporation C620 Series Chipset Family Thermal Subsystem
             +-16.0  Intel Corporation C620 Series Chipset Family MEI Controller #1
             +-16.1  Intel Corporation C620 Series Chipset Family MEI Controller #2
             +-16.4  Intel Corporation C620 Series Chipset Family MEI Controller #3
             +-17.0  Intel Corporation C620 Series Chipset Family SATA Controller [AHCI mode]
             +-1c.0-[01]--+-00.0  Intel Corporation I350 Gigabit Network Connection
             |            \-00.1  Intel Corporation I350 Gigabit Network Connection
             +-1c.4-[02-03]----00.0-[03]----00.0  ASPEED Technology, Inc. ASPEED Graphics Family
             +-1c.5-[04]--
             +-1f.0  Intel Corporation Device a1cb
             +-1f.2  Intel Corporation C620 Series Chipset Family Power Management Controller
             +-1f.4  Intel Corporation C620 Series Chipset Family SMBus
             \-1f.5  Intel Corporation C620 Series Chipset Family SPI Controller

NVIDIA DEVICES
31:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1)
31:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)
4b:00.0 VGA compatible controller: NVIDIA Corporation Device 27b2 (rev a1)
4b:00.1 Audio device: NVIDIA Corporation Device 22bc (rev a1)


==============================
PYTHON ENVIRONMENT
==============================
Python 3.10.12

Installed packages (torch, vllm, transformers)
torch                                    2.9.0
torch_c_dlpack_ext                       0.1.5
torchaudio                               2.9.0
torchvision                              0.24.0
transformers                             4.57.6
vllm                                     0.11.1
lmcache version 
0.3.10
Torch version
2.9.0+cu128

vLLM version
0.11.1

NCCL version
(2, 27, 5)
</details>

🐛 Describe the bug

The GPUs are entering an error state during the second model load when running vLLM with prefix caching and KV transfer enabled.

--enable-prefix-caching --kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'

After a system reboot, the model loads successfully and runs without any issues the first time. However, if I load it again, the GPUs go into an error state during the second initialization.

<img width="1138" height="756" alt="Image" src="https://github.com/user-attachments/assets/654ce25f-405c-4633-89a0-292f04216ac5" />

This behavior is consistently reproducible.

Steps to Reproduce

  1. Run the following command (the model loads successfully the first time).
  2. Unload the model.
  3. Run the same command again.
  4. During the second load, the GPUs enter an error state.

Command Used
export LMCACHE_CONFIG_FILE=/path/to/config.yaml

python -m vllm.entrypoints.openai.api_server \
--model /usr/local/models/phi-4 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9 \
--max-model-len 16384 \
--enable-chunked-prefill \
--kv-cache-dtype auto \
--host 0.0.0.0 \
--port 8000 \
--trust-remote-code \
--enable-prefix-caching \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'

Below is my config.yaml:

chunk_size: 256
local_cpu: false
local_disk: /mnt/nvme0/
max_local_disk_size: 20
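To rule out the disk backend as a factor, a quick pre-flight check of the `local_disk` settings can help. The sketch below is hypothetical (`preflight_disk` is not an LMCache API); it assumes `max_local_disk_size` is expressed in GB and strips an optional `file://` prefix from the path. It only checks that the directory exists, is writable and has enough free space; it does not diagnose the CUDA error itself.

```python
import os
import shutil

# The reporter's LMCache settings, transcribed from config.yaml above.
LMCACHE_CONFIG = {
    "chunk_size": 256,
    "local_cpu": False,
    "local_disk": "/mnt/nvme0/",
    "max_local_disk_size": 20,
}


def preflight_disk(config, bytes_per_unit=1024**3):
    """Return a list of problems with the local_disk settings (empty if none).

    Assumes max_local_disk_size is in GB; adjust bytes_per_unit if your
    LMCache version uses a different unit.
    """
    problems = []
    path = config.get("local_disk")
    if not path:
        return problems  # disk backend not configured, nothing to check
    if path.startswith("file://"):
        path = path[len("file://"):]  # accept URI-style paths as well
    if not os.path.isdir(path):
        problems.append(f"local_disk path does not exist: {path}")
        return problems
    if not os.access(path, os.W_OK):
        problems.append(f"local_disk path is not writable: {path}")
    wanted = config.get("max_local_disk_size", 0) * bytes_per_unit
    free = shutil.disk_usage(path).free
    if wanted > free:
        problems.append(f"max_local_disk_size needs {wanted} bytes, only {free} free")
    return problems
```

Running `preflight_disk(LMCACHE_CONFIG)` on the affected machine before each load should return an empty list if the disk settings are sane.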

If the same command is executed after a reboot, it works the first time. The issue only appears when loading the model a second time without rebooting.
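To make the load/unload cycle reproducible from a script, here is a minimal Python sketch. It assumes that "unload" means stopping the API server process with SIGTERM and that the server's `GET /health` endpoint answers 200 once the model is ready; `build_command`, `wait_healthy` and `load_unload_cycles` are hypothetical helper names, not part of vLLM. `LMCACHE_CONFIG_FILE` is expected to be exported already in the calling shell.

```python
import json
import subprocess
import sys
import time
import urllib.request

# Same connector settings as the --kv-transfer-config flag in the command above.
KV_TRANSFER_CONFIG = {"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}


def build_command(model="/usr/local/models/phi-4", port=8000):
    """Return the argv for one server launch, mirroring the reporter's flags."""
    return [
        sys.executable, "-m", "vllm.entrypoints.openai.api_server",
        "--model", model,
        "--tensor-parallel-size", "2",
        "--gpu-memory-utilization", "0.9",
        "--max-model-len", "16384",
        "--enable-chunked-prefill",
        "--kv-cache-dtype", "auto",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--trust-remote-code",
        "--enable-prefix-caching",
        # Compact separators reproduce the exact JSON string used on the command line.
        "--kv-transfer-config", json.dumps(KV_TRANSFER_CONFIG, separators=(",", ":")),
    ]


def wait_healthy(port, proc=None, timeout_s=600.0):
    """Poll GET /health until it returns 200, the process exits, or time runs out."""
    deadline = time.monotonic() + timeout_s
    url = f"http://127.0.0.1:{port}/health"
    while time.monotonic() < deadline:
        if proc is not None and proc.poll() is not None:
            return False  # server exited, e.g. crashed during initialization
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not listening yet; URLError is a subclass of OSError
        time.sleep(5)
    return False


def load_unload_cycles(cycles=2, port=8000):
    """Start and stop the server `cycles` times; return one health result per cycle."""
    results = []
    for _ in range(cycles):
        proc = subprocess.Popen(build_command(port=port))  # inherits LMCACHE_CONFIG_FILE
        healthy = wait_healthy(port, proc)
        results.append(healthy)
        proc.terminate()  # assumed "unload": SIGTERM to the API server
        try:
            proc.wait(timeout=120)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        if not healthy:
            break  # stop at the first failed load
    return results
```

On an affected machine, `load_unload_cycles(2)` would be expected to return `[True, False]`, matching the "works once, fails on reload" pattern described above.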

Below is the error I am getting:

2026-03-05T12:23:14.561Z - WARN: vLLM Server stderr (PID 84695): [rank1]:[E305 17:53:14.167809819 ProcessGroupNCCL.cpp:2057] [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: unspecified launch failure
Search for `cudaErrorLaunchFailure' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11fb7 (0x7697b8566fb7 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x76975a20ab60 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x76975a21a0e8 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x969 (0x76975a21e2e9 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0xdf (0x76975a22025f in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdc253 (0x7697b10b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x94ac3 (0x7697b9204ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x126850 (0x7697b9296850 in /lib/x86_64-linux-gnu/libc.so.6)

2026-03-05T12:23:14.562Z - WARN: vLLM Server stderr (PID 84695): terminate called after throwing an instance of 'c10::DistBackendError'
2026-03-05T12:23:14.564Z - WARN: vLLM Server stderr (PID 84695): what(): [PG ID 2 PG GUID 3 Rank 1] Process group watchdog thread terminated with exception: CUDA error: unspecified launch failure
Search for `cudaErrorLaunchFailure' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Exception raised from c10_cuda_check_implementation at /pytorch/c10/cuda/CUDAException.cpp:44 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x11fb7 (0x7697b8566fb7 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x50 (0x76975a20ab60 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0x68 (0x76975a21a0e8 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::Watchdog::runLoop() + 0x969 (0x76975a21e2e9 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::Watchdog::run() + 0xdf (0x76975a22025f in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0xdc253 (0x7697b10b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #7: <unknown function> + 0x94ac3 (0x7697b9204ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: <unknown function> + 0x126850 (0x7697b9296850 in /lib/x86_64-linux-gnu/libc.so.6)

Exception raised from run at /pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:2063 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x80 (0x76975933fb80 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xe336d1 (0x76975a1f66d1 in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x95044f (0x769759d1344f in /opt/vllm-venv/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xdc253 (0x7697b10b0253 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #4: <unknown function> + 0x94ac3 (0x7697b9204ac3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: <unknown function> + 0x126850 (0x7697b9296850 in /lib/x86_64-linux-gnu/libc.so.6)
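When reproducing the failure repeatedly, it helps to pull the key facts out of stderr automatically. A minimal sketch using regular expressions written against the log above; `summarize_nccl_failure` is a hypothetical helper, and other PyTorch versions may word the watchdog message differently.

```python
import re

# Log-prefix timestamp such as "2026-03-05T12:23:14.561Z".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z")
# Watchdog line emitted by ProcessGroupNCCL, as seen in the log above.
WATCHDOG_RE = re.compile(
    r"\[rank(?P<rank>\d+)\]:.*?Process group watchdog thread terminated with exception: "
    r"CUDA error: (?P<error>[a-z ]+)"
)


def summarize_nccl_failure(log_text):
    """Return a dict describing the first NCCL watchdog failure, or None."""
    match = WATCHDOG_RE.search(log_text)
    if match is None:
        return None
    # Use the last log-prefix timestamp that precedes the watchdog message.
    stamps = TIMESTAMP_RE.findall(log_text[: match.start()])
    return {
        "rank": int(match.group("rank")),
        "cuda_error": match.group("error").strip(),
        "timestamp": stamps[-1] if stamps else None,
        "process_aborted": "c10::DistBackendError" in log_text,
    }
```

Applied to the first two lines of the log above, it reports rank 1, `unspecified launch failure`, timestamp `2026-03-05T12:23:14.561Z`, and an aborted process.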

Can anyone help me with this issue? I have also tried export NCCL_P2P_DISABLE=1, but it didn't help. What is the exact cause? Is this issue related to the hardware topology (GPU communication over PCIe root ports), or could it be related to KV transfer initialization / NCCL communication inside vLLM?


extent analysis

Fix Plan

The issue may be related to GPU-to-GPU communication across PCIe root ports or to NCCL communication inside vLLM. To narrow it down, you can try the following steps:

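  • Restrict P2P communication: you have already tried export NCCL_P2P_DISABLE=1. As an alternative (not in addition, since NCCL_P2P_LEVEL has no effect once P2P is disabled), try NCCL_P2P_LEVEL with a path type such as PIX, PXB, PHB or SYS to control which PCIe paths may use P2P.
  • Set the NCCL socket interface: the variable is NCCL_SOCKET_IFNAME (not NCCL_SOCKET_IF). It selects the network interface NCCL uses for its socket traffic, such as bootstrap, for example export NCCL_SOCKET_IFNAME=eth0. On a single node with two GPUs it does not carry the GPU-to-GPU data.
  • Force the tree algorithm: NCCL_TREE_THRESHOLD belongs to older NCCL releases; with current releases such as the NCCL 2.27.5 reported above, use NCCL_ALGO=Tree to force the tree communication pattern.
  • Check the NCCL version: there is no nccl --version command. The NCCL version PyTorch uses can be printed with python -c "import torch; print(torch.cuda.nccl.version())"; the environment above reports (2, 27, 5).
  • Check GPU topology: run nvidia-smi topo -m. In the environment above the two GPUs report NODE, meaning they sit behind different PCIe host bridges within the same NUMA node, not behind the same PCIe root port.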
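The topology check in the last step can be scripted. The sketch below parses the GPU matrix printed by `nvidia-smi topo -m` (format assumed from the GPU TOPOLOGY section of the environment output above) and reports the link type for each GPU pair; `parse_topo_matrix` and `crosses_host_bridge` are hypothetical helpers. For this machine it reports NODE between GPU0 and GPU1.

```python
# Link types from the nvidia-smi legend that route through a PCIe host bridge
# (PHB) or the interconnect between host bridges / sockets (NODE, SYS).
HOST_BRIDGE_LINKS = {"PHB", "NODE", "SYS"}


def parse_topo_matrix(topo_text):
    """Parse the GPU rows of `nvidia-smi topo -m` into {(row_gpu, col_gpu): link}."""
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    header = lines[0].split()
    # Header cells look like GPU0, GPU1, ..., followed by affinity columns.
    gpus = [h for h in header if h.startswith("GPU") and h[3:].isdigit()]
    links = {}
    for line in lines[1:]:
        cells = line.split()
        if cells[0] not in gpus:
            continue  # skip legend lines and NIC rows
        row = cells[0]
        # Assumes the GPU columns come first, in header order.
        for col, link in zip(gpus, cells[1:1 + len(gpus)]):
            if col != row:
                links[(row, col)] = link
    return links


def crosses_host_bridge(link):
    """True if traffic between the two GPUs has to leave the PCIe switch fabric."""
    return link in HOST_BRIDGE_LINKS
```

The input can come from `subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True).stdout`.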

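Here's an example of how you can set these environment variables before launching the server:

export NCCL_P2P_DISABLE=1        # alternative: NCCL_P2P_LEVEL (do not set both)
export NCCL_SOCKET_IFNAME=eth0   # replace eth0 with your interface name
export NCCL_ALGO=Tree
export LMCACHE_CONFIG_FILE=/path/to/config.yaml

python -m vllm.entrypoints.openai.api_server \
--model /usr/local/models/phi-4 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9 \
--max-model-len 16384 \
--enable-prefix-caching \
--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'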
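If the server is launched from Python (for example by a test harness), the same NCCL overrides can be passed through the child process environment. A minimal sketch with a hypothetical `build_nccl_env` helper: it rejects combining `NCCL_P2P_DISABLE=1` with `NCCL_P2P_LEVEL`, since the level has no effect once P2P is disabled, and defaults `NCCL_DEBUG=INFO` so NCCL logs which transports and topology it detects.

```python
import os


def build_nccl_env(base=None, p2p_disable=False, p2p_level=None,
                   socket_ifname=None, algo=None):
    """Return a copy of `base` (default: os.environ) with NCCL overrides applied."""
    if p2p_disable and p2p_level is not None:
        # NCCL_P2P_LEVEL only tunes P2P; with P2P disabled it is meaningless.
        raise ValueError("NCCL_P2P_LEVEL has no effect when NCCL_P2P_DISABLE=1")
    env = dict(os.environ if base is None else base)
    if p2p_disable:
        env["NCCL_P2P_DISABLE"] = "1"
    if p2p_level is not None:
        env["NCCL_P2P_LEVEL"] = str(p2p_level)
    if socket_ifname:
        env["NCCL_SOCKET_IFNAME"] = socket_ifname
    if algo:
        env["NCCL_ALGO"] = algo
    env.setdefault("NCCL_DEBUG", "INFO")  # keep an explicit user setting if present
    return env
```

Usage would look like `subprocess.Popen(cmd, env=build_nccl_env(p2p_disable=True, socket_ifname="eth0"))`, where `cmd` is the server argv.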
