vllm - ✅(Solved) Fix [Bug]: CUDA 13 LMCache KV connector install path still resolves CUDA 12 artifacts [2 pull requests, 1 comment, 1 participant]

malaiwah · 2026-03-22T12:16:29Z




GitHub stats

vllm-project/vllm#37801 · Fetched 2026-04-08 01:12:53

Author: malaiwah
Participants: malaiwah
Timeline (top): cross-referenced ×3, commented ×1

The current LMCache KV-connector installation path in vllm/vllm-openai:latest-cu130 is not compatible with the latest LMCache integration stack.

In a CUDA 13 container derived from vllm/vllm-openai:latest-cu130, enabling the LMCache connector can fail before serving starts because the install path still pulls CUDA 12-oriented artifacts.

Error Message

ImportError: libcudart.so.12: cannot open shared object file

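Root Cause

The CUDA 13 install path still resolves the generic, CUDA 12-oriented KV-connector packages (the generic nixl package and the generic lmcache wheel). At least one of these artifacts links against the CUDA 12 runtime (libcudart.so.12), which is not available in a container derived from vllm/vllm-openai:latest-cu130. As a result, enabling the LMCache connector fails before serving starts.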

Fix Action

Fix proposed (both PRs below were open and unmerged when this page was fetched)

Proposed fix: [CI/Build] Fix LMCache KV connector install path for CUDA 13 images (https://github.com/vllm-project/vllm/pull/37802)
Proposed fix: [Build] Support CUDA 13 Docker builds by rebuilding LMCache from source (https://github.com/LMCache/LMCache/pull/2844)

PR fix notes

PR #37802: [CI/Build] Fix LMCache KV connector install path for CUDA 13 images

Repository: vllm-project/vllm
Author: malaiwah
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/37802

Description (problem / solution / changelog)

Purpose

Fixes #37801.

This updates the CUDA 13 KV-connector install path so vllm/vllm-openai:latest-cu130 can be layered with LMCache without pulling CUDA 12-oriented artifacts.

Related cross-project issue: LMCache/LMCache#2843.

This is not duplicating an existing PR. Before opening this PR I checked for open vLLM PRs/issues covering this CUDA 13 LMCache packaging fix and did not find an open PR addressing the same path.

Concretely, the CUDA 13 branch now:

resolves nixl to nixl-cu13
skips the generic lmcache wheel
installs cupy-cuda13x
rebuilds LMCache from source with --no-build-isolation --no-deps

The CUDA 12 path remains unchanged.

Test Plan

pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
build/runtime validation of the CUDA 12 KV-connector path
build/runtime validation of a CUDA 13 image layered on top of vllm/vllm-openai:latest-cu130

Test Result

Targeted tests:

$ pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
8 passed

CUDA 12 validation:

verified the default CUDA 12 branch still resolves and installs the existing CUDA 12 package set
runtime import of lmcache.integration.vllm.vllm_v1_adapter succeeded
end-to-end LMCache validation succeeded with CPU+disk backends, disk offload, and replay hits

CUDA 13 validation:

built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
served Qwen/Qwen3-0.6B
enabled LMCacheConnectorV1
confirmed LocalCPUBackend and LocalDiskBackend creation
confirmed disk offload writes
confirmed replay hits
confirmed OpenAI-compatible responses expose usage.prompt_tokens_details.cached_tokens on replay when both:
- --enable-prompt-tokens-details
- kv_connector_extra_config.enable_cache_usage_details_in_response=true

Representative runtime result:

first replay request: prompt_tokens_details = null
second replay request: prompt_tokens_details.cached_tokens = 6144
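A small helper turns the replay check above into a mechanical test. This sketch assumes the usage block has the OpenAI shape quoted in the PR: usage.prompt_tokens_details.cached_tokens, where prompt_tokens_details may be null. The function names are hypothetical.

```python
from typing import Any, Dict, Optional


def cached_tokens(response: Dict[str, Any]) -> Optional[int]:
    """Return usage.prompt_tokens_details.cached_tokens, or None if the details are null/absent."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details")
    if details is None:
        return None
    return details.get("cached_tokens")


def is_cache_hit(response: Dict[str, Any]) -> bool:
    """A replay counts as a hit when the server reports a positive cached-token count."""
    n = cached_tokens(response)
    return n is not None and n > 0
```

Applied to the representative results, {"usage": {"prompt_tokens_details": None}} gives None, which is not a hit. {"usage": {"prompt_tokens_details": {"cached_tokens": 6144}}} gives 6144, which is a hit.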

AI Assistance

This PR was prepared with AI assistance. I reviewed every changed line, reproduced the problem, ran the listed tests, and validated the runtime behavior end-to-end myself.

Changed files

docker/Dockerfile (modified, +42/-9)
tests/tools/test_resolve_kv_connector_requirements.py (added, +167/-0)
tools/resolve_kv_connector_requirements.py (added, +88/-0)

PR #2844: [Build] Support CUDA 13 Docker builds by rebuilding LMCache from source

Repository: LMCache/LMCache
Author: malaiwah
State: open | merged: False
Link: https://github.com/LMCache/LMCache/pull/2844

Description (problem / solution / changelog)

What this PR does / why we need it:

Fixes #2843.

This makes LMCache's Docker/runtime path explicitly support CUDA 13 for the latest vLLM OpenAI image path.

Related cross-project issue: vllm-project/vllm#37801.

In the CUDA 13 branch, this PR:

switches runtime dependencies to cupy-cuda13x and nixl-cu13
avoids the generic lmcache wheel for CUDA 13
rebuilds/installs LMCache from source with --no-build-isolation --no-deps
keeps the existing CUDA 12 behavior intact
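Both PRs branch on the image's CUDA major version. One way to derive that version inside the image is to read torch.version.cuda. PyTorch sets it to a string such as "13.0" for CUDA builds and to None for CPU-only builds. This is an illustrative sketch only: neither PR is confirmed to detect the version this way, and the function names and the "source-build"/"wheel" labels are hypothetical.

```python
from typing import Optional


def cuda_major(version: Optional[str]) -> Optional[int]:
    """Parse a CUDA version string such as '13.0' into its major number."""
    if not version:
        return None
    head = version.strip().split(".", 1)[0]
    return int(head) if head.isdigit() else None


def detect_image_cuda_major() -> Optional[int]:
    """Return the CUDA major version PyTorch was built against, if torch is importable."""
    try:
        import torch  # the vllm/vllm-openai images ship PyTorch
    except ImportError:
        return None
    return cuda_major(torch.version.cuda)  # None for CPU-only builds


def lmcache_install_mode(major: Optional[int]) -> str:
    """Hypothetical policy mirroring the PRs: source build on CUDA 13, wheel otherwise."""
    return "source-build" if major == 13 else "wheel"
```

Keeping the version parsing separate from the torch import lets the policy be unit-tested without a GPU or a PyTorch install.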

Special notes for your reviewers:

Base branch is dev, per the contribution guide.
This change is intentionally scoped to Docker/runtime compatibility and tests.
The paired vLLM-side fix is tracked in vllm-project/vllm#37801.
CUDA 12 was revalidated after this change so the default build path remains covered.
I used AI assistance while preparing this change, and I manually reviewed the patch and validated the listed tests/results.

If applicable:

[x] this PR contains user-facing changes (docs added)
[x] this PR contains unit tests

Test evidence

Targeted tests:

$ pytest -q --noconftest tests/test_dockerfile_cuda13_contract.py
3 passed
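The contents of tests/test_dockerfile_cuda13_contract.py are not shown here. A contract test of this kind can be as simple as asserting that the Dockerfile's CUDA 13 branch mentions the expected packages and pip flags. The sketch below assumes that shape; the helper name, the token list and the sample Dockerfile text are all hypothetical.

```python
from typing import List

REQUIRED_CUDA13_TOKENS = (
    "cupy-cuda13x",
    "nixl-cu13",
    "--no-build-isolation",
    "--no-deps",
)


def missing_cuda13_tokens(dockerfile_text: str) -> List[str]:
    """Return the required CUDA 13 tokens that do not appear in the Dockerfile text."""
    return [tok for tok in REQUIRED_CUDA13_TOKENS if tok not in dockerfile_text]


def test_cuda13_contract() -> None:
    # Hypothetical CUDA 13 snippet; a real test would read docker/Dockerfile.
    sample = (
        "RUN pip install nixl-cu13 cupy-cuda13x\n"
        "RUN pip install --no-build-isolation --no-deps /opt/LMCache\n"
    )
    assert missing_cuda13_tokens(sample) == []
    # A generic install line satisfies none of the CUDA 13 requirements.
    assert missing_cuda13_tokens("RUN pip install lmcache nixl") == list(REQUIRED_CUDA13_TOKENS)
```

A plain substring check is brittle, but it is cheap to run in CI and catches a CUDA 13 branch that silently falls back to the generic packages.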

Docs sanity:

$ sphinx -b dummy docs/source .sphinx-dummy
succeeded

CUDA 12 validation:

built docker/Dockerfile image-release
runtime import of lmcache.integration.vllm.vllm_v1_adapter succeeded
LMCache CPU+disk offload and replay hits succeeded

CUDA 13 validation:

built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
served Qwen/Qwen3-0.6B
confirmed LocalCPUBackend and LocalDiskBackend
confirmed disk offload writes
confirmed replay cache hits
confirmed OpenAI-compatible replay responses exposed usage.prompt_tokens_details.cached_tokens = 6144

Changed files

.github/workflows/test.yml (modified, +2/-1)
README.md (modified, +5/-2)
docker/Dockerfile (modified, +66/-7)
docker/README.md (modified, +5/-2)
docs/source/developer_guide/docker_file.rst (modified, +9/-0)
docs/source/getting_started/installation.rst (modified, +21/-1)
docs/source/production/docker_deployment.rst (modified, +7/-1)
tests/test_dockerfile_cuda13_contract.py (added, +73/-0)


Reproduction

Environment used for validation:

Podman
NVIDIA RTX 4080 SUPER
driver 595.45.04
host CUDA 13.2
base image vllm/vllm-openai:latest-cu130

Minimal reproduction shape:

Start from vllm/vllm-openai:latest-cu130.
Install LMCache through the current KV-connector path.
Import lmcache.integration.vllm.vllm_v1_adapter or start vllm serve with LMCacheConnectorV1.

Observed failure:

ImportError: libcudart.so.12: cannot open shared object file
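The third reproduction step can be scripted so the failure is classified rather than just printed. This sketch imports the adapter module named in the issue (lmcache.integration.vllm.vllm_v1_adapter) and flags the specific libcudart.so.12 error. The helper names are hypothetical, and only ImportError is caught, which is how the loader failure above surfaces.

```python
import importlib
from typing import Tuple

ADAPTER_MODULE = "lmcache.integration.vllm.vllm_v1_adapter"


def try_import(module: str = ADAPTER_MODULE) -> Tuple[bool, str]:
    """Import a module and return (ok, error_message)."""
    try:
        importlib.import_module(module)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)
    return True, ""


def is_cuda12_runtime_mismatch(message: str) -> bool:
    """True for the loader error seen when CUDA 12 artifacts land in a CUDA 13 image."""
    return "libcudart.so.12" in message


if __name__ == "__main__":
    ok, err = try_import()
    if ok:
        print("adapter import OK")
    elif is_cuda12_runtime_mismatch(err):
        print("CUDA 12 artifact in a CUDA 13 image:", err)
    else:
        print("import failed for another reason:", err)
```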

Expected behavior

The CUDA 13 image path should install CUDA 13-compatible KV-connector dependencies so that LMCache can be imported and used without falling back to CUDA 12 artifacts.

Root cause hypothesis

The CUDA 13 image path still assumes the generic / CUDA 12-oriented LMCache packaging path. In practice, the working CUDA 13 combination required:

nixl-cu13 instead of the generic nixl package
cupy-cuda13x
skipping the generic lmcache wheel in the CUDA 13 branch
rebuilding LMCache from source against the image-local CUDA/Torch stack
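One quick check on this hypothesis is to see which CUDA runtime sonames the dynamic loader can resolve inside the container. The sketch below uses ctypes to try libcudart.so.12 and libcudart.so.13 and assumes a Linux loader. It only reports what the default search path can open. Libraries that a wheel loads through its own RPATH may not be visible this way, so treat the result as a hint, not proof of which package is at fault.

```python
import ctypes
from typing import Dict, Iterable

CUDART_SONAMES = ("libcudart.so.12", "libcudart.so.13")


def loadable_cudart(sonames: Iterable[str] = CUDART_SONAMES) -> Dict[str, bool]:
    """Map each CUDA runtime soname to whether the dynamic loader can open it."""
    result: Dict[str, bool] = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    # libcudart.so.12 failing to load is consistent with the ImportError above.
    print(loadable_cudart())
```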

Validation already performed

Targeted tests:

pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
Result: 8 passed

CUDA 12 guard validation:

verified that the default CUDA 12 branch still resolves and installs the existing CUDA 12-oriented package set
validated runtime LMCache import and disk offload/hit behavior on a CUDA 12 image

CUDA 13 end-to-end validation:

built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
served Qwen/Qwen3-0.6B
enabled LMCacheConnectorV1
confirmed LocalCPUBackend and LocalDiskBackend creation
confirmed disk offload writes
confirmed replay cache hits
confirmed OpenAI-compatible responses expose usage.prompt_tokens_details.cached_tokens when both:
- --enable-prompt-tokens-details
- kv_connector_extra_config.enable_cache_usage_details_in_response=true

Representative runtime result:

first replayed request: prompt_tokens_details = null
second replayed request: prompt_tokens_details.cached_tokens = 6144

Cross-project context

This needs a matching LMCache-side fix as well because LMCache's Docker/runtime path also needs an explicit CUDA 13 package/source-build branch.

Related LMCache issue: LMCache/LMCache#2843

Extended analysis

Fix Plan

To resolve the compatibility issue with the LMCache KV-connector in the vllm/vllm-openai:latest-cu130 image, follow these steps:

Update the Dockerfile:
- Use nixl-cu13 instead of the generic nixl package.
- Install cupy-cuda13x.
- Skip the generic lmcache wheel in the CUDA 13 branch.
- Rebuild LMCache from source against the image-local CUDA/Torch stack.
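Modify the KV-connector installation path:
- Have the requirements resolver (tools/resolve_kv_connector_requirements.py in PR #37802) select the CUDA 13 package set when building on top of the cu130 image, and leave the CUDA 12 branch unchanged.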

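Example Dockerfile modifications (a sketch, not the exact change from either PR; it assumes pip, git, and the CUDA 13 toolchain needed to compile LMCache are available in the image):

FROM vllm/vllm-openai:latest-cu130

# Use CUDA 13-compatible packages instead of the generic nixl / CUDA 12 cupy builds
RUN pip install nixl-cu13 cupy-cuda13x

# Rebuild LMCache from source against the image's CUDA 13 / Torch stack.
# --no-build-isolation builds against the already-installed torch; --no-deps keeps pip
# from pulling CUDA 12 wheels back in (other LMCache runtime deps must already be present).
RUN git clone https://github.com/LMCache/LMCache.git /opt/LMCache && \
    cd /opt/LMCache && \
    pip install --no-build-isolation --no-deps .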

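Configure the LMCache connector:
- Enable LMCacheConnectorV1 through vLLM's KV-transfer config. This setting only selects the connector. CUDA 13 compatibility comes from the packages installed in the image.

Example configuration (a sketch; kv_role "kv_both" follows vLLM's LMCache examples, and the printed JSON is passed to vllm serve --kv-transfer-config):

import json

# KV-transfer config for: vllm serve <model> --kv-transfer-config '<json>'
kv_transfer_config = {
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_both",
    "kv_connector_extra_config": {
        # Cached-token details also require --enable-prompt-tokens-details on the server
        "enable_cache_usage_details_in_response": True,
    },
}

print(json.dumps(kv_transfer_config))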

Verification

To verify that the fix worked:

Build a new Docker image with the updated Dockerfile.
Run the image and enable the LMCacheConnectorV1.
Test the LMCache functionality, including disk offload and replay cache hits.
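Verify that usage.prompt_tokens_details.cached_tokens is exposed in the response to a replayed request when the server runs with both --enable-prompt-tokens-details and kv_connector_extra_config.enable_cache_usage_details_in_response=true.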
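The last verification step can be scripted against a running server using only the standard library. This sketch assumes the server listens on vllm serve's default port (http://localhost:8000) and exposes the OpenAI-compatible /v1/completions endpoint. The function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def completion_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": 1}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reported_cached_tokens(base_url: str, model: str, prompt: str) -> Optional[int]:
    """Send one completion and return usage.prompt_tokens_details.cached_tokens, or None."""
    req = completion_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=300) as resp:
        payload: Dict[str, Any] = json.load(resp)
    details = (payload.get("usage") or {}).get("prompt_tokens_details")
    return None if details is None else details.get("cached_tokens")
```

Call reported_cached_tokens("http://localhost:8000", "Qwen/Qwen3-0.6B", prompt) twice with the same long prompt. The calls should mirror the PR's representative result: null details on the first replayed request, then a positive count (6144 in the author's run) on the second.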

Extra Tips

Ensure that the LMCache-side fix is also applied, as mentioned in the related LMCache issue (LMCache/LMCache#2843).
Rebuild and rerun the CUDA 12 path as well. Both PRs are meant to leave it unchanged, and each was revalidated on CUDA 12.


