vllm - ✅(Solved) Fix [Bug]: CUDA 13 LMCache KV connector install path still resolves CUDA 12 artifacts [2 pull requests, 1 comment, 1 participant]

GitHub stats
vllm-project/vllm#37801, fetched 2026-04-08 01:12:53

The current LMCache KV-connector installation path in vllm/vllm-openai:latest-cu130 is not compatible with the latest LMCache integration stack.

In a CUDA 13 container derived from vllm/vllm-openai:latest-cu130, enabling the LMCache connector can fail before serving starts because the install path still pulls CUDA 12-oriented artifacts.

Error Message

ImportError: libcudart.so.12: cannot open shared object file

Root Cause

In a CUDA 13 container derived from vllm/vllm-openai:latest-cu130, enabling the LMCache connector can fail before serving starts because the install path still pulls CUDA 12-oriented artifacts.
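
One way to see which CUDA runtime a container actually provides is to probe for the versioned library names directly. The sketch below is illustrative only: `probe_cudart` is a hypothetical helper, and it assumes the runtime libraries follow the `libcudart.so.<major>` naming seen in the error message.

```python
import ctypes


def probe_cudart(majors=(12, 13)):
    """Return {soname: bool} showing which CUDA runtime libraries can be loaded."""
    found = {}
    for major in majors:
        soname = f"libcudart.so.{major}"  # naming assumed from the error message
        try:
            ctypes.CDLL(soname)
            found[soname] = True
        except OSError:
            # The dynamic loader could not locate or open the library.
            found[soname] = False
    return found


if __name__ == "__main__":
    for soname, ok in probe_cudart().items():
        print(f"{soname}: {'loadable' if ok else 'missing'}")
```

If only the CUDA 13 entry reports as loadable, any wheel linked against `libcudart.so.12` would be expected to fail at import time in that container.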

Fix Action

Fixed

PR fix notes

PR #37802: [CI/Build] Fix LMCache KV connector install path for CUDA 13 images

Description (problem / solution / changelog)

Purpose

Fixes #37801.

This updates the CUDA 13 KV-connector install path so vllm/vllm-openai:latest-cu130 can be layered with LMCache without pulling CUDA 12-oriented artifacts.

Related cross-project issue: LMCache/LMCache#2843.

This is not duplicating an existing PR. Before opening this PR I checked for open vLLM PRs/issues covering this CUDA 13 LMCache packaging fix and did not find an open PR addressing the same path.

Concretely, the CUDA 13 branch now:

  • resolves nixl to nixl-cu13
  • skips the generic lmcache wheel
  • installs cupy-cuda13x
  • rebuilds LMCache from source with --no-build-isolation --no-deps

The CUDA 12 path remains unchanged.
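
The resolution rules listed above can be sketched as a small function. This is not the code in `tools/resolve_kv_connector_requirements.py`; `resolve_requirements` and its signature are hypothetical, and the sketch only mirrors the behavior described in the PR text.

```python
import re


def resolve_requirements(lines, cuda_major):
    """Rewrite KV-connector requirement lines for the given CUDA major version."""
    if cuda_major < 13:
        # The CUDA 12 path is left unchanged.
        return list(lines)

    resolved = []
    for line in lines:
        req = line.strip()
        if not req or req.startswith("#"):
            continue
        # Package name is everything before the first specifier character.
        base = re.split(r"[<>=!~\[; ]", req, maxsplit=1)[0].lower()
        if base == "lmcache":
            continue  # generic wheel skipped; LMCache is rebuilt from source
        if base == "nixl":
            req = "nixl-cu13" + req[len(base):]  # keep any version specifier
        resolved.append(req)

    if not any(r.lower().startswith("cupy-cuda13x") for r in resolved):
        resolved.append("cupy-cuda13x")
    return resolved
```

The source rebuild with `--no-build-isolation --no-deps` would then happen as a separate build step, outside this function.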

Test Plan

  • pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
  • build/runtime validation of the CUDA 12 KV-connector path
  • build/runtime validation of a CUDA 13 image layered on top of vllm/vllm-openai:latest-cu130

Test Result

Targeted tests:

$ pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
8 passed

CUDA 12 validation:

  • verified the default CUDA 12 branch still resolves and installs the existing CUDA 12 package set
  • runtime import of lmcache.integration.vllm.vllm_v1_adapter succeeded
  • end-to-end LMCache validation succeeded with CPU+disk backends, disk offload, and replay hits

CUDA 13 validation:

  • built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
  • served Qwen/Qwen3-0.6B
  • enabled LMCacheConnectorV1
  • confirmed LocalCPUBackend and LocalDiskBackend creation
  • confirmed disk offload writes
  • confirmed replay hits
  • confirmed OpenAI-compatible responses expose usage.prompt_tokens_details.cached_tokens on replay when both:
    • --enable-prompt-tokens-details
    • kv_connector_extra_config.enable_cache_usage_details_in_response=true

Representative runtime result:

  • first replay request: prompt_tokens_details = null
  • second replay request: prompt_tokens_details.cached_tokens = 6144
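
Because `prompt_tokens_details` can be `null` on one request and populated on the next, client code reading this field needs to tolerate both shapes. A minimal sketch, assuming the response has already been parsed into a Python dict (the helper name `cached_tokens` is hypothetical):

```python
def cached_tokens(response):
    """Return the cached-token count from a parsed response, or 0 if absent."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}  # may be None (JSON null)
    return details.get("cached_tokens") or 0


# Sample dicts mirroring the two shapes reported above.
first = {"usage": {"prompt_tokens_details": None}}
second = {"usage": {"prompt_tokens_details": {"cached_tokens": 6144}}}

print(cached_tokens(first), cached_tokens(second))
```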

AI Assistance

This PR was prepared with AI assistance. I reviewed every changed line, reproduced the problem, ran the listed tests, and validated the runtime behavior end-to-end myself.

Changed files

  • docker/Dockerfile (modified, +42/-9)
  • tests/tools/test_resolve_kv_connector_requirements.py (added, +167/-0)
  • tools/resolve_kv_connector_requirements.py (added, +88/-0)

PR #2844: [Build] Support CUDA 13 Docker builds by rebuilding LMCache from source

Description (problem / solution / changelog)

What this PR does / why we need it:

Fixes #2843.

This makes LMCache's Docker/runtime path explicitly support CUDA 13 for the latest vLLM OpenAI image path.

Related cross-project issue: vllm-project/vllm#37801.

In the CUDA 13 branch, this PR:

  • switches runtime dependencies to cupy-cuda13x and nixl-cu13
  • avoids the generic lmcache wheel for CUDA 13
  • rebuilds/installs LMCache from source with --no-build-isolation --no-deps
  • keeps the existing CUDA 12 behavior intact
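
A contract-style check on the Dockerfile text is one way to guard these properties. The sketch below is not the contents of `tests/test_dockerfile_cuda13_contract.py`; the helper name and the marker list are assumptions drawn from the bullets above.

```python
# Strings the CUDA 13 branch is expected to mention, per the PR description.
REQUIRED_MARKERS = ("cupy-cuda13x", "nixl-cu13", "--no-build-isolation", "--no-deps")


def missing_cuda13_markers(dockerfile_text, required=REQUIRED_MARKERS):
    """Return the expected markers that do not appear in the Dockerfile text."""
    return [marker for marker in required if marker not in dockerfile_text]


if __name__ == "__main__":
    # Path is an assumption; adjust to the repository layout.
    with open("docker/Dockerfile", encoding="utf-8") as fh:
        print(missing_cuda13_markers(fh.read()) or "all markers present")
```

A substring check like this only confirms the markers exist somewhere in the file; it does not prove they sit inside the CUDA 13 branch.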

Special notes for your reviewers:

  • Base branch is dev, per the contribution guide.
  • This change is intentionally scoped to Docker/runtime compatibility and tests.
  • The paired vLLM-side fix is tracked in vllm-project/vllm#37801.
  • CUDA 12 was revalidated after this change so the default build path remains covered.
  • I used AI assistance while preparing this change, and I manually reviewed the patch and validated the listed tests/results.

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Test evidence

Targeted tests:

$ pytest -q --noconftest tests/test_dockerfile_cuda13_contract.py
3 passed

Docs sanity:

$ sphinx -b dummy docs/source .sphinx-dummy
succeeded

CUDA 12 validation:

  • built docker/Dockerfile image-release
  • runtime import of lmcache.integration.vllm.vllm_v1_adapter succeeded
  • LMCache CPU+disk offload and replay hits succeeded

CUDA 13 validation:

  • built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
  • served Qwen/Qwen3-0.6B
  • confirmed LocalCPUBackend and LocalDiskBackend
  • confirmed disk offload writes
  • confirmed replay cache hits
  • confirmed OpenAI-compatible replay responses exposed usage.prompt_tokens_details.cached_tokens = 6144

Changed files

  • .github/workflows/test.yml (modified, +2/-1)
  • README.md (modified, +5/-2)
  • docker/Dockerfile (modified, +66/-7)
  • docker/README.md (modified, +5/-2)
  • docs/source/developer_guide/docker_file.rst (modified, +9/-0)
  • docs/source/getting_started/installation.rst (modified, +21/-1)
  • docs/source/production/docker_deployment.rst (modified, +7/-1)
  • tests/test_dockerfile_cuda13_contract.py (added, +73/-0)


Summary

The current LMCache KV-connector installation path in vllm/vllm-openai:latest-cu130 is not compatible with the latest LMCache integration stack.

In a CUDA 13 container derived from vllm/vllm-openai:latest-cu130, enabling the LMCache connector can fail before serving starts because the install path still pulls CUDA 12-oriented artifacts.

Reproduction

Environment used for validation:

  • Podman
  • NVIDIA RTX 4080 SUPER
  • driver 595.45.04
  • host CUDA 13.2
  • base image vllm/vllm-openai:latest-cu130

Minimal reproduction shape:

  1. Start from vllm/vllm-openai:latest-cu130.
  2. Install LMCache through the current KV-connector path.
  3. Import lmcache.integration.vllm.vllm_v1_adapter or start vllm serve with LMCacheConnectorV1.

Observed failure:

ImportError: libcudart.so.12: cannot open shared object file
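
The import step of the reproduction can be run as a quick smoke test inside the container. This is a minimal sketch; `try_import` is a hypothetical helper, and the module path is the one named in the steps above.

```python
import importlib


def try_import(module_name):
    """Return (True, None) on success or (False, message) on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing shared library such as libcudart surfaces here as ImportError.
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = try_import("lmcache.integration.vllm.vllm_v1_adapter")
    print("import ok" if ok else f"import failed: {err}")
```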

Expected behavior

The CUDA 13 image path should install CUDA 13-compatible KV-connector dependencies so that LMCache can be imported and used without falling back to CUDA 12 artifacts.

Root cause hypothesis

The CUDA 13 image path still assumes the generic / CUDA 12-oriented LMCache packaging path. In practice, the working CUDA 13 combination required:

  • nixl-cu13 instead of the generic nixl package
  • cupy-cuda13x
  • skipping the generic lmcache wheel in the CUDA 13 branch
  • rebuilding LMCache from source against the image-local CUDA/Torch stack
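
To spot CUDA 12-flavored packages in a CUDA 13 image, the installed distribution names can be scanned for version-specific suffixes. This is a heuristic sketch only: the helper names are hypothetical, and the suffix list assumes the `-cu12` and `-cuda12x` naming patterns implied by the package names above.

```python
from importlib import metadata


def cuda12_flavored(names):
    """Return names that look CUDA 12-specific by suffix (heuristic only)."""
    suffixes = ("-cu12", "-cuda12x")
    return sorted(n for n in names if n.lower().endswith(suffixes))


def installed_names():
    """List the names of distributions installed in the current environment."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            names.append(name)
    return names


if __name__ == "__main__":
    print(cuda12_flavored(installed_names()))
```

An empty result does not guarantee a clean environment, since a wheel can link against CUDA 12 without advertising it in its name.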

Validation already performed

Targeted tests:

  • pytest -q --noconftest tests/tools/test_resolve_kv_connector_requirements.py
  • Result: 8 passed

CUDA 12 guard validation:

  • verified that the default CUDA 12 branch still resolves and installs the existing CUDA 12-oriented package set
  • validated runtime LMCache import and disk offload/hit behavior on a CUDA 12 image

CUDA 13 end-to-end validation:

  • built a CUDA 13 image on top of vllm/vllm-openai:latest-cu130
  • served Qwen/Qwen3-0.6B
  • enabled LMCacheConnectorV1
  • confirmed LocalCPUBackend and LocalDiskBackend creation
  • confirmed disk offload writes
  • confirmed replay cache hits
  • confirmed OpenAI-compatible responses expose usage.prompt_tokens_details.cached_tokens when both:
    • --enable-prompt-tokens-details
    • kv_connector_extra_config.enable_cache_usage_details_in_response=true

Representative runtime result:

  • first replayed request: prompt_tokens_details = null
  • second replayed request: prompt_tokens_details.cached_tokens = 6144

Cross-project context

This needs a matching LMCache-side fix as well because LMCache's Docker/runtime path also needs an explicit CUDA 13 package/source-build branch.

Related LMCache issue: LMCache/LMCache#2843

extent analysis

Fix Plan

To resolve the compatibility issue with the LMCache KV-connector in the vllm/vllm-openai:latest-cu130 image, follow these steps:

  1. Update the Dockerfile:

    • Use nixl-cu13 instead of the generic nixl package.
    • Install cupy-cuda13x.
    • Skip the generic lmcache wheel in the CUDA 13 branch.
    • Rebuild LMCache from source against the image-local CUDA/Torch stack.
  2. Modify the KV-connector installation path:

    • Resolve requirements per CUDA version so the CUDA 13 branch selects the packages above, while the CUDA 12 path stays unchanged.

Example Dockerfile modifications:

# Use CUDA 13-compatible packages
RUN pip install nixl-cu13 cupy-cuda13x

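# Rebuild LMCache from source against the image-local CUDA/Torch stack.
# --no-build-isolation and --no-deps keep pip from pulling a separate
# (potentially CUDA 12-oriented) Torch or dependency set during the build.
RUN git clone https://github.com/LMCache/LMCache.git && \
    cd LMCache && \
    pip install --no-build-isolation --no-deps .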
  3. Configure the LMCache connector:
    • Enable LMCacheConnectorV1 once the CUDA 13-compatible dependencies are installed in the image.

Example configuration:

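# KV-transfer settings handed to vLLM; no LMCache import is needed to define them.
kv_transfer_config = {
    'kv_connector': 'LMCacheConnectorV1',
    'kv_connector_extra_config': {
        # Expose cached-token counts in responses
        # (also requires --enable-prompt-tokens-details on the server).
        'enable_cache_usage_details_in_response': True
    }
}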

Verification

To verify that the fix worked:

  1. Build a new Docker image with the updated Dockerfile.
  2. Run the image and enable the LMCacheConnectorV1.
  3. Test the LMCache functionality, including disk offload and replay cache hits.
  4. Verify that the usage.prompt_tokens_details.cached_tokens field is exposed in the response when both --enable-prompt-tokens-details and kv_connector_extra_config.enable_cache_usage_details_in_response=true are set.
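
For the verification run, the server command can be assembled programmatically so that both settings the issue reports as necessary for `cached_tokens` are always present. This is a sketch under assumptions: `build_serve_command` is a hypothetical helper, and the `kv_role` value of `kv_both` is an assumption not stated in the issue.

```python
import json
import shlex


def build_serve_command(model):
    """Build a `vllm serve` argument list with LMCache and usage details enabled."""
    kv_transfer_config = {
        "kv_connector": "LMCacheConnectorV1",
        "kv_role": "kv_both",  # assumption: not specified in the issue
        "kv_connector_extra_config": {
            "enable_cache_usage_details_in_response": True,
        },
    }
    return [
        "vllm", "serve", model,
        "--enable-prompt-tokens-details",
        "--kv-transfer-config", json.dumps(kv_transfer_config),
    ]


if __name__ == "__main__":
    # Print a shell-safe command line for copy and paste.
    print(shlex.join(build_serve_command("Qwen/Qwen3-0.6B")))
```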

Extra Tips

  • Ensure that the LMCache-side fix is also applied, as mentioned in the related LMCache issue (LMCache/LMCache#2843).
  • Revalidate the CUDA 12 path after the change so the default build remains covered, in addition to the CUDA 13 end-to-end checks.
