vllm - ✅(Solved) Fix [Feature]: Add ROCm support for simple offload connector [1 pull requests]



PR fix notes

PR #40549: [ROCm] Enable SimpleCPUOffloadConnector on ROCm

Description (problem / solution / changelog)

Purpose

Fix https://github.com/vllm-project/vllm/issues/40397

Enable SimpleCPUOffloadConnector on ROCm backend.
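Here is a minimal sketch of how Python code can tell a ROCm build of PyTorch from a CUDA build. It assumes the decision is based on torch.version. ROCm builds set `torch.version.hip` to a version string and leave `torch.version.cuda` as `None`, and CUDA builds do the opposite. The helper name `detect_gpu_runtime` is hypothetical, and this is not necessarily how PR #40549 dispatches. Inside vLLM, `current_platform.is_rocm()` from `vllm.platforms` serves the same purpose.

```python
from types import SimpleNamespace
from typing import Any, Optional


def detect_gpu_runtime(torch_module: Optional[Any] = None) -> str:
    """Return "rocm", "cuda", or "cpu" for the given (or installed) PyTorch.

    Hypothetical helper: inspects torch.version.hip and torch.version.cuda.
    """
    if torch_module is None:
        try:
            import torch  # imported lazily so the helper works without torch
        except ImportError:
            return "cpu"
        torch_module = torch
    version = torch_module.version
    # ROCm wheels set torch.version.hip to a version string.
    if getattr(version, "hip", None):
        return "rocm"
    # CUDA wheels set torch.version.cuda to a version string.
    if getattr(version, "cuda", None):
        return "cuda"
    # CPU-only builds leave both attributes as None.
    return "cpu"


if __name__ == "__main__":
    # A fake module object stands in for a ROCm build of torch.
    fake_rocm = SimpleNamespace(version=SimpleNamespace(hip="6.2", cuda=None))
    print(detect_gpu_runtime(fake_rocm))
```

Passing the module in as a parameter keeps the check testable on machines without a GPU.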

Test Plan

(1) Unit test, run from vllm/tests/v1/simple_kv_offload:
pytest test_scheduler.py

(2) Integration test, run from the same directory:
pytest test_integration.py

Test Result

(1) Unit test: pass
root@xxx:/app/vllm/tests/v1/simple_kv_offload# pytest test_scheduler.py
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /app/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 12 items

test_scheduler.py ............ [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no module attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no module attribute

tests/v1/simple_kv_offload/test_scheduler.py: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: torch.jit.script_method is deprecated. Please switch to torch.compile or torch.export.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================= 12 passed, 16 warnings in 12.12s =======================

(2) Integration test: skipped by default.
pytest test_integration.py
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /app/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items

test_integration.py ssssssss [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no module attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no module attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================= 8 skipped, 2 warnings in 0.04s =========================
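The integration tests are skipped by default. This page does not show the actual skip condition in test_integration.py, which the PR edits (+2/-2). The sketch below is hypothetical. It shows one way such gating can be written so that ROCm counts as a supported GPU runtime alongside CUDA. The `Platform` enum, the `skip_reason` helper, and the `RUN_OFFLOAD_INTEGRATION` opt-in variable are all invented for illustration. In vLLM itself, the platform interface offers checks such as `is_cuda()` and `is_rocm()`.

```python
import enum
import os
from typing import Mapping, Optional


class Platform(enum.Enum):
    # Hypothetical stand-in for vLLM's platform detection.
    CUDA = "cuda"
    ROCM = "rocm"
    CPU = "cpu"


# A CUDA-only gate would accept just Platform.CUDA; enabling ROCm widens it.
SUPPORTED_OFFLOAD_PLATFORMS = frozenset({Platform.CUDA, Platform.ROCM})


def skip_reason(
    platform: Platform, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return why GPU integration tests should be skipped, or None to run them.

    RUN_OFFLOAD_INTEGRATION is a hypothetical opt-in variable.
    """
    env = os.environ if env is None else env
    if platform not in SUPPORTED_OFFLOAD_PLATFORMS:
        return f"simple KV offload not supported on {platform.value}"
    if env.get("RUN_OFFLOAD_INTEGRATION") != "1":
        return "integration tests are opt-in"
    return None
```

In a test module, the helper could back a marker such as `pytestmark = pytest.mark.skipif(reason is not None, reason=reason or "")`, with `reason = skip_reason(...)` computed once at import time.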

(3) Model test, using the command from the associated issue:

export VLLM_USE_SIMPLE_KV_OFFLOAD=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=0

vllm serve amd/DeepSeek-R1-0528-MXFP4-Preview --host 0.0.0.0 --port 8889 \
  --gpu-memory-utilization 0.9 --tensor-parallel-size 8 \
  --kv_offloading_backend native --kv_offloading_size 600 \
  --no-disable-hybrid-kv-cache-manager
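To script the same reproduction (for example, in CI), you can assemble the environment variables and flags in Python and pass them to `subprocess`. The values are copied from the command above. The function name `build_repro` is hypothetical. The sketch only builds the command and does not launch the server.

```python
import os
import shlex
from typing import Dict, List, Tuple


def build_repro(port: int = 8889, tp: int = 8) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for the ROCm repro command (hypothetical helper)."""
    env = dict(os.environ)
    # Environment variables taken verbatim from the test plan.
    env.update({
        "VLLM_USE_SIMPLE_KV_OFFLOAD": "1",
        "VLLM_ROCM_USE_AITER": "1",
        "VLLM_ROCM_USE_AITER_MOE": "0",
    })
    argv = [
        "vllm", "serve", "amd/DeepSeek-R1-0528-MXFP4-Preview",
        "--host", "0.0.0.0",
        "--port", str(port),
        "--gpu-memory-utilization", "0.9",
        "--tensor-parallel-size", str(tp),
        "--kv_offloading_backend", "native",
        "--kv_offloading_size", "600",
        "--no-disable-hybrid-kv-cache-manager",
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_repro()
    print(shlex.join(argv))
    # To actually launch: subprocess.run(argv, env=env, check=True)
```

The environment is built from a copy of `os.environ`, so the calling process's own environment is left unchanged.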

The server started without exceptions:

...
(APIServer pid=28377) INFO:     Started server process [28377]
(APIServer pid=28377) INFO:     Waiting for application startup.
(APIServer pid=28377) INFO:     Application startup complete.


Changed files

  • csrc/cache_kernels.cu (modified, +27/-9)
  • tests/v1/simple_kv_offload/test_integration.py (modified, +2/-2)
  • tests/v1/simple_kv_offload/test_scheduler.py (modified, +7/-1)
  • vllm/v1/simple_kv_offload/cuda_mem_ops.py (modified, +58/-11)
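The changes to cuda_mem_ops.py (+58/-11) suggest that the Python-side memory helpers gained a ROCm path, but the diff is not reproduced here. One common pattern for porting host-side memory code is to map CUDA runtime names to their HIP equivalents, which are mostly one-to-one renames (`cudaMemcpyAsync` to `hipMemcpyAsync`, `cudaHostRegister` to `hipHostRegister`). The sketch below assumes that pattern. The `CUDA_TO_HIP` table and `runtime_symbol` function are hypothetical. The library names are the usual Linux ones (`libcudart.so`, `libamdhip64.so`), and exact sonames vary by install.

```python
from typing import Dict

# Hypothetical mapping of CUDA runtime entry points to their HIP counterparts.
CUDA_TO_HIP: Dict[str, str] = {
    "cudaMemcpyAsync": "hipMemcpyAsync",
    "cudaHostRegister": "hipHostRegister",
    "cudaHostUnregister": "hipHostUnregister",
    "cudaStreamSynchronize": "hipStreamSynchronize",
    "cudaGetLastError": "hipGetLastError",
}

# Typical shared-library names on Linux (exact sonames vary by install).
RUNTIME_LIBRARY: Dict[str, str] = {"cuda": "libcudart.so", "rocm": "libamdhip64.so"}

# cudaMemcpyKind and hipMemcpyKind use the same numeric values.
MEMCPY_HOST_TO_DEVICE = 1
MEMCPY_DEVICE_TO_HOST = 2


def runtime_symbol(cuda_name: str, runtime: str) -> str:
    """Translate a CUDA runtime symbol name for the selected runtime."""
    if runtime == "cuda":
        return cuda_name
    if runtime == "rocm":
        try:
            return CUDA_TO_HIP[cuda_name]
        except KeyError:
            raise KeyError(f"no HIP mapping registered for {cuda_name}") from None
    raise ValueError(f"unsupported runtime: {runtime!r}")
```

On a machine with the runtime installed, a caller could then use `lib = ctypes.CDLL(RUNTIME_LIBRARY[runtime])` and `getattr(lib, runtime_symbol("cudaMemcpyAsync", runtime))`. Keeping the table explicit makes it obvious which calls have been ported.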
Original issue (#40397)

🚀 The feature, motivation and pitch

Currently, the SimpleCPUOffloadConnector path requires CUDA. ROCm should be supported as well.

Alternatives

No response

Additional context

Stack trace when launching with the simple offload connector on ROCm (MI355X): attached as error.txt.


Extended analysis

TL;DR

Add support for ROCm in the SimpleCPUOffloadConnector path to enable its use without requiring CUDA.

Guidance

  • Review the current implementation of SimpleCPUOffloadConnector to identify CUDA-specific dependencies and decide what must change for ROCm compatibility. In PR #40549, the changes touch csrc/cache_kernels.cu and vllm/v1/simple_kv_offload/cuda_mem_ops.py.
  • Investigate the stack trace provided in error.txt to understand the specific errors encountered when attempting to launch with the simple offload connector and ROCm.
  • Research ROCm documentation and examples to understand its API and integration requirements, which will be essential for implementing the necessary support in SimpleCPUOffloadConnector.
  • Consider creating a separate branch or fork to develop and test the ROCm support without disrupting the main codebase.
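For the first guidance step, one quick way to inventory CUDA-specific dependencies is to scan source text for CUDA runtime and driver identifiers (names such as `cudaMemcpyAsync`, `cudaStream_t`, or `cuMemAlloc`). The regular expression below is an illustrative heuristic and is not part of vLLM. It misses indirect uses, and it may flag names that a hipify build step already translates. It deliberately ignores `torch.cuda.*` calls, because ROCm builds of PyTorch expose HIP devices through the same `torch.cuda` namespace.

```python
import re
from collections import Counter
from typing import Iterable

# Matches identifiers starting with "cuda" or "cu" followed by an uppercase
# letter, e.g. cudaMemcpyAsync, cudaStream_t, cuMemAlloc (heuristic only).
CUDA_PATTERN = re.compile(r"\bcu(?:da)?[A-Z]\w*")


def cuda_identifiers(lines: Iterable[str]) -> Counter:
    """Count CUDA-specific identifiers found in the given source lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(CUDA_PATTERN.findall(line))
    return counts


if __name__ == "__main__":
    sample = [
        "cudaMemcpyAsync(dst, src, n, cudaMemcpyHostToDevice, stream);",
        "cudaStream_t stream = 0;",
        "current = torch.cuda.current_stream()",  # not flagged
    ]
    print(cuda_identifiers(sample).most_common())
```

Because a file object iterates over its lines, something like `cuda_identifiers(open(path))` works directly on a file such as csrc/cache_kernels.cu.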

Example

No specific code example can be provided without more context on the current implementation of SimpleCPUOffloadConnector and the exact requirements for ROCm support.

Notes

The solution will depend on the specific details of the SimpleCPUOffloadConnector implementation and the ROCm API. Additional research and testing will be necessary to ensure seamless integration.

Recommendation

Modify SimpleCPUOffloadConnector to support ROCm rather than waiting for an upgrade or external fix. PR #40549 takes this approach, modifying csrc/cache_kernels.cu and vllm/v1/simple_kv_offload/cuda_mem_ops.py to enable the connector on ROCm.
