PR fix notes

PR #40549: [ROCm] Enable SimpleCPUOffloadConnector on ROCm

Repository: vllm-project/vllm
Author: hongxiayang
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40549

Description (problem / solution / changelog)

Purpose

Fixes https://github.com/vllm-project/vllm/issues/40397

Enable SimpleCPUOffloadConnector on the ROCm backend.

Test Plan

(1) unit test:

vllm/tests/v1/simple_kv_offload# pytest test_scheduler.py

(2) integration test:

vllm/tests/v1/simple_kv_offload# pytest test_integration.py

Test Result

(1) unit test: pass

root@xxx:/app/vllm/tests/v1/simple_kv_offload# pytest test_scheduler.py
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /app/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 12 items

test_scheduler.py ............ [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no module attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no module attribute

tests/v1/simple_kv_offload/test_scheduler.py: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: torch.jit.script_method is deprecated. Please switch to torch.compile or torch.export.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================== 12 passed, 16 warnings in 12.12s ========================

(2) integration test: by default, skipped.

pytest test_integration.py
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0
rootdir: /app/vllm
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.13.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items

test_integration.py ssssssss [100%]

=============================== warnings summary ===============================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no module attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no module attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 8 skipped, 2 warnings in 0.04s ========================

(3) model test using the command from the associated issue

export VLLM_USE_SIMPLE_KV_OFFLOAD=1
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_MOE=0

vllm serve amd/DeepSeek-R1-0528-MXFP4-Preview --host 0.0.0.0 --port 8889  --gpu-memory-utilization 0.9 --tensor-parallel-size 8 --kv_offloading_backend native --kv_offloading_size 600 --no-disable-hybrid-kv-cache-manager

The server started without exceptions:

...
(APIServer pid=28377) INFO:     Started server process [28377]
(APIServer pid=28377) INFO:     Waiting for application startup.
(APIServer pid=28377) INFO:     Application startup complete.
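
A startup log only shows that the process came up. One way to check a little further is to query the OpenAI-compatible /v1/models endpoint that vllm serve exposes. The following is a minimal sketch, assuming the server is reachable on port 8889 as in the command above; the helper names models_url and list_served_models are hypothetical and not part of this PR.

```python
import json
from urllib.request import urlopen


def models_url(host: str, port: int) -> str:
    """Build the URL of the OpenAI-compatible model listing endpoint."""
    return f"http://{host}:{port}/v1/models"


def list_served_models(url: str, opener=urlopen, timeout: float = 5.0) -> list:
    """Return the model ids reported by the server.

    `opener` is injectable so the function can be exercised without a
    live server; by default it is urllib's urlopen.
    """
    with opener(url, timeout=timeout) as resp:
        payload = json.load(resp)
    # The OpenAI-style response wraps the models in a "data" list.
    return [entry["id"] for entry in payload.get("data", [])]


# Usage against a running server (assumed host and port):
#   print(list_served_models(models_url("localhost", 8889)))
```

If the call returns the served model id, the API layer is answering requests; it does not by itself show that KV offloading to CPU is being exercised.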


Changed files

csrc/cache_kernels.cu (modified, +27/-9)
tests/v1/simple_kv_offload/test_integration.py (modified, +2/-2)
tests/v1/simple_kv_offload/test_scheduler.py (modified, +7/-1)
vllm/v1/simple_kv_offload/cuda_mem_ops.py (modified, +58/-11)

extent analysis

TL;DR

Add support for ROCm in the SimpleCPUOffloadConnector path to enable its use without requiring CUDA.

Guidance

Review the current implementation of SimpleCPUOffloadConnector to identify CUDA-specific dependencies and determine the necessary modifications for ROCm compatibility.
Investigate the stack trace provided in error.txt to understand the specific errors encountered when attempting to launch with the simple offload connector and ROCm.
Research ROCm documentation and examples to understand its API and integration requirements, which will be essential for implementing the necessary support in SimpleCPUOffloadConnector.
Consider creating a separate branch or fork to develop and test the ROCm support without disrupting the main codebase.

Example

No specific code example can be provided without more context on the current implementation of SimpleCPUOffloadConnector and the exact requirements for ROCm support.
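
Short of that context, the first guidance step (finding CUDA-specific dependencies) can still be illustrated generically. The sketch below scans a piece of source text for CUDA runtime-style identifiers and proposes a HIP name by swapping the cuda prefix for hip. The sample text is invented for illustration and is not taken from cuda_mem_ops.py or cache_kernels.cu; the prefix swap is only a heuristic, and the function names find_cuda_symbols and naive_hip_name are hypothetical.

```python
import re

# Matches CUDA runtime-style identifiers such as cudaMemcpyAsync.
CUDA_SYMBOL = re.compile(r"\bcuda[A-Z]\w*")


def find_cuda_symbols(source: str) -> list[str]:
    """Return the distinct CUDA-style identifiers in `source`, sorted."""
    return sorted(set(CUDA_SYMBOL.findall(source)))


def naive_hip_name(symbol: str) -> str:
    """Guess a HIP name by replacing the 'cuda' prefix with 'hip'.

    Heuristic only: it holds for many runtime calls but not for every
    symbol, so each suggestion still needs to be checked by hand.
    """
    return "hip" + symbol[len("cuda"):]


# Invented sample text, used only to demonstrate the scan.
sample = """
cudaHostRegister(ptr, size, cudaHostRegisterDefault);
cudaMemcpyAsync(dst, src, size, cudaMemcpyDeviceToHost, stream);
"""

for name in find_cuda_symbols(sample):
    print(name, "->", naive_hip_name(name))
```

A list produced this way is a starting point for review, not a port: symbols without a direct HIP counterpart need a separate code path.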

Notes

The solution will depend on the specific details of the SimpleCPUOffloadConnector implementation and the ROCm API. Additional research and testing will be necessary to ensure seamless integration.

Recommendation

Apply workaround: Modify SimpleCPUOffloadConnector to support ROCm, as this will enable the desired functionality without waiting for an upgrade or external fix.

vllm - ✅(Solved) Fix [Feature]: Add ROCm support for simple offload connector [1 pull requests]
