vllm - [CI Failure]: mi355_1: Language Models Test (PPL) [3 comments, 2 participants]

vllm-project/vllm#37596 (fetched 2026-04-08 01:04:30)


Name of failing test

pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

There is currently an error:

(EngineCore pid=1578061) INFO 03-19 20:12:53 [gpu_model_runner.py:4601] Model loading took 1.59 GiB memory and 1.009335 seconds
(EngineCore pid=1578061) /app/vllm/vllm/model_executor/layers/utils.py:188: UserWarning: Failed validator: GCN_ARCH_NAME (Triggered internally at /app/pytorch/aten/src/ATen/hip/tunable/Tunable.cpp:364.)
(EngineCore pid=1578061)   return torch.nn.functional.linear(x, weight, bias)
:0:rocdevice.cpp            :3675: 1143953758580 us:  Callback: Queue 0x7eb85a100000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
FAILED

It can be reproduced with:

HIP_VISIBLE_DEVICES=7 pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

The error occurs even with enforce_eager=True.

📝 History of failing test

https://buildkite.com/vllm/amd-ci/builds/6670/steps/canvas?sid=019d04ae-43ff-4774-a779-76136bc12e90&tab=output
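When checking whether this failure recurs across CI builds (for example in the Buildkite output linked above), it can help to detect the HSA abort line automatically. The following is a minimal sketch that scans log lines and extracts the HSA status name and numeric code. The regular expression is an assumption derived only from the single rocdevice.cpp line quoted in this issue; other ROCm versions may format the message differently. `find_hsa_abort` and `HsaAbort` are hypothetical names.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Pattern derived from the one rocdevice.cpp line quoted in this issue;
# it is an assumption and may need adjusting for other ROCm versions.
HSA_ABORT_RE = re.compile(
    r"aborting with error\s*:\s*(?P<status>HSA_STATUS_[A-Z_]+)"
    r".*?code:\s*(?P<code>0x[0-9a-fA-F]+)"
)


class HsaAbort(NamedTuple):
    status: str  # e.g. HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION
    code: int    # status code parsed from its hex form (0x29 -> 41)


def find_hsa_abort(lines: Iterable[str]) -> Optional[HsaAbort]:
    """Return the first HSA queue abort found in the log lines, or None."""
    for line in lines:
        match = HSA_ABORT_RE.search(line)
        if match:
            return HsaAbort(match["status"], int(match["code"], 16))
    return None


if __name__ == "__main__":
    log = [
        "(EngineCore pid=1578061)   return torch.nn.functional.linear(x, weight, bias)",
        ":0:rocdevice.cpp            :3675: 1143953758580 us:  Callback: Queue 0x7eb85a100000 "
        "aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent "
        "attempted to access memory beyond the largest legal address. code: 0x29",
        "FAILED",
    ]
    print(find_hsa_abort(log))
```

Because the pattern requires the literal `code:` label, the queue address (`0x7eb85a100000`) earlier on the same line is not mistaken for the status code.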


Fix Plan

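The proposed fix treats the memory aperture violation as a memory-pressure problem and tries to reduce memory usage during model execution. Note that HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION means the GPU tried to access an address outside the legal range (an invalid memory access), not that it ran out of memory, so the steps below are mitigations rather than a confirmed root-cause fix.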

  • Step 1: Reduce Model Size
    • Review the model architecture and, where the test permits, reduce the model size.
    • Options include fewer layers, fewer parameters per layer, or a more efficient architecture; for a pretrained checkpoint this effectively means testing a smaller model.
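  • Step 2: Optimize Memory Allocation
    • Ensure that memory allocation is optimized for the GPU.
    • One option is to call torch.cuda.empty_cache() before running the model. This returns cached but unused blocks held by PyTorch's caching allocator to the device; it does not free memory still referenced by live tensors. On ROCm builds of PyTorch, the torch.cuda APIs target HIP devices.
    • Example:

```python
import torch

# Placeholders for illustration: substitute the real model and inputs.
model = torch.nn.Linear(16, 16)
input_data = torch.randn(4, 16)

# Release unused memory cached by PyTorch's allocator
torch.cuda.empty_cache()

# Run the model
model_output = model(input_data)
```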

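  • Step 3: Use Gradient Checkpointing
    • Implement gradient checkpointing to reduce activation memory during backpropagation.
    • This can be done using the `torch.utils.checkpoint` module. It only helps training, though: the PPL test runs inference through vLLM with no backward pass, so this step is unlikely to affect this failure.
    • Example:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Placeholders for illustration: substitute the real model and inputs.
model = torch.nn.Linear(16, 16)
input_data = torch.randn(4, 16, requires_grad=True)

# Define a checkpointed function: activations are recomputed during
# backward instead of being stored during the forward pass.
def checkpointed_function(x):
    return checkpoint(model, x, use_reentrant=False)

# Run the checkpointed function
model_output = checkpointed_function(input_data)
```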
  • Step 4: Increase GPU Memory
    • If possible, increase the GPU memory available to the model.
    • This can be done by using a GPU with more memory, or by lowering memory demand, for example by reducing the batch size.
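The link in Step 4 between batch size and memory demand can be made concrete with the standard back-of-the-envelope KV-cache estimate for a decoder-only transformer: 2 (one key and one value) × layers × KV heads × head dimension × bytes per element × cached tokens. The sketch below applies it with hypothetical dimensions chosen for illustration; they are not the dimensions of the model used by `test_ppl[model_info2]`, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    dtype_bytes: int,
    num_tokens: int,
) -> int:
    """Estimate KV-cache size: one key and one value vector
    per token, per layer, per KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * num_tokens


# Hypothetical dimensions (not the model under test), with a 16-bit dtype.
DIMS = dict(num_layers=24, num_kv_heads=2, head_dim=64, dtype_bytes=2)

if __name__ == "__main__":
    for batch_size in (8, 4, 2):
        tokens = batch_size * 1024  # assume 1024 tokens per sequence
        mib = kv_cache_bytes(num_tokens=tokens, **DIMS) / 2**20
        print(f"batch_size={batch_size}: {mib:.0f} MiB of KV cache")
```

With these dimensions the estimate is 12 MiB per 1024 tokens, so the script reports 96, 48 and 24 MiB as the batch size halves from 8 to 2: the estimate scales linearly with the number of cached tokens.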

Verification

To verify that the fix worked, run the test again with the optimized model and memory allocation:

HIP_VISIBLE_DEVICES=7 pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

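If the test passes without the memory aperture violation, the fix is likely successful. Because the failure may be intermittent, repeat the run several times before drawing a conclusion.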
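A minimal sketch for repeating the reproduction command above and counting failures, running each attempt in a fresh process so that a GPU abort in one run cannot affect the next. The test ID and `HIP_VISIBLE_DEVICES=7` come from the command in this issue; `build_command` and `count_failures` are hypothetical helper names, and the sketch assumes `pytest` is on `PATH`.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Test ID and device index copied from the reproduction command in this issue.
TEST_ID = "tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]"


def build_command(device: str = "7", test_id: str = TEST_ID) -> Tuple[List[str], Dict[str, str]]:
    """Return the pytest argv and an environment that pins the run to one GPU."""
    env = dict(os.environ, HIP_VISIBLE_DEVICES=device)
    return ["pytest", "-s", "-v", test_id], env


def count_failures(runs: int, cmd: List[str], env: Dict[str, str]) -> int:
    """Run `cmd` `runs` times, each in a new process, and count non-zero exits."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            failures += 1
    return failures
```

Usage: `cmd, env = build_command()` followed by `print(count_failures(5, cmd, env), "of 5 runs failed")`. Launching a new process per attempt matters here because the HSA abort kills the process that hit it, so a single long-lived loop would not survive the first failure.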

Extra Tips

  • Monitor GPU memory usage during model execution to identify potential memory bottlenecks.
  • On ROCm systems such as the MI355 runner, use rocm-smi or amd-smi to monitor GPU memory usage (nvidia-smi only works with NVIDIA GPUs).
  • Consider lower-precision dtypes (mixed precision) to reduce memory usage.
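For the monitoring tip, here is a minimal sketch that samples a monitoring command in a background thread while the test runs. It assumes `rocm-smi --showmeminfo vram` is available on the ROCm host (swap in `amd-smi` or another tool as needed) and stores the raw text output without parsing it, since the format varies between tools and versions. `SmiSampler` is a hypothetical helper name.

```python
import subprocess
import threading
import time
from typing import List, Sequence, Tuple


class SmiSampler:
    """Hypothetical helper: runs a monitoring command every `interval_s`
    seconds in a background thread and keeps (timestamp, raw_output) samples."""

    def __init__(
        self,
        cmd: Sequence[str] = ("rocm-smi", "--showmeminfo", "vram"),
        interval_s: float = 1.0,
    ) -> None:
        self.cmd = list(cmd)
        self.interval_s = interval_s
        self.samples: List[Tuple[float, str]] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                out = subprocess.run(self.cmd, capture_output=True, text=True).stdout
            except OSError as exc:  # e.g. the monitoring tool is not installed
                out = f"<failed to run {self.cmd[0]}: {exc}>"
            self.samples.append((time.time(), out))
            # Sleep until the next sample, waking early if the sampler is shutting down.
            self._stop.wait(self.interval_s)

    def __enter__(self) -> "SmiSampler":
        self._thread.start()
        return self

    def __exit__(self, *exc_info) -> bool:
        self._stop.set()
        self._thread.join()
        return False  # never suppress exceptions raised inside the with-block
```

Usage: wrap the reproduction run, e.g. `with SmiSampler(interval_s=0.5) as sampler:` around a `subprocess.run([...])` of the pytest command, then inspect `sampler.samples` for the readings taken just before the abort. Keeping raw output avoids guessing at a parser for a tool whose format may change.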
