vllm - 💡(How to fix) Fix [CI Failure]: mi355_1: Language Models Test (PPL) [3 comments, 2 participants]

vllm • 2026-03-19 20:15:46


GitHub stats

vllm-project/vllm#37596 • Fetched 2026-04-08 01:04:30

Author: AndreasKaratzas

Participants: AndreasKaratzas, github-actions[bot]

Timeline (top): mentioned ×4, subscribed ×4, commented ×3, added_to_project_v2 ×2

Error Message

There is currently an error:

:0:rocdevice.cpp :3675: 1143953758580 us: Callback: Queue 0x7eb85a100000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29

The error happens even with enforce_eager set to True.

Root Cause

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

Code Example

(EngineCore pid=1578061) INFO 03-19 20:12:53 [gpu_model_runner.py:4601] Model loading took 1.59 GiB memory and 1.009335 seconds
(EngineCore pid=1578061) /app/vllm/vllm/model_executor/layers/utils.py:188: UserWarning: Failed validator: GCN_ARCH_NAME (Triggered internally at /app/pytorch/aten/src/ATen/hip/tunable/Tunable.cpp:364.)
(EngineCore pid=1578061)   return torch.nn.functional.linear(x, weight, bias)
:0:rocdevice.cpp            :3675: 1143953758580 us:  Callback: Queue 0x7eb85a100000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
FAILED
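When triaging many CI logs, it can help to pull the HSA status name and numeric code out of the abort line automatically. The following is a minimal sketch, assuming the abort line always has the shape quoted above; the regular expression and the helper name `parse_hsa_abort` are illustrative and not part of vLLM or ROCm.

```python
import re

# Pattern for the ROCm runtime abort line quoted in this issue.
# Assumption: other abort lines follow the same "aborting with error : NAME: text code: 0xNN" shape.
HSA_ABORT_RE = re.compile(
    r"aborting with error\s*:\s*(?P<status>HSA_STATUS_\w+):\s*"
    r"(?P<message>.*?)\s*code:\s*(?P<code>0x[0-9a-fA-F]+)"
)


def parse_hsa_abort(line):
    """Return (status, message, code) for an HSA abort line, or None if it does not match."""
    match = HSA_ABORT_RE.search(line)
    if match is None:
        return None
    return match.group("status"), match.group("message"), int(match.group("code"), 16)


# The abort line from the log above, copied verbatim.
LOG_LINE = (
    ":0:rocdevice.cpp            :3675: 1143953758580 us:  Callback: Queue "
    "0x7eb85a100000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: "
    "The agent attempted to access memory beyond the largest legal address. code: 0x29"
)

if __name__ == "__main__":
    print(parse_hsa_abort(LOG_LINE))
```

Lines that are not HSA aborts, such as the trailing FAILED, return None, so the helper can be applied to every line of a log.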


Name of failing test

pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

Basic information

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

There is currently an error:

(EngineCore pid=1578061) INFO 03-19 20:12:53 [gpu_model_runner.py:4601] Model loading took 1.59 GiB memory and 1.009335 seconds
(EngineCore pid=1578061) /app/vllm/vllm/model_executor/layers/utils.py:188: UserWarning: Failed validator: GCN_ARCH_NAME (Triggered internally at /app/pytorch/aten/src/ATen/hip/tunable/Tunable.cpp:364.)
(EngineCore pid=1578061)   return torch.nn.functional.linear(x, weight, bias)
:0:rocdevice.cpp            :3675: 1143953758580 us:  Callback: Queue 0x7eb85a100000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
FAILED

It's reproducible by: HIP_VISIBLE_DEVICES=7 pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

The error happens even with enforce_eager set to True.
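Because GPU kernels launch asynchronously, the abort may be reported well after the kernel that actually faulted. One way to narrow this down is to rerun the same reproduction command with kernel launches serialized. The sketch below only builds the command and environment. `AMD_SERIALIZE_KERNEL` and `HIP_LAUNCH_BLOCKING` are HIP runtime debugging variables; that they will help localize this particular failure is an assumption, and `build_debug_run` is a hypothetical helper name.

```python
import os

# Test ID and device index are taken from the reproduction command in the issue.
TEST_ID = "tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]"


def build_debug_run(device="7", base_env=None):
    """Return (cmd, env) for rerunning the failing test with serialized kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_VISIBLE_DEVICES"] = device
    # Assumption: serializing launches makes the fault surface next to the offending kernel.
    env["AMD_SERIALIZE_KERNEL"] = "3"
    env["HIP_LAUNCH_BLOCKING"] = "1"
    cmd = ["pytest", "-s", "-v", TEST_ID]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_debug_run()
    print(" ".join(cmd))
```

The pair can be passed to `subprocess.run(cmd, env=env)` from the vLLM checkout. Expect the run to be slower than usual with these variables set.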

📝 History of failing test

https://buildkite.com/vllm/amd-ci/builds/6670/steps/canvas?sid=019d04ae-43ff-4774-a779-76136bc12e90&tab=output

extent analysis

Fix Plan

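HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION means a GPU kernel tried to access an address outside the legal range. That points to an out-of-bounds access rather than memory exhaustion, and the log shows the model took only 1.59 GiB. The steps below reduce memory pressure and may help narrow down the failure, but none of them is a confirmed fix.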

Step 1: Reduce Model Size
- Review the model architecture and reduce the size of the model if possible.
- This can be done by reducing the number of layers, the number of parameters in each layer, or using a more efficient architecture.
Step 2: Optimize Memory Allocation
- Ensure that memory allocation is optimized for the GPU.
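- This can be done by calling torch.cuda.empty_cache() before running the model to release cached memory that PyTorch holds but is not using. On ROCm builds of PyTorch, the torch.cuda API targets HIP devices.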
- Example:

import torch

# Release cached GPU memory that is no longer in use
torch.cuda.empty_cache()

# Run the model (model and input_data are placeholders defined elsewhere)
model_output = model(input_data)
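To check whether capacity is even a plausible cause, the free and total device memory can be compared with the 1.59 GiB reported in the log. This is a minimal sketch, assuming a ROCm or CUDA build of PyTorch is installed; `to_gib` and `free_and_total_gib` are hypothetical helper names.

```python
def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


def free_and_total_gib(device=0):
    """Return (free, total) device memory in GiB for the given device index."""
    import torch  # imported lazily so the conversion helper works without torch

    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return to_gib(free_bytes), to_gib(total_bytes)


if __name__ == "__main__":
    free, total = free_and_total_gib()
    print(f"free: {free:.2f} GiB of {total:.2f} GiB")
```

If the free figure is far above the model size, freeing cached memory is unlikely to change the outcome.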

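Step 3: Use Gradient Checkpointing
- Gradient checkpointing reduces activation memory during backpropagation, so it applies only when gradients are computed. It is unlikely to affect a run that appears to be inference-only, such as this perplexity test.
- This can be done using the torch.utils.checkpoint module.
- Example:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Define a checkpointed function (model is a placeholder defined elsewhere)
def checkpointed_function(x):
    # use_reentrant=False is the variant PyTorch recommends
    return checkpoint(model, x, use_reentrant=False)

# Run the checkpointed function
model_output = checkpointed_function(input_data)
```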

Step 4: Increase GPU Memory
- If possible, run the model on a GPU with more memory.
- Alternatively, reduce the batch size so that each run needs less memory.
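In vLLM, the closest equivalents to a smaller batch are the engine arguments that cap concurrency and context length. The sketch below assembles a conservative set of arguments for the `vllm.LLM` constructor. The specific values are illustrative assumptions, the model name must be supplied by the caller, and the issue does not show how the PPL test harness passes these settings, so this is a standalone experiment rather than a patch to the test.

```python
def conservative_engine_kwargs(model, max_num_seqs=1, max_model_len=2048,
                               gpu_memory_utilization=0.5):
    """Build keyword arguments for vllm.LLM that keep memory demand low.

    All default values here are illustrative, not taken from the issue.
    """
    return {
        "model": model,
        "enforce_eager": True,  # the issue reports the failure persists with this set
        "max_num_seqs": max_num_seqs,  # cap on sequences batched together
        "max_model_len": max_model_len,  # cap on context length
        "gpu_memory_utilization": gpu_memory_utilization,  # fraction of GPU memory vLLM may claim
    }


def build_llm(model):
    """Construct a vllm.LLM with the conservative settings (requires vLLM installed)."""
    from vllm import LLM  # imported lazily so the kwargs helper works without vLLM

    return LLM(**conservative_engine_kwargs(model))
```

If the abort still occurs with a single sequence and a short context, that is further evidence against a capacity problem.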

Verification

To verify that the fix worked, run the test again with the optimized model and memory allocation:

HIP_VISIBLE_DEVICES=7 pytest -s -v tests/models/language/generation_ppl_test/test_qwen.py::test_ppl[model_info2]

If the test passes without any memory errors, the fix is successful.
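Since the issue template lists "Flaky test" as a possibility, a single passing run is weak evidence. A minimal sketch for counting failures over repeated runs follows; `run_test_once` and `count_failures` are hypothetical helper names, and the number of attempts is up to the reader.

```python
import subprocess


def run_test_once(cmd, env=None):
    """Run the command once and return True if it exits with status 0."""
    return subprocess.run(cmd, env=env).returncode == 0


def count_failures(run_once, attempts):
    """Call run_once() the given number of times and return how many calls failed."""
    return sum(0 if run_once() else 1 for _ in range(attempts))
```

For example, `count_failures(lambda: run_test_once(cmd, env), 10)` with `cmd` set to the pytest command above and `env` containing `HIP_VISIBLE_DEVICES` reports how many of ten runs failed.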

Extra Tips

Monitor GPU memory usage during model execution to identify potential memory bottlenecks.
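On AMD GPUs such as the MI355 used here, use rocm-smi (the ROCm counterpart of nvidia-smi) to monitor GPU memory usage.
Consider running in lower precision to reduce memory usage; mixed precision training applies only when training.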

