vllm - ✅(Solved) Fix [CI Failure]: Gemma3 OOMs with transformers backend [1 pull requests, 2 comments, 2 participants]

vllm · 2026-03-21 05:32:29

GitHub stats

vllm-project/vllm#37736 • Fetched 2026-04-08 01:08:28

Author

AndreasKaratzas

Participants

AndreasKaratzas

github-actions[bot]

Timeline (top)

mentioned ×4, subscribed ×4, added_to_project_v2 ×3, project_v2_item_status_changed ×3

Fix Action

Fixed

Fixed by PR: [ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (https://github.com/vllm-project/vllm/pull/37717)

PR fix notes

PR #37717: [ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm

Repository: vllm-project/vllm
Author: AndreasKaratzas
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37717

Description (problem / solution / changelog)

Follow-up for:

#34839

Marks the max_tokens test that uses distilbert/distilgpt2 as a large-GPU test. Addresses the failure in the mi250_1: Regression test group.

Motivation: https://buildkite.com/vllm/amd-ci/builds/6721/steps/canvas?sid=019d09d4-708e-44b0-a0d0-ccf0e3c00a94&tab=output

cc @kenroche

Changed files

tests/test_regression.py (modified, +14/-2)
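
The idea of gating a test on GPU memory can be sketched as follows. This is a minimal, self-contained illustration and not vLLM's actual large_gpu_mark implementation: the `require_gpu_memory` decorator, the `get_memory_gb` probe, and the 80 GB and 48 GB thresholds are hypothetical names and values chosen for the example.

```python
import functools
import unittest


def require_gpu_memory(min_gb, get_memory_gb):
    """Skip the decorated test when the device has less than min_gb of memory.

    Hypothetical helper: get_memory_gb is any zero-argument callable that
    reports total device memory in GB (a stand-in for a real GPU query).
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            available = get_memory_gb()
            if available < min_gb:
                raise unittest.SkipTest(
                    f"needs >= {min_gb} GB of GPU memory, found {available} GB"
                )
            return test_fn(*args, **kwargs)
        return wrapper
    return decorator


# Assumed values for illustration: a 64 GB device and an 80 GB requirement.
@require_gpu_memory(min_gb=80, get_memory_gb=lambda: 64)
def run_large_case():
    return "ran"


# Same body with a requirement that the 64 GB device satisfies.
@require_gpu_memory(min_gb=48, get_memory_gb=lambda: 64)
def run_small_case():
    return "ran"
```

Raising `unittest.SkipTest` is the standard-library way to signal a skipped test, so a runner without enough memory can report the case as skipped instead of failing with an OOM.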

Test group

mi250_1: Multi-Modal Models (Standard) 2: qwen3 + gemma

Describe the failing test

This is not exactly a test failure, but further investigation of the Gemma3 OOM event has been recommended, given that Gemma3 here is a 4B model. The working hypothesis is that the fake tensor used for profiling is large enough to exceed the 64 GB of memory on MI250 GPUs. Even so, it has been suggested that this behavior is still unexpected.
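
A rough back-of-the-envelope estimate helps show why the OOM was considered surprising. The sketch below assumes 4 billion parameters stored in a 2-byte dtype such as bfloat16 and treats the 64 GB figure as 64 GiB. Both are assumptions, and `tensor_bytes` is a hypothetical helper rather than part of any library.

```python
def tensor_bytes(shape, bytes_per_element):
    """Return the memory footprint in bytes of a dense tensor."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


GIB = 1024 ** 3

# Assumption: 4e9 parameters stored in a 2-byte dtype such as bfloat16.
weights_gib = tensor_bytes((4_000_000_000,), 2) / GIB

# Memory left on a 64 GiB device after loading the weights.
remaining_gib = 64 - weights_gib

print(f"weights: {weights_gib:.2f} GiB, remaining: {remaining_gib:.2f} GiB")
```

Under these assumptions the weights take roughly 7.5 GiB, so an OOM on this device would have to come from profiling inputs, activations, or other allocations rather than from the weights alone.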

📝 History of failing test

https://github.com/vllm-project/vllm/pull/37610#issuecomment-4102286515

CC List.

@DarkLight1337 (for transparency)

extent analysis

Fix Plan

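To address the Gemma3 OOM event, this plan proposes reducing the memory used by the fake tensor that drives profiling. Note that the merged PR #37717 takes a different route: it marks a test as a large-GPU test rather than changing how profiling works.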

Steps to Fix

Reduce the size of the fake tensor to fit within the 64 GB memory limit of the MI250 GPUs.
Implement a more efficient profiling method that doesn't require large tensors.

Example Code

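import torch

# Before: larger dummy tensor (sizes here are illustrative only)
fake_tensor = torch.randn(4096, 4096)  # 4096 * 4096 float32 values, about 64 MiB

# After: smaller dummy tensor
fake_tensor = torch.randn(1024, 1024)  # 1024 * 1024 float32 values, about 4 MiB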

Alternatively, consider using a more memory-efficient profiling method, such as:

import torch

# Allocate one small tensor per step instead of a single large tensor
for i in range(0, 4096, 1024):
    fake_tensor = torch.randn(1024, 1024)  # replaced on every iteration
    # profiling code here

Verification

Verify that the OOMing event is resolved by running the profiling test with the reduced tensor size or the new profiling method.

Extra Tips

Monitor memory usage during profiling to ensure the fix is effective.
Consider implementing a dynamic tensor sizing mechanism to adapt to different GPU memory limits.
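
The dynamic sizing tip above can be sketched as a small calculation that derives the dummy tensor's side length from a memory budget. This is only an illustration under stated assumptions: `max_square_side`, `profiling_side`, the 50% default fraction, and the 256-element alignment are hypothetical choices, not part of vLLM or PyTorch.

```python
import math


def max_square_side(budget_bytes, bytes_per_element=4, multiple=256):
    """Largest side n (a multiple of `multiple`) with n * n * bytes_per_element <= budget."""
    if budget_bytes < 0:
        raise ValueError("budget_bytes must be non-negative")
    side = math.isqrt(budget_bytes // bytes_per_element)
    return (side // multiple) * multiple


def profiling_side(total_bytes, reserved_bytes, fraction=0.5):
    """Pick a dummy-tensor side from a fraction of the memory left after reservations."""
    free_bytes = max(total_bytes - reserved_bytes, 0)
    return max_square_side(int(free_bytes * fraction))


MIB = 1024 ** 2

# A 64 MiB budget fits a 4096 x 4096 float32 tensor.
side_for_64_mib = max_square_side(64 * MIB)

# Half of 128 MiB with nothing reserved gives the same 64 MiB budget.
side_for_half_of_128_mib = profiling_side(128 * MIB, 0)
```

In a PyTorch setting the byte counts could come from `torch.cuda.mem_get_info()`, which returns the free and total device memory in bytes.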
