vllm - ✅(Solved) Fix [Bug] V1 engine hangs on encoder cache profiling on AMD gfx1151 (MIOpen missing solver DB) [2 pull requests, 2 comments, 3 participants]

vllm · 2026-03-18 19:28:22

vllm-project/vllm#37472•Fetched 2026-04-08 00:58:29

vLLM V1 engine hangs indefinitely during initialization when serving any model with a vision encoder on AMD gfx1151 (Strix Halo / Radeon 8060S). The hang occurs at encoder cache profiling where embed_multimodal() triggers MIOpen convolution operations that never complete.

Error Message

MIOpen(HIP): Error [...] Could not open metadata file: .../gfx1151_ConvHipImplicitGemm3DGroupFwdXdlops_metadata.tn.model
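
When scanning server logs for this symptom, the missing-metadata line can be matched mechanically. The following is a minimal sketch that assumes the message keeps the shape quoted above; the function name `find_missing_miopen_metadata` is hypothetical and is not part of vLLM or MIOpen.

```python
import re

# Matches the MIOpen message quoted above and captures the architecture
# prefix and the solver name from the metadata file name.
_METADATA_RE = re.compile(
    r"Could not open metadata file: \S*?(gfx[0-9a-f]+)_(\w+?)_metadata"
)


def find_missing_miopen_metadata(log_text: str) -> list[tuple[str, str]]:
    """Return (arch, solver) pairs for each missing-metadata error found."""
    return [(m.group(1), m.group(2)) for m in _METADATA_RE.finditer(log_text)]
```

A non-empty result right after the "Encoder cache will be initialized" log line is a hint that startup is about to stall at encoder profiling.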

Root Cause

_maybe_initialize_encoder_cache() in gpu_model_runner.py calls self.model.embed_multimodal() with dummy inputs, triggering MIOpen convolution operations. MIOpen has no pre-compiled solver database for gfx1151, causing exhaustive kernel search that either hangs or takes hours.

Env vars MIOPEN_DEBUG_DISABLE_FIND_DB=1, MIOPEN_FIND_ENFORCE=NONE, MIOPEN_DISABLE_CACHE=1 do NOT prevent the hang — the convolution kernel itself blocks.

Fix Action

Workaround

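Comment out lines 5509-5525 in gpu_model_runner.py (line numbers as reported in the issue; check them against your installed version before running the command, since they shift between builds):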

sed -i '5509,5525s/^/#/' $(find . -name gpu_model_runner.py -path "*/vllm/v1/worker/*")

This disables vision encoder profiling. Text-only inference works normally afterward.
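
For readers who prefer not to edit the file in place with sed, the same line-range edit can be sketched in Python with a backup copy. This is an illustration only: `comment_out_lines` is a hypothetical helper, and the line range must still be checked against the installed file.

```python
from pathlib import Path
import shutil


def comment_out_lines(path: Path, first: int, last: int) -> None:
    """Prefix lines first..last (1-indexed, inclusive) with '#'.

    A copy of the original file is saved next to it with a .bak suffix.
    """
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    lines = path.read_text().splitlines(keepends=True)
    for i in range(first - 1, min(last, len(lines))):
        lines[i] = "#" + lines[i]
    path.write_text("".join(lines))
```

Restoring the .bak file undoes the workaround after upgrading to a build that no longer needs it.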

PR fix notes

PR #38455: [ROCm] Add RDNA 3.5/4 device IDs (gfx1150, gfx1151, gfx1201)

Repository: vllm-project/vllm
Author: dondetir
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/38455

Description (problem / solution / changelog)

Summary

Adds 3 missing entries to _ROCM_DEVICE_ID_NAME_MAP in vllm/platforms/rocm.py:

Device ID	Name	Architecture	Hardware
`0x150e`	`AMD_Radeon_890M`	gfx1150	Strix Point APU
`0x1586`	`AMD_Radeon_8060S`	gfx1151	Strix Halo APU
`0x7550`	`AMD_Radeon_RX9070XT`	gfx1201	Navi 48 discrete

Without these entries, get_device_name() falls back to amdsmi["market_name"] which returns the generic string "AMD Radeon Graphics" for APU devices, causing downstream name-based logic to misbehave.
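
The lookup-with-fallback behaviour described above can be sketched as follows. The map contents are the three entries from the table; `resolve_device_name` is a hypothetical stand-in and is not the actual get_device_name() implementation in vllm/platforms/rocm.py.

```python
# Subset of the map: only the three entries this PR adds (values from the table).
_ROCM_DEVICE_ID_NAME_MAP = {
    "0x150e": "AMD_Radeon_890M",
    "0x1586": "AMD_Radeon_8060S",
    "0x7550": "AMD_Radeon_RX9070XT",
}


def resolve_device_name(device_id: str, market_name: str) -> str:
    """Prefer the curated name; fall back to the driver-reported market name."""
    return _ROCM_DEVICE_ID_NAME_MAP.get(device_id.lower(), market_name)
```

With the entry present, device ID 0x1586 resolves to a specific name instead of the generic "AMD Radeon Graphics" string, which is what downstream name-based logic needs.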

Related issues: #36615, #37151, #37472, #32180

Device ID sources (high confidence)

0x150e — ROCm issue #5433, lspci reports from Strix Point users
0x1586 — PyTorch forums (Chip ID 5510/0x1586), GPU spec databases, multiple ROCm issue reports
0x7550 — PyTorch forums (Chip ID 0x7550), ROCm TheRock issue #745

Note: The base RX 9070 (non-XT) may share device ID 0x7550 with the XT variant, so the name assigned to this ID is best-effort until it is verified on hardware.

Tests added

New file tests/rocm/test_device_id_map.py with 3 offline unit tests:

test_rocm_device_id_map_format — validates all keys are lowercase hex, values non-empty/no-spaces
test_rocm_device_id_map_known_entries — spot-checks new and existing entries
test_rocm_device_id_map_no_duplicate_keys — ensures no duplicate device IDs
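
The kind of checks these tests describe can be sketched offline as below. This is not the content of tests/rocm/test_device_id_map.py; the entry list and the helper names `check_format` and `check_no_duplicate_keys` are assumptions made for illustration.

```python
import re

# Illustrative entries only: the three IDs added by this PR, kept as
# (key, value) pairs so duplicates stay visible (a dict literal would
# silently keep only the last duplicate).
ENTRIES = [
    ("0x150e", "AMD_Radeon_890M"),
    ("0x1586", "AMD_Radeon_8060S"),
    ("0x7550", "AMD_Radeon_RX9070XT"),
]


def check_format(entries):
    """Keys must be lowercase hex; names must be non-empty with no spaces."""
    for key, name in entries:
        assert re.fullmatch(r"0x[0-9a-f]+", key), f"bad key: {key}"
        assert name and " " not in name, f"bad name: {name!r}"


def check_no_duplicate_keys(entries):
    """Each device ID may appear only once."""
    keys = [key for key, _ in entries]
    assert len(keys) == len(set(keys)), "duplicate device ID"
```

Both checks run on plain data, which matches the PR's note that the tests need no GPU.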

Duplicate-work check

gh pr list --repo vllm-project/vllm --state open --search "gfx1151 device id"

No overlapping PRs. PR #37189 adds amdsmi WSL2 fallback (different scope).

Test commands and results

pre-commit run --all-files  # All hooks passed (ruff, mypy, typos, SPDX)

Tests are offline (no GPU required).

AI assistance disclosure

This PR was developed with AI assistance (Claude). All changes were reviewed by a human and independently verified by a separate reviewer with fresh context.

Co-authored-by: Claude

Changed files

vllm/platforms/rocm.py (modified, +6/-1)

PR #38555: [ROCm] Skip encoder cache profiling on consumer RDNA GPUs

Repository: vllm-project/vllm
Author: dondetir
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/38555

Description (problem / solution / changelog)

Summary

Skip the encoder cache profiling pass during profile_run() on consumer RDNA 3/3.5 GPUs to prevent an indefinite hang caused by missing MIOpen solver databases.

Problem

When serving multimodal/vision models (e.g., Qwen3-VL) on consumer RDNA GPUs, the V1 engine hangs forever on startup. During profile_run(), _maybe_initialize_encoder_cache() calls embed_multimodal() with dummy inputs, which triggers MIOpen convolution. Datacenter GPUs (MI300/MI350) have pre-compiled solver databases so this is instant. Consumer RDNA GPUs (gfx1100-gfx1103, gfx1150-gfx1151) lack these databases, causing MIOpen to start an exhaustive autotuning search that never completes.

Affects: RX 7900 XTX/XT/GRE (gfx1100), RX 7800 XT (gfx1101), RX 7600 XT/7600 (gfx1102), Radeon 780M iGPU (gfx1103), Radeon 890M (gfx1150), Radeon 8060S/Strix Halo (gfx1151).

Fix

Add on_consumer_rdna() detection helper to vllm/platforms/rocm.py that identifies consumer RDNA arches
In gpu_model_runner.py, guard the encoder profiling block - skip only the encoder warm-up pass on consumer RDNA GPUs
Log a warning explaining the limitation and suggesting alternatives

The rest of profile_run() (text decoder dummy run, sampler/pooler profiling, encoder_cache.clear(), gc.collect()) still executes so memory profiling is not disrupted.
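
A detection helper of the kind described in the Fix list might look like the sketch below. It is not the PR's on_consumer_rdna() implementation: the names `is_consumer_rdna` and `should_profile_encoder` are hypothetical, and the architecture set is copied from the Problem section. The input is assumed to be an architecture string such as the one ROCm builds of PyTorch expose as `gcnArchName`, which can carry ':feature' suffixes.

```python
# Architectures listed in the PR description as lacking MIOpen solver databases.
_CONSUMER_RDNA_ARCHS = frozenset(
    {"gfx1100", "gfx1101", "gfx1102", "gfx1103", "gfx1150", "gfx1151"}
)


def is_consumer_rdna(gcn_arch_name: str) -> bool:
    """True if the arch string (optionally with ':feature' suffixes) is listed."""
    base = gcn_arch_name.split(":", 1)[0].strip().lower()
    return base in _CONSUMER_RDNA_ARCHS


def should_profile_encoder(gcn_arch_name: str) -> bool:
    """Guard for the encoder warm-up pass only; other profiling still runs."""
    return not is_consumer_rdna(gcn_arch_name)
```

Keeping the guard as a separate predicate makes it easy to skip only the encoder warm-up while the decoder dummy run and the cache cleanup still execute.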

Why this is not duplicating an existing PR

PR #37370 (Add Encoder Dummy Run) is a DRAFT with an empty description, tagged needs-rebase, and addresses a different scope (Model Runner V2 architecture). No functional overlap.
Searched for open PRs with "encoder cache hang", "MIOpen RDNA", "encoder profiling rocm" - none found.

Test commands run

pre-commit run --files vllm/platforms/rocm.py vllm/v1/worker/gpu_model_runner.py
# Result: All 14 hooks passed (ruff, mypy, typos, SPDX, forbidden imports, etc.)

Hardware verification requires a consumer RDNA GPU (gfx1103 available for testing).

AI assistance disclosure

AI assistance was used (Claude) for implementation. All changed lines reviewed by human submitter.

Closes #37472

Changed files

vllm/platforms/rocm.py (modified, +32/-0)
vllm/v1/worker/gpu_model_runner.py (modified, +67/-32)

Description

Environment

GPU: AMD Radeon 8060S (gfx1151, RDNA 3.5 iGPU, 128GB unified LPDDR5X)
vLLM: 0.17.1rc1.dev169 and 0.17.2rc1.dev71
PyTorch: 2.10-2.12 (TheRock nightlies for gfx1151)
ROCm: TheRock 7.11-7.13 nightlies
OS: Fedora 43

Reproduction

vllm serve Qwen/Qwen3.5-35B-A3B --enforce-eager --dtype float16 --trust-remote-code

Server logs show:

INFO: Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
MIOpen(HIP): Error [...] Could not open metadata file: .../gfx1151_ConvHipImplicitGemm3DGroupFwdXdlops_metadata.tn.model

Then hangs forever. Health endpoint never returns 200.
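
Because the hang produces no further output, a bounded health poll is a convenient way to tell "still starting" from "stuck". The sketch below assumes the server exposes a health route over HTTP; the URL in the usage note (port 8000, /health) is an assumption about a default deployment, and both function names are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe: Callable[[], bool], timeout_s: float,
                       interval_s: float = 5.0) -> bool:
    """Poll probe() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

For example, `wait_until_healthy(lambda: http_probe("http://localhost:8000/health"), timeout_s=600)` returns False if the endpoint has not answered 200 within ten minutes.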

Root Cause

Env vars MIOPEN_DEBUG_DISABLE_FIND_DB=1, MIOPEN_FIND_ENFORCE=NONE, MIOPEN_DISABLE_CACHE=1 do NOT prevent the hang — the convolution kernel itself blocks.

Workaround

Comment out lines 5509-5525 in gpu_model_runner.py:

sed -i '5509,5525s/^/#/' $(find . -name gpu_model_runner.py -path "*/vllm/v1/worker/*")

This disables vision encoder profiling. Text-only inference works normally afterward.

Suggested Fix

Add a check in _maybe_initialize_encoder_cache() to skip profiling when:

No multimodal inputs are expected (--limit-mm-per-prompt or text-only serving)
MIOpen solver DB is missing for the current GPU architecture
A new flag like --skip-encoder-profiling is set
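
The second condition needs some way to tell whether MIOpen ships data for the current architecture. The sketch below is a rough heuristic, not an MIOpen API: it only checks whether a given directory holds any file whose name starts with the architecture string. The directory is installation-specific and is not stated in the issue, so it is passed in by the caller; `has_arch_files` is a hypothetical name.

```python
from pathlib import Path


def has_arch_files(db_dir: Path, arch: str) -> bool:
    """Heuristic: True if db_dir contains any file whose name starts with arch."""
    if not db_dir.is_dir():
        return False
    return any(p.is_file() for p in db_dir.glob(f"{arch}*"))
```

A True result does not guarantee that the metadata for a specific solver exists, so a check like this could inform a warning but should not be the only safeguard.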

Related Issues

#32180 (V1 engine crash on gfx1151)
#37151 (HSA segfault on gfx1151)

Additional Context

This affects ALL Qwen3.5 MoE models (including -Base variants) because they all include a vision encoder. The gfx1151 (Strix Halo) is increasingly popular for local LLM hosting due to its 128GB unified memory.

extent analysis

Fix Plan

To resolve the issue, we need to modify the _maybe_initialize_encoder_cache() function in gpu_model_runner.py to skip profiling under certain conditions. Here are the steps:

1. Add a new flag --skip-encoder-profiling to the vllm serve command.
2. Modify the _maybe_initialize_encoder_cache() function to check for the following conditions:
   - No multimodal inputs are expected (--limit-mm-per-prompt or text-only serving)
   - MIOpen solver DB is missing for the current GPU architecture
   - The --skip-encoder-profiling flag is set
3. If any of these conditions are met, skip the profiling step.

Example code:

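def _maybe_initialize_encoder_cache(self):
    # ... existing code ...

    # NOTE: self.args, text_only, skip_encoder_profiling and
    # _has_miopen_solver_db() are hypothetical names, not existing vLLM APIs.
    limits = self.args.limit_mm_per_prompt or {}
    # Setting --limit-mm-per-prompt does not by itself mean "no multimodal
    # input"; treat it as disabled only when every configured limit is 0.
    mm_disabled = bool(limits) and all(n == 0 for n in limits.values())

    # Check if profiling should be skipped
    if (mm_disabled or self.args.text_only
            or self.args.skip_encoder_profiling
            or not self._has_miopen_solver_db()):
        return

    # ... existing code ...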

Once such a flag exists, it would be passed to the vllm serve command like this (--skip-encoder-profiling is a proposal from the issue, not an existing vLLM option):

vllm serve Qwen/Qwen3.5-35B-A3B --enforce-eager --dtype float16 --trust-remote-code --skip-encoder-profiling

Verification

To verify that the fix worked, start the server with the proposed --skip-encoder-profiling flag (or with the workaround applied) and check that startup completes, the health endpoint returns 200, and the server responds to requests.

Extra Tips

PR #38555 was still open when this page was fetched. Once it is merged, update vLLM to a version that includes it.
If you are using a custom gpu_model_runner.py file, make sure to apply the changes to that file as well.
Setting MIOPEN_DEBUG_DISABLE_FIND_DB=1, MIOPEN_FIND_ENFORCE=NONE, or MIOPEN_DISABLE_CACHE=1 is not a substitute. According to the issue report, these variables do not prevent the hang because the convolution kernel itself blocks.
