vllm - 💡(How to fix) Fix [CI Failure]: mi355_2: GPQA Eval (GPT-OSS) (2xB200-2xMI355) [4 comments, 2 participants]

GitHub stats

  • Issue: vllm-project/vllm#41324 (fetched 2026-05-01 05:34:12)
  • Comments: 4
  • Participants: 2
  • Timeline events: 20
  • Reactions: 0
  • Timeline (top): commented ×4, mentioned ×4, subscribed ×4, added_to_project_v2 ×2

Root Cause

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

Code Example

FAILED evals/gpt_oss/test_gpqa_correctness.py::test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton]

Name of failing test

(command rocm-smi || true) && export VLLM_TEST_GROUP_NAME=mi355_2-gpqa-eval-gpt-oss-2xb200-2xmi355 && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests && uv pip install --system 'gpt-oss[eval]==0.0.5' && pytest -s -v evals/gpt_oss/test_gpqa_correctness.py --config-list-file=configs/models-gfx950.txt
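To iterate faster, the run can be narrowed to the one failing case by passing pytest a node ID instead of the whole file. The sketch below only builds the argument list and does not launch anything. It assumes the parametrized ID from the FAILED line is generated identically in a local checkout, which the issue does not confirm; `build_single_test_argv` is a hypothetical helper name, not part of vLLM.

```python
import shlex

# Values copied from the CI command and the FAILED line in this issue.
TEST_FILE = "evals/gpt_oss/test_gpqa_correctness.py"
CONFIG_LIST = "configs/models-gfx950.txt"
FAILING_ID = "gpt-oss-20b-rocm-quark-mxfp4-fp8-triton"


def build_single_test_argv(test_file=TEST_FILE, config_list=CONFIG_LIST,
                           param_id=FAILING_ID):
    """Return a pytest argv that selects one parametrized case by node ID."""
    node_id = f"{test_file}::test_gpqa_correctness[{param_id}]"
    return ["pytest", "-s", "-v", node_id, f"--config-list-file={config_list}"]


if __name__ == "__main__":
    # Print a shell-safe command line; the CI job runs from /vllm-workspace/tests.
    print(shlex.join(build_single_test_argv()))
```

The printed command is quoted with `shlex.join`, so the square brackets in the node ID survive being pasted into a shell.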

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

FAILED evals/gpt_oss/test_gpqa_correctness.py::test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton]

📝 History of failing test

  • Current streak start: 2026-04-27
  • First failure in 60d window: 2026-04-21
  • Last successful nightly: 2026-04-26
  • Break frequency (60d, pass↔fail flips): 4
  • Latest nightly date: 2026-04-29
  • Latest build(s): amd-ci #8058
  • Latest hardware status: mi355_2=fail
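The issue does not define how the CI dashboard computes "break frequency". A minimal sketch, assuming it means the number of adjacent pass↔fail changes across chronologically ordered nightly results, is shown here. The sample sequence is hypothetical and is not the real history of this test.

```python
def count_flips(results):
    """Count adjacent pass<->fail transitions in a chronological list of booleans."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


# Hypothetical nightly outcomes (True = pass); NOT the actual CI history.
example = [True, False, True, True, False, False]
print(count_flips(example))  # 3
```

A count that stays high over the window suggests intermittent behaviour, whereas a single flip followed by a long failing streak points more toward a regression.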

CC List.

No response

extent analysis

TL;DR

Investigate the test_gpqa_correctness function in evals/gpt_oss/test_gpqa_correctness.py to identify the cause of the failure.

Guidance

  • Review the test case test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton] to understand the specific conditions that lead to the failure.
  • Check the configuration file configs/models-gfx950.txt for any potential issues or inconsistencies that might be contributing to the test failure.
  • Verify that the gpt-oss library version 0.0.5 is compatible with the current test environment and dependencies.
  • Investigate the hardware status of mi355_2 to determine if there are any issues that could be causing the test to fail.
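For the version check in the guidance above, one option is to compare the installed distribution against the pin used by the CI command. This is a sketch under the assumption that the distribution is registered as `gpt-oss` (the name used in the `uv pip install` step); `check_pinned_version` is a hypothetical helper, and the `lookup` parameter exists only so the function can be exercised without the package installed.

```python
from importlib import metadata

EXPECTED = "0.0.5"  # pin taken from the CI command in this issue


def check_pinned_version(dist="gpt-oss", expected=EXPECTED,
                         lookup=metadata.version):
    """Return (ok, installed); installed is None when the package is missing."""
    try:
        installed = lookup(dist)
    except metadata.PackageNotFoundError:
        return False, None
    return installed == expected, installed


if __name__ == "__main__":
    ok, installed = check_pinned_version()
    print(f"gpt-oss installed={installed!r} matches pin: {ok}")
```

A matching version only rules out environment drift; it does not show that 0.0.5 itself is compatible with the current test environment.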

Example

The issue itself provides no code beyond the failing command and test ID, so it does not point to a specific fix.

Notes

The issue seems to be related to a specific test case and hardware configuration, so the solution may depend on the details of the test environment and the gpt-oss library.

Recommendation

No workaround is identified in the issue. Investigate the failing test case on the mi355_2 hardware configuration, since the failure appears tied to that particular combination of test parameters and hardware.
