vllm - 💡(How to fix) Fix [CI Failure]: mi355_2: GPQA Eval (GPT-OSS) (2xB200-2xMI355) [4 comments, 2 participants]

AndreasKaratzas · 2026-04-30T02:38:53Z

### Name of failing test

`(command rocm-smi || true) && export VLLM_TEST_GROUP_NAME=mi355_2-gpqa-eval-gpt-oss-2xb200-2xmi355 && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests && uv pip install --system 'gpt-oss[eval]==0.0.5' && pytest -s -v evals/gpt_oss/test_gpqa_correctness.py --config-list-file=configs/models-gfx950.txt`

### Basic information

- [ ] Flaky test
- [x] Can reproduce locally
- [ ] Caused by external libraries (e.g. bug in `transformers`)

### 🧪 Describe the failing test

```
FAILED evals/gpt_oss/test_gpqa_correctness.py::test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton]
```

### 📝 History of failing test

- Current streak start: 2026-04-27
- First failure in 60d window: 2026-04-21
- Last successful nightly: 2026-04-26
- Break frequency (60d, pass↔fail flips): 4
- Latest nightly date: 2026-04-29
- Latest build(s): [amd-ci #8058](https://buildkite.com/vllm/amd-ci/builds/8058)
- Latest hardware status: `mi355_2`=fail

### CC List.

_No response_

GitHub stats

vllm-project/vllm#41324 • Fetched 2026-05-01 05:34:12

Author

AndreasKaratzas

Participants

AndreasKaratzas

github-actions[bot]

Timeline (top)

commented ×4, mentioned ×4, subscribed ×4, added_to_project_v2 ×2

extent analysis

TL;DR

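The parametrized case test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton] in evals/gpt_oss/test_gpqa_correctness.py has been failing on mi355_2 nightlies since 2026-04-27. The issue gives the failing test ID but no failure output, so the cause still has to be identified.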

Guidance

Rerun only test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton] and capture its output; the issue reports the failing test ID but not the assertion or score that failed.
Check the entry for this model in configs/models-gfx950.txt (the file passed via --config-list-file) for inconsistent or recently changed settings.
Verify that the pinned gpt-oss[eval]==0.0.5 package is actually the version installed and that it is compatible with the rest of the test environment.
Check the state of the mi355_2 runners. The author marked the failure as reproducible locally and not flaky, so a fault on a single machine seems a less likely explanation.
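
To make the first step concrete, here is a minimal sketch that splits the reported FAILED line into file, test function, and parameter ID, then builds a pytest command that targets only that case. The helper names parse_failed_line and build_repro_command are hypothetical, and the sketch assumes the parameter ID can be selected by node ID when the same --config-list-file option is passed; it only prints the command and does not run it.

```python
import re
import shlex

# Summary line copied from the issue.
FAILED_LINE = (
    "FAILED evals/gpt_oss/test_gpqa_correctness.py::"
    "test_gpqa_correctness[gpt-oss-20b-rocm-quark-mxfp4-fp8-triton]"
)

# Matches "FAILED <path>::<function>[<param id>]"; the [param] part is optional.
_FAILED_RE = re.compile(
    r"FAILED\s+(?P<path>[^:\s]+)::(?P<func>\w+)(?:\[(?P<param>.+)\])?"
)


def parse_failed_line(line: str) -> dict:
    """Split a pytest FAILED summary line into path, func, and param."""
    # pytest may append " - <error message>" to the line; drop it if present.
    head = line.strip().split(" - ", 1)[0]
    match = _FAILED_RE.fullmatch(head)
    if match is None:
        raise ValueError(f"not a pytest FAILED summary line: {line!r}")
    return match.groupdict()


def build_repro_command(line: str, config_list_file: str) -> list:
    """Build a pytest argument list that runs only the failing case.

    Assumption: the test suite accepts --config-list-file exactly as in
    the CI command quoted in the issue.
    """
    parts = parse_failed_line(line)
    node_id = f"{parts['path']}::{parts['func']}"
    if parts["param"]:
        node_id += f"[{parts['param']}]"
    return ["pytest", "-s", "-v", node_id,
            f"--config-list-file={config_list_file}"]


if __name__ == "__main__":
    cmd = build_repro_command(FAILED_LINE, "configs/models-gfx950.txt")
    # shlex.join quotes the node ID so the brackets survive the shell.
    print(shlex.join(cmd))
```

The printed command is meant to be run from /vllm-workspace/tests, the working directory used by the CI command, after the same environment variables and package install steps.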

Example

The issue contains no code or log excerpt beyond the failing test ID, so no fix can be illustrated here.

Notes

The issue reports a single failing parametrization (gpt-oss-20b with the rocm-quark-mxfp4-fp8-triton configuration) on mi355_2 hardware, so the fix likely depends on that model configuration, the test environment, or the gpt-oss eval package.
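
One quick way to rule the eval package in or out is to confirm that the environment really has the pinned version. The following is a minimal sketch, assuming the distribution is registered under the name gpt-oss (taken from the pip requirement in the CI command); check_pin is a hypothetical helper, not part of vLLM or gpt-oss.

```python
from importlib import metadata

# Version pinned by the CI command: uv pip install 'gpt-oss[eval]==0.0.5'
PINNED_VERSION = "0.0.5"


def check_pin(dist_name="gpt-oss", expected=PINNED_VERSION,
              lookup=metadata.version):
    """Return (matches_pin, installed_version or None if not installed).

    `lookup` is injectable so the check can be exercised without the
    package being installed; by default it queries the active environment.
    """
    try:
        installed = lookup(dist_name)
    except metadata.PackageNotFoundError:
        return False, None
    return installed == expected, installed


if __name__ == "__main__":
    ok, installed = check_pin()
    if installed is None:
        print("gpt-oss is not installed in this environment")
    elif ok:
        print(f"gpt-oss {installed} matches the CI pin")
    else:
        print(f"gpt-oss {installed} differs from the CI pin {PINNED_VERSION}")
```

A matching version does not prove compatibility with the other dependencies; it only confirms that a local reproduction uses the same eval package as CI.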

Recommendation

The issue describes no workaround. Narrow the investigation to this test case and hardware combination, starting with changes that landed between the last successful nightly (2026-04-26) and the start of the current failure streak (2026-04-27).
