vllm - 💡(How to fix) Fix [CI Failure]: mi355_2: LM Eval Small Models (2xB200-2xMI355) [2 comments, 1 participants]

GitHub stats
vllm-project/vllm#41583 (fetched 2026-05-04 04:58:38)

  • Comments: 2
  • Participants: 1
  • Timeline events: 8
  • Reactions: 0
  • Timeline (top): added_to_project_v2 ×2, commented ×2, labeled ×1, mentioned ×1

Root Cause

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

Code Example

FAILED evals/gsm8k/test_gsm8k_correctness.py::test_gsm8k_correctness[DeepSeek-V2-Lite-Instruct-FP8]

Name of failing test

pytest -s -v evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/models-mi3xx-fp8-and-mixed.txt
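To narrow the run to the one failing case, the command above can be assembled with the pytest node ID taken from the FAILED line of this report. This is a minimal sketch, not the project's own tooling: the helper name `build_pytest_command` is hypothetical, the working directory is not stated in the report, and it is an assumption that the parametrized ID can be selected by node ID while the config list file is still passed.

```python
import shlex

# Values copied verbatim from the issue report.
TEST_FILE = "evals/gsm8k/test_gsm8k_correctness.py"
TEST_NAME = "test_gsm8k_correctness"
CONFIG_LIST = "configs/models-mi3xx-fp8-and-mixed.txt"
FAILING_PARAM = "DeepSeek-V2-Lite-Instruct-FP8"


def build_pytest_command(only_failing: bool = True) -> list[str]:
    """Return the argv for the reported run (hypothetical helper).

    With only_failing=True the test file is replaced by the node ID of the
    single failing parametrization, as printed in the FAILED line.
    """
    target = TEST_FILE
    if only_failing:
        # Assumption: pytest accepts this node ID for the parametrized case.
        target = f"{TEST_FILE}::{TEST_NAME}[{FAILING_PARAM}]"
    return ["pytest", "-s", "-v", target, f"--config-list-file={CONFIG_LIST}"]


if __name__ == "__main__":
    # shlex.join quotes the brackets so the line can be pasted into a shell.
    print(shlex.join(build_pytest_command()))
```

Building the argv as a list and quoting it with `shlex.join` avoids the shell interpreting the square brackets in the node ID as a glob pattern.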

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

FAILED evals/gsm8k/test_gsm8k_correctness.py::test_gsm8k_correctness[DeepSeek-V2-Lite-Instruct-FP8]

📝 History of failing test

  • Current streak start: 2026-05-02
  • First failure in 60d window: 2026-04-21
  • Last successful nightly: 2026-05-01
  • Break frequency (60d, pass↔fail flips): 2
  • Latest nightly date: 2026-05-03
  • Latest build(s): amd-ci #8177
  • Latest hardware status: mi355_2=fail
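The streak and flip figures above can be recomputed from a dated pass/fail history. The sketch below is illustrative only: the function names are hypothetical, the way the CI dashboard actually counts flips is an assumption, and the history contains only the four dated runs this report names (the nightlies in between are not listed in the source and are omitted).

```python
from datetime import date


def count_flips(results):
    """Count pass<->fail transitions in a chronologically ordered
    list of (date, passed) pairs."""
    flips = 0
    for (_, prev), (_, cur) in zip(results, results[1:]):
        if prev != cur:
            flips += 1
    return flips


def current_streak_start(results):
    """Return the date of the earliest run in the trailing run of
    failures, or None if the latest run passed."""
    start = None
    for day, passed in reversed(results):
        if passed:
            break
        start = day
    return start


# Only the dated runs named in the report; intermediate nightlies omitted.
history = [
    (date(2026, 4, 21), False),  # first failure in the 60d window
    (date(2026, 5, 1), True),    # last successful nightly
    (date(2026, 5, 2), False),   # current streak start
    (date(2026, 5, 3), False),   # latest nightly (mi355_2=fail)
]

if __name__ == "__main__":
    print(count_flips(history), current_streak_start(history))
```

On these four runs the sketch yields two flips and a streak start of 2026-05-02, which is consistent with the reported values but does not confirm how the dashboard defines them.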

extent analysis

TL;DR

Investigate the test_gsm8k_correctness test case, specifically the DeepSeek-V2-Lite-Instruct-FP8 scenario, to identify the cause of the failure.

Guidance

  • Review the test code in evals/gsm8k/test_gsm8k_correctness.py to understand the test scenario and potential failure points.
  • Check the build logs from the latest nightly build (amd-ci #8177) for any relevant error messages or warnings.
  • Rule out external libraries (e.g., transformers) by checking whether any of them changed between the last successful nightly (2026-05-01) and the start of the current failure streak (2026-05-02).
  • Run the test locally with the same configuration to reproduce the issue and gather more information about the failure.
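For the external-library check in the guidance above, one option is to record installed package versions in a passing and a failing environment and compare them. This is a minimal sketch using the standard library's `importlib.metadata`; the function names are hypothetical, and the package list is an assumption (the report only names transformers as an example).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: illustrative package list; only transformers is named in the report.
PACKAGES = ("transformers", "torch", "vllm")


def snapshot_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def diff_versions(before, after):
    """Return {name: (old, new)} for every package whose version differs."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


if __name__ == "__main__":
    # Run once per environment and compare the two printed snapshots.
    print(snapshot_versions())
```

Saving the snapshot from an environment that matches the last passing nightly and passing both dictionaries to `diff_versions` shows at a glance whether a dependency changed between the two runs.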

Notes

The failure is confined to one parametrization, DeepSeek-V2-Lite-Instruct-FP8, and has persisted on mi355_2 from 2026-05-02 through the latest nightly (2026-05-03). The report includes neither the test's error output nor the relevant test code, so a more specific fix cannot be given here.

Recommendation

No workaround or fix has been identified yet. Debug the test_gsm8k_correctness case for DeepSeek-V2-Lite-Instruct-FP8 to find the root cause, since the failure appears specific to that configuration.
