vllm - [CI Failure]: mi355_2: NixlConnector PD + Spec Decode acceptance (2 GPUs)

GitHub stats
vllm-project/vllm#41319 · Fetched 2026-05-01 05:34:17
Comments: 2 · Participants: 1 · Timeline: 8 · Reactions: 0
Timeline (top): added_to_project_v2 ×2, commented ×2, labeled ×1, mentioned ×1


Name of failing test

(command rocm-smi || true) && export VLLM_TEST_GROUP_NAME=mi355_2-nixlconnector-pd---spec-decode-acceptance-2-gpus && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests && uv pip install --system -r /vllm-workspace/requirements/kv_connectors_rocm.txt && ROCM_ATTN=1 bash v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

AssertionError: All kv cache tensors must have the same number of blocks

📝 History of failing test

  • Current streak start: 2026-04-28
  • First failure in 60d window: 2026-04-21
  • Last successful nightly: 2026-04-27
  • Break frequency (60d, pass↔fail flips): 3
  • Latest nightly date: 2026-04-29
  • Latest build(s): amd-ci #8058
  • Latest hardware status: mi250_2=fail

Extent analysis

TL;DR

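The test fails with "AssertionError: All kv cache tensors must have the same number of blocks". The issue includes no traceback or confirmed root cause. The most likely fix is to make the code that allocates the KV cache tensors produce the same number of blocks for every tensor, or to adjust the configuration used by spec_decode_acceptance_test.sh so that it does.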

Guidance

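  • Reproduce the failure by running v1/kv_connector/nixl_integration/spec_decode_acceptance_test.sh locally with the same environment as the CI command (ROCM_ATTN=1, dependencies installed from requirements/kv_connectors_rocm.txt), and capture the full traceback around the AssertionError to see which component raises it.
  • Inspect the code that allocates the KV cache tensors and compare the number of blocks per tensor; the assertion implies that at least one tensor differs from the rest.
  • Check whether any dependency in requirements/kv_connectors_rocm.txt changed around 2026-04-28, when the current failure streak began (the last successful nightly was 2026-04-27).
  • Review the hardware status: the test group is mi355_2, but the latest recorded status is mi250_2=fail, so confirm which hardware configurations are actually affected.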
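
As a starting point for the reproduction step, the sketch below scans a captured test log for the assertion message and returns a few lines of context around each hit. It is a minimal illustration only: `find_assertion_lines` is a hypothetical helper, not part of vLLM, and the sample log lines are invented for the example.

```python
# Hypothetical helper: locate the failing assertion in a captured test log.
ASSERTION_TEXT = "All kv cache tensors must have the same number of blocks"


def find_assertion_lines(log_text, context=2):
    """Return (line_number, snippet) pairs for each line containing the assertion.

    line_number is 1-based; snippet holds up to `context` lines before and after.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if ASSERTION_TEXT in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, "\n".join(lines[start:end])))
    return hits


# Invented sample log, for illustration only.
sample_log = "\n".join([
    "starting prefill instance",
    "starting decode instance",
    "AssertionError: " + ASSERTION_TEXT,
    "test failed",
])

for line_number, snippet in find_assertion_lines(sample_log, context=1):
    print(f"assertion found at log line {line_number}:")
    print(snippet)
```

To use it on a real run, capture the script output to a file (for example with `2>&1 | tee run.log`) and pass the file contents to the helper. The lines just before the hit usually show which component raised the assertion.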

Example

The issue does not include the contents of spec_decode_acceptance_test.sh or the code that allocates the KV cache tensors, so no snippet specific to this failure can be given.
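
The invariant named by the assertion can still be illustrated generically. The sketch below groups tensors by the size of their block dimension and reports which ones disagree. It makes several assumptions: KV cache tensors are represented only by their shapes, the block count is the first dimension, and the tensor names (`target.layer0`, `draft.layer0`) and helper functions are hypothetical. None of this is vLLM's actual implementation.

```python
from collections import defaultdict


def group_by_num_blocks(kv_cache_shapes, block_dim=0):
    """Group tensor names by the size of their (assumed) block dimension."""
    groups = defaultdict(list)
    for name, shape in kv_cache_shapes.items():
        groups[shape[block_dim]].append(name)
    return dict(groups)


def check_same_num_blocks(kv_cache_shapes, block_dim=0):
    """Return the shared block count, or raise with a per-group breakdown."""
    groups = group_by_num_blocks(kv_cache_shapes, block_dim)
    if len(groups) > 1:
        detail = "; ".join(
            f"{num} blocks: {sorted(names)}" for num, names in sorted(groups.items())
        )
        raise AssertionError(
            "All kv cache tensors must have the same number of blocks (" + detail + ")"
        )
    return next(iter(groups)) if groups else 0


# Hypothetical shapes: two tensors that disagree on the block dimension.
shapes = {
    "target.layer0": (1024, 16, 8, 128),
    "draft.layer0": (512, 16, 8, 128),
}

try:
    check_same_num_blocks(shapes)
except AssertionError as err:
    print(err)
```

Printing the grouping alongside the assertion makes it easier to tell which tensors were allocated with a different block count, which is the first thing to establish when debugging this kind of failure.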

Notes

The fix depends on the contents of spec_decode_acceptance_test.sh and on the code that allocates the KV cache tensors, neither of which is included in the issue.

Recommendation

Either update the code that allocates the KV cache tensors so that all tensors have the same number of blocks, or, as a short-term workaround, change the configuration in spec_decode_acceptance_test.sh to avoid the mismatch. The workaround may be quicker than a full fix because the failure appears tied to a specific test case and hardware configuration.
