vllm - 💡(How to fix) Fix [CI] test_no_sync_with_spec_decode[eagle3-llama]: unexpected GPU-CPU sync [1 participants]

vllm-project/vllm#39537 (fetched 2026-04-11 06:12:53)
Comments: 0 · Participants: 1 · Timeline: 1 (closed ×1) · Reactions: 0

Error Message

FAILED test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama]
AssertionError: Unexpected GPU-CPU sync: seq_lens_cpu lazy init triggered 2 times.
See stack traces above.
assert 2 == 0

Root Cause

PR #39206 added assertions to detect GPU-CPU syncs during spec decode. The assertion is correctly catching a real sync caused by seq_lens_cpu lazy initialization. This is the behavior described in #29134 — FlashInfer's plan function triggers a D2H or H2D transfer that blocks full async overlap. The test is surfacing a known limitation that hasn't been resolved yet.
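
To make the lazy-initialization pattern concrete, here is a minimal pure-Python sketch. It is not vLLM's code: `MetadataSketch`, `plan_step`, and `lazy_init_count` are hypothetical names, plain lists stand in for GPU tensors, and the sketch assumes a fresh metadata object is built for each step. It only illustrates why the first read of a lazily created host copy acts as a synchronization point.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for attention metadata; not vLLM's real class."""

    # Stands in for a tensor that lives on the GPU.
    seq_lens_device: List[int]
    # Host copy, filled in only when somebody asks for it.
    _seq_lens_cpu: Optional[List[int]] = field(default=None, repr=False)
    # Number of times the lazy path ran; each run models a blocking D2H copy.
    lazy_init_count: int = 0

    @property
    def seq_lens_cpu(self) -> List[int]:
        if self._seq_lens_cpu is None:
            # In real GPU code this would be a device-to-host copy that
            # blocks the host until the device has caught up.
            self._seq_lens_cpu = list(self.seq_lens_device)
            self.lazy_init_count += 1
        return self._seq_lens_cpu


def plan_step(meta: MetadataSketch) -> int:
    """Hypothetical planner that needs host-side sequence lengths."""
    return max(meta.seq_lens_cpu)


# Assumption for this sketch: one metadata object per step, so the lazy
# path runs once per step that reads the host copy.
steps = [MetadataSketch([3, 5]), MetadataSketch([4, 6])]
results = [plan_step(m) for m in steps]
total_syncs = sum(m.lazy_init_count for m in steps)
```

In this toy setup every step that reads `seq_lens_cpu` pays for one lazy copy, which is the kind of event the failing assertion counts.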

Auto-generated by CI Watch Bot


Name of failing test

tests/v1/e2e/spec_decode/test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama]

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

In the 2026-04-09 nightly build (#60697), the Spec Decode Draft Model Nightly B200 step failed with 1 test failure.

Step link: https://buildkite.com/vllm/ci/builds/60697#019d740c-e312-46fd-b21c-4b44b3b5fd1b

Commit: e5de19ff9a64

Error details

FAILED test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama]
AssertionError: Unexpected GPU-CPU sync: seq_lens_cpu lazy init triggered 2 times.
See stack traces above.
assert 2 == 0

The test asserts that no GPU-CPU synchronization occurs during speculative decoding with async scheduling enabled. The assertion detected 2 unexpected seq_lens_cpu lazy init syncs.
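
One way such a check could be structured is sketched below. This is an illustration under stated assumptions, not the test added in #39206: `LazyHostCopy`, `record_lazy_inits`, and `check_no_sync` are hypothetical names, and plain lists stand in for GPU tensors. The idea is to wrap the lazy-init method, record a stack trace on each call, and assert that none were recorded.

```python
import traceback
from contextlib import contextmanager
from typing import Iterator, List, Optional


class LazyHostCopy:
    """Hypothetical object whose host copy is created on first access."""

    def __init__(self, device_values: List[int]) -> None:
        self._device_values = device_values
        self._host_values: Optional[List[int]] = None

    def _lazy_init(self) -> None:
        # Models the blocking device-to-host copy.
        self._host_values = list(self._device_values)

    def host(self) -> List[int]:
        if self._host_values is None:
            self._lazy_init()
        return self._host_values


@contextmanager
def record_lazy_inits() -> Iterator[List[str]]:
    """Temporarily wrap _lazy_init so every call records a stack trace."""
    stacks: List[str] = []
    original = LazyHostCopy._lazy_init

    def wrapper(self: LazyHostCopy) -> None:
        stacks.append("".join(traceback.format_stack(limit=5)))
        original(self)

    LazyHostCopy._lazy_init = wrapper
    try:
        yield stacks
    finally:
        # Always restore the original method, even if the body raised.
        LazyHostCopy._lazy_init = original


def check_no_sync(stacks: List[str]) -> None:
    """Print the recorded traces, then fail if any lazy init happened."""
    for stack in stacks:
        print(stack)
    assert len(stacks) == 0, (
        f"Unexpected GPU-CPU sync: lazy init triggered {len(stacks)} times. "
        "See stack traces above."
    )


# Two objects each take the lazy path once, so the check fails.
with record_lazy_inits() as stacks:
    LazyHostCopy([1, 2]).host()
    LazyHostCopy([3]).host()

try:
    check_no_sync(stacks)
    failed = False
    message = ""
except AssertionError as exc:
    failed = True
    message = str(exc)
```

Recording the stack at the point of the lazy init is what makes a failure actionable: the count says a sync happened, and the traces show which caller asked for the host copy.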

Test results summary

1 failed, 7 passed, 10 skipped, 37 deselected, 1 xfailed, 21 warnings in 471.68s

Potentially causal PRs

  • #39206 — tests/v1/e2e/spec_decode: assert async scheduling is used (merged 2026-04-08) — this PR added the sync assertion that is now failing
  • #38577 — Add nightly b200 test for spec decode eagle correctness (merged 2026-04-09) — added eagle3-llama to nightly B200 tests

Related issues

  • #29134 — [Performance]: Fully Async Spec-Decoding | Make seq_lens_cpu in CommonAttentionMetadata optional — the underlying issue being tracked


extent analysis

TL;DR

The test fails because seq_lens_cpu lazy initialization forces a GPU-CPU sync during speculative decoding with async scheduling. The most likely fix is to remove that sync, which is the known limitation tracked in #29134, possibly by revisiting how async scheduling is implemented for speculative decoding.

Guidance

  • Review the changes introduced in PR #39206 to understand the added assertion and its implications on the test.
  • Investigate the seq_lens_cpu lazy initialization in the context of speculative decoding and async scheduling to identify potential synchronization points.
  • Consider revisiting the implementation of async scheduling to minimize or eliminate GPU-CPU synchronization, as discussed in issue #29134.
  • Analyze the stack traces provided in the error details to gain a deeper understanding of the synchronization issue.

Example

The issue itself contains no fix snippet. The place to start is the seq_lens_cpu lazy-initialization path and the call sites reported in the test's stack traces, looking at how they interact with async scheduling.

Notes

The provided information suggests that the issue is related to a known limitation (issue #29134) that hasn't been resolved yet. The test is correctly detecting a real synchronization issue, and addressing this will require a deeper understanding of the speculative decoding and async scheduling implementation.

Recommendation

Recommended action: address the known limitation tracked in #29134 by minimizing or eliminating GPU-CPU synchronization in the async scheduling path for speculative decoding. The issue does not identify a shorter workaround.
