vllm - 💡(How to fix) Fix [CI] test_no_sync_with_spec_decode[eagle3-llama]: unexpected GPU-CPU sync [1 participants]

vllm · 2026-04-10 21:12:17


GitHub stats

vllm-project/vllm#39537•Fetched 2026-04-11 06:12:53


Author

ZhanqiuHu

Participants

ZhanqiuHu

Timeline (top)

closed ×1

Error Message

FAILED test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama] AssertionError: Unexpected GPU-CPU sync: seq_lens_cpu lazy init triggered 2 times. See stack traces above. assert 2 == 0

Root Cause

PR #39206 added assertions to detect GPU-CPU syncs during spec decode. The assertion is correctly catching a real sync caused by seq_lens_cpu lazy initialization. This is the behavior described in #29134 — FlashInfer's plan function triggers a D2H or H2D transfer that blocks full async overlap. The test is surfacing a known limitation that hasn't been resolved yet.
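The lazy-initialization mechanism described above can be modeled with a minimal pure-Python sketch. Everything here is hypothetical: `FakeAttnMetadata`, `lazy_init_count`, and `plan_needing_host_lens` are illustrative names, not vLLM or FlashInfer APIs, and the device-to-host copy is simulated with a plain list copy.

```python
class FakeAttnMetadata:
    """Hypothetical stand-in for attention metadata; not vLLM's real class."""

    lazy_init_count = 0  # class-wide counter that a test can inspect

    def __init__(self, seq_lens_device):
        self._seq_lens_device = seq_lens_device  # pretend this lives on the GPU
        self._seq_lens_cpu = None                # CPU mirror, built on demand

    @property
    def seq_lens_cpu(self):
        if self._seq_lens_cpu is None:
            # In a real system this would be a blocking device-to-host copy.
            type(self).lazy_init_count += 1
            self._seq_lens_cpu = list(self._seq_lens_device)
        return self._seq_lens_cpu


def plan_needing_host_lens(meta):
    """Hypothetical consumer that needs host-side sequence lengths."""
    return sum(meta.seq_lens_cpu)


# Two decode steps, each with fresh metadata, each touching the CPU mirror once.
FakeAttnMetadata.lazy_init_count = 0
steps = [FakeAttnMetadata([3, 5]), FakeAttnMetadata([4, 6])]
totals = [plan_needing_host_lens(m) for m in steps]
```

In this model, every consumer that reads the CPU mirror on fresh metadata forces one simulated copy, which is the kind of event the assertion counts.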

Auto-generated by CI Watch Bot



Name of failing test

tests/v1/e2e/spec_decode/test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama]

Basic information

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

In the 2026-04-09 nightly build (#60697), the Spec Decode Draft Model Nightly B200 step failed with 1 test failure.

Step link: https://buildkite.com/vllm/ci/builds/60697#019d740c-e312-46fd-b21c-4b44b3b5fd1b

Commit: e5de19ff9a64

Error details

FAILED test_async_spec_decode.py::test_no_sync_with_spec_decode[eagle3-llama]
AssertionError: Unexpected GPU-CPU sync: seq_lens_cpu lazy init triggered 2 times.
See stack traces above.
assert 2 == 0

The test asserts that no GPU-CPU synchronization occurs during speculative decoding with async scheduling enabled. The assertion detected 2 unexpected seq_lens_cpu lazy init syncs.
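As a rough illustration of how such a check can be structured, the sketch below counts calls to a lazy-init hook with `unittest.mock` and fails if the count is nonzero. The source does not show the real test's implementation, so `FakeRunner`, `_init_seq_lens_cpu`, `count_lazy_inits`, and `check_no_sync` are hypothetical; only the assertion message is modeled on the failure text above.

```python
from unittest import mock


class FakeRunner:
    """Hypothetical model runner; not part of vLLM."""

    def _init_seq_lens_cpu(self):
        # Stands in for the blocking device-to-host copy.
        return [4, 7]

    def step(self, needs_host_lens):
        if needs_host_lens:
            self._init_seq_lens_cpu()


def count_lazy_inits(runner, schedule):
    """Run the schedule and return how often the lazy init was hit."""
    with mock.patch.object(
        runner, "_init_seq_lens_cpu", wraps=runner._init_seq_lens_cpu
    ) as spy:
        for needs_host_lens in schedule:
            runner.step(needs_host_lens)
        return spy.call_count


def check_no_sync(count):
    """Fail with a message modeled on the CI failure when any sync was seen."""
    assert count == 0, (
        f"Unexpected GPU-CPU sync: seq_lens_cpu lazy init triggered {count} times."
    )
```

Wrapping the hook rather than replacing it keeps the original behavior intact while still exposing a call count to assert on.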

Test results summary

1 failed, 7 passed, 10 skipped, 37 deselected, 1 xfailed, 21 warnings in 471.68s

Potentially causal PRs

#39206 — tests/v1/e2e/spec_decode: assert async scheduling is used (merged 2026-04-08) — this PR added the sync assertion that is now failing
#38577 — Add nightly b200 test for spec decode eagle correctness (merged 2026-04-09) — added eagle3-llama to nightly B200 tests

Related issues

#29134 — [Performance]: Fully Async Spec-Decoding | Make seq_lens_cpu in CommonAttentionMetadata optional — the underlying issue being tracked

Analysis


extent analysis

TL;DR

The failure is a real GPU-CPU sync caused by seq_lens_cpu lazy initialization, newly exposed by the assertion added in #39206. The most likely fix is to change speculative decoding under async scheduling so that it no longer triggers that lazy initialization, which is the work tracked in #29134.

Guidance

Review the changes introduced in PR #39206 to understand the added assertion and its implications on the test.
Investigate the seq_lens_cpu lazy initialization in the context of speculative decoding and async scheduling to identify potential synchronization points.
Consider revisiting the implementation of async scheduling to minimize or eliminate GPU-CPU synchronization, as discussed in issue #29134.
Analyze the stack traces provided in the error details to gain a deeper understanding of the synchronization issue.
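For the stack-trace step in the guidance above, one generic way to find which call sites hit a suspected sync point is to record the call stack each time it fires. This is a sketch using only the standard library; `record_sync`, `lazy_seq_lens_cpu`, and `build_attention_metadata` are hypothetical names and do not reflect how the vLLM test collects its traces.

```python
import traceback

sync_traces = []  # (label, formatted stack) for every recorded event


def record_sync(label):
    """Store the call stack at the point where a suspected sync occurs."""
    # Drop the last frame so the trace ends at the caller of record_sync.
    stack = traceback.format_stack()[:-1]
    sync_traces.append((label, "".join(stack)))


def lazy_seq_lens_cpu():
    """Hypothetical lazy init that reports itself before doing the copy."""
    record_sync("seq_lens_cpu lazy init")
    return [1, 2, 3]


def build_attention_metadata():
    """Hypothetical caller whose name should show up in the trace."""
    return lazy_seq_lens_cpu()


build_attention_metadata()
labels = [label for label, _ in sync_traces]
```

Each stored trace names the chain of callers, so two recorded events can be compared to see whether they come from the same code path or from two different ones.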

Example

The issue does not include a code-level fix. Reviewing where seq_lens_cpu is lazily initialized, and which call sites read it while async scheduling is active, is the most direct way to locate the two reported syncs.

Notes

The failure corresponds to a known, unresolved limitation (issue #29134). The test is correctly detecting a real synchronization, so the fix belongs in the speculative decoding and async scheduling implementation rather than in the test.

Recommendation

Address the known limitation tracked in #29134: rework the spec-decode path under async scheduling so that it no longer triggers the seq_lens_cpu lazy initialization. This is a change to the implementation rather than a quick workaround.
