vllm - ✅(Solved) Fix [Bug]: dbo not support spec decode [1 pull requests, 1 participants]

JasonHe-WQ · 2026-04-24T04:41:07Z



GitHub stats

vllm-project/vllm#40769•Fetched 2026-04-24 10:36:20


Author

JasonHe-WQ

Participants

JasonHe-WQ

Timeline (top)

cross-referenced ×1, labeled ×1

Error Message

TypeError: expected Tensor as element 0 in argument 0, but got tuple

Root Cause

In `vllm/v1/worker/gpu_ubatch_wrapper.py`, `_run_ubatches` assumes each ubatch's model output is a single Tensor and concatenates the per-ubatch outputs directly:

results: list[tuple[int, torch.Tensor]] = []              
...
...
sorted_results = [value for position, value in sorted(results)]
result = torch.cat(sorted_results, dim=0)   # <-- crashes
return result

When EAGLE3 is enabled, the target model's forward returns a tuple, because the auxiliary hidden states are required to feed the EAGLE3 drafter. torch.cat then fails with:

TypeError: expected Tensor as element 0 in argument 0, but got tuple
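The failure can be reproduced outside vLLM with plain CPU tensors. The snippet below is a minimal sketch, not code from the repository: the `results` layout copies the excerpt above, while the tensor shapes and the assumption that each tuple component is a single Tensor are illustrative only.

```python
import torch

# Two ubatches, each returning (hidden_states, aux_hidden_states), stored as
# (position, value) pairs the way the excerpt above collects them.
# Shapes are arbitrary and chosen only for illustration.
results = [
    (1, (torch.zeros(3, 8), torch.zeros(3, 8))),
    (0, (torch.ones(2, 8), torch.ones(2, 8))),
]

# Restore ubatch order by position before concatenating.
sorted_results = [value for _, value in sorted(results, key=lambda r: r[0])]

try:
    # torch.cat expects a sequence of Tensors, but every entry is a tuple here.
    torch.cat(sorted_results, dim=0)
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

Sorting on the position alone (`key=lambda r: r[0]`) is a choice made for this sketch so that the values themselves are never compared.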

Fix Action

Fix proposed (PR open, not yet merged)

Proposed fix: PR [Bugfix] V1: support tuple model outputs in ubatch wrapper (dbo + spec decode) (https://github.com/vllm-project/vllm/pull/40789)

PR fix notes

PR #40789: [Bugfix] V1: support tuple model outputs in ubatch wrapper (dbo + spec decode)

Repository: vllm-project/vllm
Author: he-yufeng
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40789

Description (problem / solution / changelog)

Fixes #40769.

The bug

`UBatchWrapper._run_ubatches` and its cudagraph-capture sibling collect per-ubatch model outputs and do `torch.cat(sorted_results, dim=0)`. That assumes each entry is a single tensor, which is true for a plain forward — but not for target models that collect auxiliary hidden states to feed a drafter. EAGLE3 speculative decoding is the concrete case: the target model's forward returns a tuple `(hidden_states, aux_hidden_states)`, so `torch.cat` blows up during `profile_run` with:

```
TypeError: expected Tensor as element 0 in argument 0, but got tuple
```

This is what makes `--enable-dbo` incompatible with any speculative decoding method that requests auxiliary outputs from the target model.

Fix

Extract a tiny `_cat_ubatch_outputs` helper that handles both shapes:

- Single-tensor per ubatch (the current case): `torch.cat(sorted_results, dim=0)`.
- Tuple per ubatch: fan out across components with `zip(*sorted_results)`, cat each component, return a tuple in the same order the model produced so downstream code sees the same structure it saw for a single ubatch.

Both call sites (cudagraph capture at `gpu_ubatch_wrapper.py:269` and the general `_run_ubatches` at `:312`) switch to the helper.
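The two branches described above can be sketched as follows. This is a guess at the shape of the helper based only on the PR description, not the PR's actual diff: the name `cat_ubatch_outputs` and the injectable `cat` argument are hypothetical. Inside the wrapper, `cat` would presumably be something like `lambda parts: torch.cat(parts, dim=0)`; here plain Python lists stand in for tensors so the sketch runs without PyTorch or a GPU.

```python
from typing import Any, Callable, Sequence


def cat_ubatch_outputs(
    sorted_results: Sequence[Any],
    cat: Callable[[list], Any],
) -> Any:
    """Concatenate per-ubatch outputs that are already in ubatch order.

    Hypothetical helper: `cat` is the concatenation function to apply,
    for example a wrapper around torch.cat(parts, dim=0).
    """
    first = sorted_results[0]
    if isinstance(first, tuple):
        # zip(*...) regroups [(h0, a0), (h1, a1)] into (h0, h1) and (a0, a1),
        # so each component is concatenated separately and the tuple order
        # produced by the model is preserved.
        return tuple(cat(list(parts)) for parts in zip(*sorted_results))
    # Single-output case: behaves like the original torch.cat call.
    return cat(list(sorted_results))


def concat_rows(parts: list) -> list:
    """Stand-in for torch.cat(..., dim=0): joins lists of rows end to end."""
    return [row for part in parts for row in part]


# Plain forward: one output per ubatch.
single = cat_ubatch_outputs([[1, 2], [3]], concat_rows)

# Tuple forward: (hidden_states, aux_hidden_states) per ubatch.
paired = cat_ubatch_outputs([([1, 2], [10, 20]), ([3], [30])], concat_rows)

print(single)  # [1, 2, 3]
print(paired)  # ([1, 2, 3], [10, 20, 30])
```

Passing the concatenation function in also allows a CPU-only check of both branches, which may help where no dbo + EAGLE3 hardware is available.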

Why this is not duplicating an existing PR

Per AGENTS.md duplicate-work check:

```
$ gh pr list --state open --search "40769 in:body"
[]
$ gh pr list --state open --search "ubatch spec decode"
[#40750 — TRITON_MLA MTP full CUDA graphs for Kimi on Blackwell — different codepath]
```

Test plan

I don't have hardware to run a full dbo + EAGLE3 target-model session end-to-end, so:

- Verified the helper preserves existing single-tensor behavior by reading the call sites and the contract of the surrounding cudagraph metadata (`cudagraph_metadata.outputs = result` — no shape change for non-aux paths).
- Confirmed via reading EAGLE3/drafter code that the target model's forward returns `(hidden, aux_hidden)` exactly when aux collection is active, which is the case the tuple branch handles.
- `pre-commit run ruff-check --all-files` clean on the touched file.

Would appreciate a reviewer with a dbo + EAGLE3 setup re-running the `profile_run` that previously raised.

AI assistance disclosure

This change was authored with AI assistance (Claude). I reviewed every changed line and take responsibility for the fix.

Changed files

vllm/v1/worker/gpu_ubatch_wrapper.py (modified, +21/-2)



Your current environment

https://github.com/vllm-project/vllm/blob/56bdf85e10b807be13225f659f2593051306c77d/vllm/v1/worker/gpu_ubatch_wrapper.py#L260-L276

Latest main

🐛 Describe the bug

--enable-dbo is incompatible with any speculative decoding method which collects auxiliary hidden states from the target model.

During profile_run, the ubatch wrapper crashes in torch.cat because the model's per-ubatch output is a tuple rather than a single Tensor.


extent analysis

TL;DR

Modify `_run_ubatches` in gpu_ubatch_wrapper.py, and the cudagraph-capture path next to it, so that concatenating per-ubatch outputs also works when the model returns a tuple rather than a single Tensor. This happens when --enable-dbo is combined with a speculative decoding method that collects auxiliary hidden states from the target model.

Guidance

Locate both places that call torch.cat(sorted_results, dim=0).
Check whether the entries of sorted_results are tuples or single Tensors.
If they are tuples, concatenate each component across ubatches and return a tuple in the same order. Do not keep only the first element: the auxiliary hidden states are needed to feed the drafter.
Branch on the output type rather than on the --enable-dbo flag, since the tuple output comes from the speculative decoding setup and not from dbo itself.

Example

first = sorted_results[0]
if isinstance(first, tuple):
    # Concatenate each component across ubatches; keep the model's tuple order
    result = tuple(torch.cat(parts, dim=0) for parts in zip(*sorted_results))
else:
    result = torch.cat(sorted_results, dim=0)

Notes

The example assumes every tuple component is a Tensor that can be concatenated along dim 0. The exact handling depends on the structure of the tuple returned by the target model.

Recommendation

Follow the approach in PR #40789, which was still open and unmerged when this page was fetched, or apply an equivalent local patch to both call sites. The issue is specific to the interaction between --enable-dbo and speculative decoding methods that return auxiliary outputs.
