vllm - ✅(Solved) Fix [Bug]: dbo not support spec decode [1 pull request, 1 participant]

GitHub stats
vllm-project/vllm#40769 (fetched 2026-04-24 10:36:20)
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1

Error Message

TypeError: expected Tensor as element 0 in argument 0, but got tuple

Root Cause

In vllm/v1/worker/gpu_ubatch_wrapper.py, _run_ubatches assumes each ubatch's model_output is a single Tensor and concatenates the two ubatches directly:

results: list[tuple[int, torch.Tensor]] = []
...
...
sorted_results = [value for position, value in sorted(results)]
result = torch.cat(sorted_results, dim=0)   # <-- crashes
return result

When EAGLE3 is enabled, the target model's forward returns a tuple, because the auxiliary hidden states are required to feed the EAGLE3 drafter. torch.cat then fails with:

TypeError: expected Tensor as element 0 in argument 0, but got tuple
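The failure can be reproduced outside vLLM with PyTorch alone. This is a minimal sketch: the function name `concat_ubatch_outputs_naive` and the tensor shapes are hypothetical stand-ins for the two ubatch outputs, not vLLM code. The function mirrors the original single-tensor assumption.

```python
import torch


def concat_ubatch_outputs_naive(sorted_results):
    # Mirrors the original code path: assumes exactly one Tensor per ubatch.
    return torch.cat(sorted_results, dim=0)


# Plain forward: one tensor per ubatch, so concatenation works.
plain = [torch.zeros(2, 4), torch.ones(3, 4)]
print(concat_ubatch_outputs_naive(plain).shape)  # torch.Size([5, 4])

# Aux-collecting forward: (hidden_states, aux_hidden_states) per ubatch.
with_aux = [
    (torch.zeros(2, 4), torch.zeros(2, 8)),
    (torch.ones(3, 4), torch.ones(3, 8)),
]
try:
    concat_ubatch_outputs_naive(with_aux)
except TypeError as exc:
    # torch.cat rejects a list whose elements are tuples rather than tensors.
    print(f"TypeError: {exc}")
```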

Fix Action

Fixed

PR fix notes

PR #40789: [Bugfix] V1: support tuple model outputs in ubatch wrapper (dbo + spec decode)

Description (problem / solution / changelog)

Fixes #40769.

The bug

`UBatchWrapper._run_ubatches` and its cudagraph-capture sibling collect per-ubatch model outputs and do `torch.cat(sorted_results, dim=0)`. That assumes each entry is a single tensor, which is true for a plain forward — but not for target models that collect auxiliary hidden states to feed a drafter. EAGLE3 speculative decoding is the concrete case: the target model's forward returns a tuple `(hidden_states, aux_hidden_states)`, so `torch.cat` blows up during `profile_run` with:

```
TypeError: expected Tensor as element 0 in argument 0, but got tuple
```

This is what makes `--enable-dbo` incompatible with any speculative decoding method that requests auxiliary outputs from the target model.

Fix

Extract a tiny `_cat_ubatch_outputs` helper that handles both shapes:

  • Single-tensor per ubatch (the current case): `torch.cat(sorted_results, dim=0)`.
  • Tuple per ubatch: fan out across components with `zip(*sorted_results)`, cat each component, return a tuple in the same order the model produced so downstream code sees the same structure it saw for a single ubatch.
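The `zip(*sorted_results)` fan-out in the tuple branch is a transpose. It turns a list of per-ubatch tuples into one group per component. The pure-Python sketch below uses strings as placeholders for tensors, and the labels are illustrative only:

```python
# One (hidden_states, aux_hidden_states) pair per ubatch, in ubatch order.
sorted_results = [("hidden_ub0", "aux_ub0"), ("hidden_ub1", "aux_ub1")]

# zip(*...) regroups by component: all hidden states, then all aux states.
per_component = list(zip(*sorted_results))
print(per_component)
# [('hidden_ub0', 'hidden_ub1'), ('aux_ub0', 'aux_ub1')]

# In the real helper, each group would be passed to torch.cat(..., dim=0),
# and the outer tuple keeps the component order the model produced.
```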

Both call sites (cudagraph capture at `gpu_ubatch_wrapper.py:269` and the general `_run_ubatches` at `:312`) switch to the helper.
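Below is a minimal sketch of what a helper like `_cat_ubatch_outputs` could look like. It is written from the PR description, not from the merged diff. The name `cat_ubatch_outputs`, the `UbatchOutput` alias, and the recursive handling of a component that is itself a list of tensors are all assumptions added for illustration.

```python
from typing import Sequence, Union

import torch

# Hypothetical alias: a per-ubatch output is a tensor or a tuple/list of them.
UbatchOutput = Union[torch.Tensor, tuple, list]


def cat_ubatch_outputs(sorted_results: Sequence[UbatchOutput]) -> UbatchOutput:
    """Concatenate per-ubatch outputs along dim 0, preserving their structure."""
    first = sorted_results[0]
    if isinstance(first, torch.Tensor):
        # Plain forward: one tensor per ubatch (the pre-fix behaviour).
        return torch.cat(list(sorted_results), dim=0)
    if isinstance(first, (tuple, list)):
        # Transpose to one group per component, then concatenate each group.
        # Recursing (an assumption beyond the PR description) also covers a
        # component that is itself a list of tensors.
        parts = [cat_ubatch_outputs(list(group)) for group in zip(*sorted_results)]
        return tuple(parts) if isinstance(first, tuple) else parts
    raise TypeError(f"unsupported ubatch output type: {type(first).__name__}")


# Usage: two ubatches of 2 and 3 tokens, each returning (hidden, [aux, aux]).
ub0 = (torch.zeros(2, 4), [torch.zeros(2, 8), torch.zeros(2, 8)])
ub1 = (torch.ones(3, 4), [torch.ones(3, 8), torch.ones(3, 8)])
hidden, aux = cat_ubatch_outputs([ub0, ub1])
print(hidden.shape, [a.shape for a in aux])
# torch.Size([5, 4]) [torch.Size([5, 8]), torch.Size([5, 8])]
```

The single-tensor branch is checked first, so the plain-forward path keeps the exact `torch.cat(sorted_results, dim=0)` behaviour.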

Why this is not duplicating an existing PR

Per AGENTS.md duplicate-work check:

```
$ gh pr list --state open --search "40769 in:body"
[]
$ gh pr list --state open --search "ubatch spec decode"
[#40750: TRITON_MLA MTP full CUDA graphs for Kimi on Blackwell, a different codepath]
```

Test plan

I don't have hardware to run a full dbo + EAGLE3 target-model session end-to-end, so:

  • Verified the helper preserves existing single-tensor behavior by reading the call sites and the contract of the surrounding cudagraph metadata (`cudagraph_metadata.outputs = result` — no shape change for non-aux paths).
  • Confirmed via reading EAGLE3/drafter code that the target model's forward returns `(hidden, aux_hidden)` exactly when aux collection is active, which is the case the tuple branch handles.
  • `pre-commit run ruff-check --all-files` clean on the touched file.

Would appreciate a reviewer with a dbo + EAGLE3 setup re-running the `profile_run` that previously raised.

AI assistance disclosure

This change was authored with AI assistance (Claude). I reviewed every changed line and take responsibility for the fix.

Changed files

  • vllm/v1/worker/gpu_ubatch_wrapper.py (modified, +21/-2)

Original issue

Your current environment

https://github.com/vllm-project/vllm/blob/56bdf85e10b807be13225f659f2593051306c77d/vllm/v1/worker/gpu_ubatch_wrapper.py#L260-L276

Latest main

🐛 Describe the bug

--enable-dbo is incompatible with any speculative decoding method which collects auxiliary hidden states from the target model.

During profile_run, the ubatch wrapper crashes in torch.cat because the model's per-ubatch output is a tuple rather than a single Tensor.



extent analysis

TL;DR

Modify the _run_ubatches function in gpu_ubatch_wrapper.py to handle the case where the model's output is a tuple, rather than a single Tensor, when --enable-dbo is used with speculative decoding methods.

Guidance

  • Identify the line of code where the error occurs (torch.cat(sorted_results, dim=0)) and modify it to handle tuples.
  • Check the type of value in the sorted_results list to determine if it's a tuple or a single Tensor.
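  • If value is a tuple, concatenate each component across ubatches separately instead of keeping only the first element. The later components carry the auxiliary hidden states that the EAGLE3 drafter consumes.
  • A separate check on the --enable-dbo flag is not needed, because the type check on each output already distinguishes the two cases.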

Example

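import torch

# sorted_results holds one model output per ubatch, in ubatch order.
if isinstance(sorted_results[0], tuple):
    # e.g. (hidden_states, aux_hidden_states) with EAGLE3: concatenate each
    # component across ubatches instead of keeping only value[0], which
    # would drop the auxiliary hidden states the drafter needs.
    result = tuple(torch.cat(parts, dim=0) for parts in zip(*sorted_results))
else:
    result = torch.cat(sorted_results, dim=0)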

Notes

The exact modification depends on the structure of the tuple returned by the target model. Concatenating each component separately along dim=0 assumes that every component is a tensor whose first dimension is the token dimension.

Recommendation

Modify _run_ubatches, and the cudagraph-capture path that performs the same concatenation, to handle tuple outputs, as PR #40789 does. The problem only arises when --enable-dbo is combined with a speculative decoding method that collects auxiliary outputs from the target model.
