vllm - fix(dflash): GPUModelRunner hardcodes use_aux_hidden_state_outputs=True for DFlash, ignoring dflash_config.use_aux_hidden_state

vllm, 2026-05-29 13:52:28

Summary

GPUModelRunner unconditionally sets self.use_aux_hidden_state_outputs = True when initializing a DFlash drafter, ignoring the use_aux_hidden_state field in the draft model's dflash_config.

Code path

In vllm/v1/worker/gpu_model_runner.py:

elif self.speculative_config.use_dflash():
    self.drafter = DFlashProposer(self.vllm_config, self.device, self)
    self.use_aux_hidden_state_outputs = True  # <-- hardcoded

Compare with the EAGLE3 branch, which correctly reads from the proposer:

elif self.speculative_config.use_eagle():
    self.drafter = EagleProposer(...)
    self.use_aux_hidden_state_outputs = self.drafter.eagle3_use_aux_hidden_state

Why this is wrong

DFlashProposer already correctly reads dflash_config.use_aux_hidden_state via _get_eagle3_use_aux_hidden_state_from_config and exposes it as eagle3_use_aux_hidden_state. The draft model (qwen3_dflash.py) also respects this flag to decide whether to use fc and aux hidden states.

However, GPUModelRunner never reads this attribute. It always requests aux hidden states from the target model, regardless of what the draft config says.
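The mismatch can be reproduced in isolation with a toy model. The sketch below does not import vLLM: FakeDFlashProposer and FakeRunner are hypothetical stand-ins that borrow only the attribute names quoted in this issue, and the real classes take different constructor arguments.

```python
class FakeDFlashProposer:
    """Hypothetical stand-in for the proposer; not the vLLM class."""

    def __init__(self, use_aux_hidden_state: bool):
        # The issue states the real proposer derives this from dflash_config.
        self.eagle3_use_aux_hidden_state = use_aux_hidden_state


class FakeRunner:
    """Hypothetical stand-in reproducing the hardcoded assignment."""

    def __init__(self, drafter: FakeDFlashProposer):
        self.drafter = drafter
        # Same shape as the DFlash branch quoted above: the drafter is ignored.
        self.use_aux_hidden_state_outputs = True


# Draft config asks for no aux hidden states.
drafter = FakeDFlashProposer(use_aux_hidden_state=False)
runner = FakeRunner(drafter)

# True whenever the runner and the drafter disagree about the flag.
mismatch = (
    runner.use_aux_hidden_state_outputs
    != runner.drafter.eagle3_use_aux_hidden_state
)
print(mismatch)
```

With the flag set to False on the drafter, the two attributes disagree, which is the inconsistency the issue describes.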

Proposed fix

Mirror the EAGLE3 pattern:

elif self.speculative_config.use_dflash():
    self.drafter = DFlashProposer(self.vllm_config, self.device, self)
    self.use_aux_hidden_state_outputs = (
        self.drafter.eagle3_use_aux_hidden_state
    )

Also update the stale comment at the usage site (currently says "True when EAGLE 3 is used").

Verification

Unit tests can mock the drafter or construct a minimal dflash_config, so no GPU or large model is needed. The tests cover four cases:

No dflash_config → defaults to True
use_aux_hidden_state key absent → defaults to True
Explicit True → True
Explicit False → False
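The four cases above could be exercised roughly as follows. This is a sketch under stated assumptions rather than the project's actual test code: flag_from_config is a hypothetical stand-in for the proposer's config lookup, wire_runner is a hypothetical helper that mirrors the proposed assignment, and the config objects are plain SimpleNamespace instances instead of real model configs.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def flag_from_config(hf_config) -> bool:
    """Hypothetical lookup: dflash_config.use_aux_hidden_state, default True."""
    dflash_config = getattr(hf_config, "dflash_config", None)
    if not dflash_config:
        return True
    return bool(dflash_config.get("use_aux_hidden_state", True))


def wire_runner(runner, drafter) -> None:
    """Hypothetical helper mirroring the proposed fix: copy the drafter's flag."""
    runner.drafter = drafter
    runner.use_aux_hidden_state_outputs = drafter.eagle3_use_aux_hidden_state


# (draft config, expected runner flag); "unrelated_key" is a made-up key.
CASES = [
    (SimpleNamespace(), True),                                   # no dflash_config
    (SimpleNamespace(dflash_config={"unrelated_key": 1}), True),  # key absent
    (SimpleNamespace(dflash_config={"use_aux_hidden_state": True}), True),
    (SimpleNamespace(dflash_config={"use_aux_hidden_state": False}), False),
]


def run_cases() -> list:
    results = []
    for hf_config, expected in CASES:
        drafter = MagicMock()  # mocked drafter, no GPU or model weights
        drafter.eagle3_use_aux_hidden_state = flag_from_config(hf_config)
        runner = SimpleNamespace()
        wire_runner(runner, drafter)
        assert runner.use_aux_hidden_state_outputs is expected
        results.append(runner.use_aux_hidden_state_outputs)
    return results


print(run_cases())  # [True, True, True, False]
```

In a real test suite each case would more naturally be a parametrized test against the actual runner initialization path. The loop here only keeps the sketch dependency-free.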
