vllm - fix(dflash): GPUModelRunner hardcodes use_aux_hidden_state_outputs=True for DFlash, ignoring dflash_config.use_aux_hidden_state



Summary

GPUModelRunner unconditionally sets self.use_aux_hidden_state_outputs = True when initializing a DFlash drafter, ignoring the use_aux_hidden_state field in the draft model's dflash_config.

Code path

In vllm/v1/worker/gpu_model_runner.py:

elif self.speculative_config.use_dflash():
    self.drafter = DFlashProposer(self.vllm_config, self.device, self)
    self.use_aux_hidden_state_outputs = True  # <-- hardcoded

Compare with the EAGLE3 branch, which correctly reads from the proposer:

elif self.speculative_config.use_eagle():
    self.drafter = EagleProposer(...)
    self.use_aux_hidden_state_outputs = self.drafter.eagle3_use_aux_hidden_state

Why this is wrong

DFlashProposer already correctly reads dflash_config.use_aux_hidden_state via _get_eagle3_use_aux_hidden_state_from_config and exposes it as eagle3_use_aux_hidden_state. The draft model (qwen3_dflash.py) also respects this flag to decide whether to use fc and aux hidden states.

However, GPUModelRunner never reads it — it always requests aux hidden states from the target model, regardless of what the draft config says.
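
The config lookup this issue relies on can be sketched roughly as below. This is an illustration under stated assumptions, not vLLM's code: `resolve_use_aux_hidden_state` is a hypothetical stand-in for `_get_eagle3_use_aux_hidden_state_from_config`, and it assumes `dflash_config` is a plain dict attached to the draft model's config object. The default of True matches the cases listed under Verification.

```python
from types import SimpleNamespace


def resolve_use_aux_hidden_state(draft_hf_config) -> bool:
    """Hypothetical stand-in for the proposer's config lookup.

    Returns `use_aux_hidden_state` from `dflash_config`, defaulting to
    True when the section or the key is missing (assumed dict layout).
    """
    dflash_config = getattr(draft_hf_config, "dflash_config", None)
    if not dflash_config:
        return True
    return bool(dflash_config.get("use_aux_hidden_state", True))


# A draft config that opts out of aux hidden states.
opt_out = SimpleNamespace(dflash_config={"use_aux_hidden_state": False})
print(resolve_use_aux_hidden_state(opt_out))
```

With the hardcoded assignment in the runner, an opt-out like this one would be honored by the draft model but ignored when the target model is asked for aux hidden states.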

Proposed fix

Mirror the EAGLE3 pattern:

elif self.speculative_config.use_dflash():
    self.drafter = DFlashProposer(self.vllm_config, self.device, self)
    self.use_aux_hidden_state_outputs = (
        self.drafter.eagle3_use_aux_hidden_state
    )

Also update the stale comment at the usage site (currently says "True when EAGLE 3 is used").

Verification

Unit tests can mock the drafter or construct a minimal dflash_config, so no GPU or large model is needed. Tests cover:

  • No dflash_config → defaults to True
  • use_aux_hidden_state key absent → defaults to True
  • Explicit True → True
  • Explicit False → False
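
A mock-based test for the runner side might look like the sketch below. `FakeRunner` is a hypothetical stand-in that reproduces only the DFlash branch with the proposed fix applied; a real test would exercise `GPUModelRunner` itself, whose constructor arguments are not shown in this issue.

```python
from unittest.mock import MagicMock


class FakeRunner:
    """Hypothetical stand-in for the DFlash branch of the runner's setup."""

    def __init__(self, drafter):
        self.drafter = drafter
        # Mirrors the proposed fix: take the flag from the drafter
        # instead of hardcoding True.
        self.use_aux_hidden_state_outputs = drafter.eagle3_use_aux_hidden_state


def make_drafter(flag: bool) -> MagicMock:
    """Build a mocked drafter exposing only the attribute the runner reads."""
    drafter = MagicMock()
    drafter.eagle3_use_aux_hidden_state = flag
    return drafter


def test_runner_follows_drafter_flag():
    for flag in (True, False):
        runner = FakeRunner(make_drafter(flag))
        assert runner.use_aux_hidden_state_outputs is flag


test_runner_follows_drafter_flag()
```

Because the drafter is mocked, the test needs no GPU or model weights. The False case is the one that would fail against the hardcoded assignment.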
