vllm - 💡(How to fix) Fix [Bug]: spec decoding nonparallel path: draft/target hidden sizes are incompatible; suggest supporting different hidden sizes [1 participant]

GitHub stats
vllm-project/vllm#37966 · Fetched 2026-04-08 01:22:25
Comments: 0 · Participants: 1 · Timeline: 2 (labeled ×1, subscribed ×1) · Reactions: 0


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
(Please add your environment information)
</details>

🐛 Describe the bug

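  • Qwen3-0.6B is used as the draft model with Qwen3-1.7B or Qwen3-8B as the target, on the nonparallel draft path.
  • The server starts, and the /health and /v1/models endpoints respond normally, but the first /v1/chat/completions request crashes immediately with the error described below.
  • The crash comes from the direct hidden-state copy in vllm_ascend/spec_decode/eagle_proposer.py: self.hidden_states[:num_tokens] = target_hidden_states, which fails with a 1024 vs 2048 size mismatch (see the log benchmark_test/qwen17b-qwen06b-nonparallel-serve-dev6.log for details).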

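Analysis and discussion:

  • The current implementation assumes the draft and target hidden sizes are equal. In practice, Qwen3 models of different scales (e.g. 0.6B/1.7B/8B) have different hidden sizes, so these pairings are incompatible.
  • Tracing vllm/vllm/v1/spec_decode/eagle.py, the self.hidden_states buffer is allocated with the draft hidden size, while the nonparallel path copies the target hidden states directly into that draft buffer, which causes the shape mismatch.
  • Algorithmically, the two shapes are not required to match; the current implementation simply takes a shortcut and hard-binds the dimensions.
  • Reasonable approaches include introducing a projection mapping, restoring the mtp_proposer compatibility path, or passing some other conditioning representation that the draft model can consume.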
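The shape mismatch described above can be reproduced outside vLLM with plain PyTorch. The following is a minimal sketch, not vLLM code: the sizes 1024 and 2048 are taken from the error reported in the issue, while `max_num_tokens`, `num_tokens`, and the helper `copy_fails` are illustrative assumptions.

```python
import torch

DRAFT_HIDDEN_SIZE = 1024   # sizes taken from the error reported in the issue
TARGET_HIDDEN_SIZE = 2048
max_num_tokens, num_tokens = 8, 4  # small illustrative values (assumption)

# Buffer allocated with the draft hidden size, mirroring the eagle.py snippet.
hidden_states = torch.zeros((max_num_tokens, DRAFT_HIDDEN_SIZE))
# Stand-in for the hidden states produced by the target model.
target_hidden_states = torch.zeros((num_tokens, TARGET_HIDDEN_SIZE))


def copy_fails() -> bool:
    """Return True if the direct slice assignment raises a shape error."""
    try:
        hidden_states[:num_tokens] = target_hidden_states
    except RuntimeError:
        return True
    return False


print("direct copy fails:", copy_fails())
```

The slice assignment requires the last dimension of the source to match the buffer, which is why the copy only works when both models share a hidden size.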

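Expected behavior:

  • Support the nonparallel draft path when the draft and target hidden sizes differ, improving compatibility when pairing large and small models for inference.

Suggested changes:

  • When draft hidden size ≠ target hidden size, add a bridging layer (such as a projection) or design an adapter interface.
  • Alternatively, raise a clear error or document the shape requirement of the current implementation so that users are aware of the limitation.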
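The second suggestion, failing early with a clear message, could look roughly like the sketch below. The function name `check_hidden_size_compat` and the `has_projection` flag are hypothetical and do not exist in vLLM; the idea is only to compare the two hidden sizes at startup instead of crashing on the first request.

```python
def check_hidden_size_compat(
    draft_hidden_size: int,
    target_hidden_size: int,
    has_projection: bool = False,
) -> None:
    """Raise a descriptive error when the hidden sizes cannot be bridged.

    Hypothetical helper: `has_projection` stands for "some bridging layer
    from target to draft hidden size is configured".
    """
    if draft_hidden_size == target_hidden_size or has_projection:
        return
    raise ValueError(
        "Speculative decoding (nonparallel draft path) requires matching "
        f"hidden sizes, got draft={draft_hidden_size} and "
        f"target={target_hidden_size}. Use models with the same hidden size "
        "or configure a projection layer."
    )


# Example: the pairing from the issue would be rejected at startup.
try:
    check_hidden_size_compat(1024, 2048)
except ValueError as err:
    print(err)
```

Running such a check while the proposer is constructed would surface the limitation before /health reports the server as ready.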

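Relevant files and locations:

  • vllm_ascend/spec_decode/eagle_proposer.py
  • vllm/vllm/v1/spec_decode/eagle.py
  • Detailed log: benchmark_test/qwen17b-qwen06b-nonparallel-serve-dev6.log

🐛 This issue prevents draft/target combinations with different hidden sizes from working on the spec decoding nonparallel path. Please confirm whether there are plans to support or optimize this scenario.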

# Relevant code snippets:
# eagle.py
self.hidden_size = self.draft_model_config.get_hidden_size()
self.hidden_states = torch.zeros((max_num_tokens, self.hidden_size), ...)
# eagle_proposer.py
self.hidden_states[:num_tokens] = target_hidden_states  # direct copy with mismatched shapes raises the exception

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To fix the issue of incompatible hidden sizes between draft and target models in nonparallel draft paths, we need to introduce a projection layer to map the target hidden states to the draft hidden size. Here are the steps:

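  • Modify the eagle_proposer.py file to include a projection layer. The class below is an illustrative sketch and does not mirror the actual class layout in vLLM. A freshly initialized nn.Linear only fixes the shapes; its weights would still need to be trained or loaded for the draft model to receive meaningful inputs.

```python
import torch
import torch.nn as nn


class EagleProposer(nn.Module):
    """Illustrative sketch: maps target hidden states to the draft hidden size."""

    def __init__(self, draft_hidden_size: int, target_hidden_size: int):
        super().__init__()
        # Randomly initialized; real weights must be trained or loaded.
        self.projection = nn.Linear(target_hidden_size, draft_hidden_size)

    def forward(self, target_hidden_states: torch.Tensor) -> torch.Tensor:
        return self.projection(target_hidden_states)
```

  • Update the eagle.py file to use the EagleProposer class:

```python
import torch
import torch.nn as nn

from eagle_proposer import EagleProposer


class Eagle(nn.Module):
    def __init__(self, draft_model_config, target_model_config, max_num_tokens: int):
        super().__init__()
        self.draft_hidden_size = draft_model_config.get_hidden_size()
        self.target_hidden_size = target_model_config.get_hidden_size()
        self.eagle_proposer = EagleProposer(self.draft_hidden_size, self.target_hidden_size)
        # Buffer stays sized for the draft model (dtype/device arguments omitted here).
        self.hidden_states = torch.zeros((max_num_tokens, self.draft_hidden_size))

    def forward(self, target_hidden_states: torch.Tensor) -> torch.Tensor:
        # Derive the token count from the input instead of relying on an undefined name.
        num_tokens = target_hidden_states.shape[0]
        self.hidden_states[:num_tokens] = self.eagle_proposer(target_hidden_states)
        return self.hidden_states[:num_tokens]
```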
  • Ensure that the draft_model_config and target_model_config objects provide the correct hidden sizes for the draft and target models, respectively.

Verification

To verify the fix, run the nonparallel draft path with draft/target pairs whose hidden sizes differ (for example Qwen3-0.6B with Qwen3-1.7B or Qwen3-8B). Confirm that the first /v1/chat/completions request completes without crashing and that the output is correct.
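Before testing end to end, the shape handling can be checked in isolation. The sketch below assumes a linear projection is the chosen bridge; `project_into_buffer` and `run_shape_check` are hypothetical helper names, and the check covers shapes only, not the quality of the drafted tokens.

```python
import torch
import torch.nn as nn


def project_into_buffer(
    buffer: torch.Tensor,
    projection: nn.Module,
    target_hidden_states: torch.Tensor,
) -> torch.Tensor:
    """Project target hidden states and copy them into the draft-sized buffer."""
    num_tokens = target_hidden_states.shape[0]
    with torch.no_grad():  # inference-only path
        buffer[:num_tokens] = projection(target_hidden_states)
    return buffer[:num_tokens]


def run_shape_check(
    draft_hidden_size: int,
    target_hidden_size: int,
    num_tokens: int = 4,
    max_num_tokens: int = 8,
) -> tuple:
    """Return the shape written into the buffer for one draft/target pairing."""
    projection = nn.Linear(target_hidden_size, draft_hidden_size)
    buffer = torch.zeros((max_num_tokens, draft_hidden_size))
    target_hidden_states = torch.randn(num_tokens, target_hidden_size)
    out = project_into_buffer(buffer, projection, target_hidden_states)
    return tuple(out.shape)


# Hidden sizes from the issue's error message, plus a matching pair.
for draft, target in [(1024, 2048), (1024, 1024)]:
    print(draft, target, run_shape_check(draft, target))
```

A passing shape check only shows that the crash is gone; acceptance rate and output correctness still need the server-level test described above.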

Extra Tips

  • Make sure to update the documentation to reflect the changes made to the code.
  • Consider adding error handling to ensure that the model can handle cases where the hidden sizes are not compatible.
  • You can also explore other approaches, such as using a more sophisticated projection method or adding additional layers to the model to improve its ability to handle different hidden sizes.
