vllm - 💡(How to fix) Fix [Bug]: spec decoding nonparallel path: draft/target hidden size incompatibility, suggest supporting different hidden sizes [1 participant]

sunchendd · 2026-03-24T06:00:07Z

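### Your current environment
The output of `python collect_env.py`:
```text
(Please fill in your environment information)
```
### 🐛 Describe the bug
- Setup: Qwen3-0.6B as the draft model, Qwen3-1.7B or Qwen3-8B as the target model, using the nonparallel draft path.
- The server starts and the /health and /v1/models endpoints respond normally, but the first /v1/chat/completions request crashes immediately with the following error:
- The crash comes from the direct hidden-state copy in vllm_ascend/spec_decode/eagle_proposer.py, `self.hidden_states[:num_tokens] = target_hidden_states`, which fails with a 1024 vs 2048 mismatch (see the log benchmark_test/qwen17b-qwen06b-nonparallel-serve-dev6.log for details).

**Analysis and discussion:**
- The current implementation assumes the draft and target hidden sizes are equal, but different Qwen3 sizes (e.g. 0.6B/1.7B/8B) have different hidden sizes, so these pairings are incompatible.
- Tracing vllm/vllm/v1/spec_decode/eagle.py shows that the `self.hidden_states` buffer is allocated with the draft hidden size, while the nonparallel path copies the target hidden states directly into that draft buffer, which causes the shape mismatch.
- Algorithmically, the two shapes do not have to match; the current implementation takes a shortcut and hard-binds the dimensions.
- Reasonable fixes include introducing a projection mapping, restoring the mtp_proposer-compatible path, or passing some other conditioning representation the draft model can consume.

**Expected behavior:**
- The nonparallel draft path supports draft/target pairs with different hidden sizes, improving compatibility for large/small model pairings.

**Suggested changes:**
- When the draft hidden size ≠ the target hidden size, add a bridging layer (such as a projection) or design an adapter interface.
- Alternatively, provide a clear error message and documentation of the current implementation's shape requirement so users know the limitation.

**Relevant files:**
- vllm_ascend/spec_decode/eagle_proposer.py
- vllm/vllm/v1/spec_decode/eagle.py
- Detailed log: benchmark_test/qwen17b-qwen06b-nonparallel-serve-dev6.log

🐛 This issue prevents draft/target combinations with different hidden sizes from working on the spec decoding nonparallel path. Please confirm whether there are plans to support or optimize this scenario.
```python
# Relevant code snippets:
# eagle.py
self.hidden_size = self.draft_model_config.get_hidden_size()
self.hidden_states = torch.zeros((max_num_tokens, self.hidden_size), ...)
# eagle_proposer.py
self.hidden_states[:num_tokens] = target_hidden_states  # direct copy with mismatched shapes raises the error
```
### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.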
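The mismatch can be reproduced without vLLM using plain PyTorch. This is a minimal sketch, assuming a 1024-wide draft buffer and 2048-wide target hidden states to mirror the 1024 vs 2048 error from the Qwen3-0.6B/Qwen3-1.7B run; `max_num_tokens` and `num_tokens` are arbitrary illustrative values, not the server's.

```python
import torch

# Illustrative sizes mirroring the "1024 vs 2048" error in the log.
max_num_tokens, num_tokens = 16, 4
draft_hidden_size, target_hidden_size = 1024, 2048

# Buffer allocated with the draft hidden size, as in eagle.py.
hidden_states = torch.zeros((max_num_tokens, draft_hidden_size))
# Stand-in for the hidden states produced by the target model.
target_hidden_states = torch.randn(num_tokens, target_hidden_size)


def copy_into_draft_buffer() -> None:
    # Same pattern as eagle_proposer.py: a direct slice assignment.
    hidden_states[:num_tokens] = target_hidden_states


try:
    copy_into_draft_buffer()
    failed = False
except RuntimeError as exc:
    # PyTorch refuses to copy a [4, 2048] tensor into a [4, 1024] slice.
    failed = True
    print(f"RuntimeError: {exc}")
```

The failure only appears when a request actually reaches the copy, which matches the observation that /health and /v1/models work while the first chat completion crashes.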

vllm-project/vllm#37966 · Fetched 2026-04-08 01:22:25




extent analysis

Fix Plan

To fix the incompatible hidden sizes between the draft and target models on the nonparallel draft path, one option is to introduce a projection layer that maps the target hidden states to the draft hidden size. The steps are:

*   Modify the `eagle_proposer.py` file to include a projection layer:

```python
import torch
import torch.nn as nn

class EagleProposer(nn.Module):
    """Illustrative projection module; vllm's existing EagleProposer has a different interface."""
    def __init__(self, draft_hidden_size: int, target_hidden_size: int):
        super().__init__()
        # Maps [num_tokens, target_hidden_size] -> [num_tokens, draft_hidden_size].
        self.projection = nn.Linear(target_hidden_size, draft_hidden_size)

    def forward(self, target_hidden_states: torch.Tensor) -> torch.Tensor:
        return self.projection(target_hidden_states)
```

*   Update the `eagle.py` file to use the `EagleProposer` class:
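```python
import torch
import torch.nn as nn
from eagle_proposer import EagleProposer  # hypothetical module path, as in the plan

class Eagle(nn.Module):
    def __init__(self, draft_model_config, target_model_config,
                 max_num_tokens: int, dtype: torch.dtype = torch.float32,
                 device: str = "cpu"):
        super().__init__()
        self.draft_hidden_size = draft_model_config.get_hidden_size()
        self.target_hidden_size = target_model_config.get_hidden_size()
        # Projects target hidden states to the draft hidden size.
        self.eagle_proposer = EagleProposer(
            self.draft_hidden_size, self.target_hidden_size).to(device=device, dtype=dtype)
        # The buffer keeps the draft hidden size, as in vllm's eagle.py.
        self.hidden_states = torch.zeros(
            (max_num_tokens, self.draft_hidden_size), dtype=dtype, device=device)

    def forward(self, target_hidden_states: torch.Tensor) -> torch.Tensor:
        num_tokens = target_hidden_states.shape[0]
        # Project before copying so the shapes match the draft-sized buffer.
        projected_hidden_states = self.eagle_proposer(target_hidden_states)
        self.hidden_states[:num_tokens] = projected_hidden_states
        return self.hidden_states[:num_tokens]
```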

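Ensure that the `draft_model_config` and `target_model_config` objects report the correct hidden sizes for the draft and target models, respectively. Note that `nn.Linear` starts with random weights, so the projection has to be trained (or loaded from a checkpoint) before the draft model can get useful conditioning from the projected states; an untrained projection removes the crash but is likely to lower the acceptance rate.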
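The two steps above can be exercised together in isolation to check that the shapes line up. This is a minimal sketch under stated assumptions: `FakeModelConfig` is a hypothetical stand-in for a config object exposing `get_hidden_size()`, the sizes 1024/2048 mirror the error in the log, and the `nn.Linear` projection is randomly initialized, so it demonstrates shape compatibility only, not useful conditioning.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn


@dataclass
class FakeModelConfig:
    """Hypothetical stand-in for a model config with get_hidden_size()."""
    hidden_size: int

    def get_hidden_size(self) -> int:
        return self.hidden_size


draft_config = FakeModelConfig(hidden_size=1024)   # draft model (assumed size)
target_config = FakeModelConfig(hidden_size=2048)  # target model (assumed size)
max_num_tokens, num_tokens = 16, 4

draft_hidden_size = draft_config.get_hidden_size()
target_hidden_size = target_config.get_hidden_size()

# Only bridge with a projection when the sizes differ; otherwise pass through.
projection = (
    nn.Linear(target_hidden_size, draft_hidden_size)
    if draft_hidden_size != target_hidden_size
    else nn.Identity()
)

# Draft-sized buffer, as in eagle.py.
hidden_states = torch.zeros((max_num_tokens, draft_hidden_size))
target_hidden_states = torch.randn(num_tokens, target_hidden_size)

with torch.no_grad():
    # [num_tokens, 2048] -> [num_tokens, 1024], then copy into the buffer.
    hidden_states[:num_tokens] = projection(target_hidden_states)

print(hidden_states[:num_tokens].shape)  # torch.Size([4, 1024])
```

Using `nn.Identity()` when the sizes already match keeps the equal-size path behaving exactly as before, so the bridge only changes the configurations that currently crash.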

Verification

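To verify the fix, rerun the failing setup (Qwen3-0.6B as the draft with Qwen3-1.7B or Qwen3-8B as the target on the nonparallel draft path) and send a /v1/chat/completions request; the server should no longer crash on the first request. Then check output correctness, for example by comparing greedy outputs with the same target model served without speculative decoding, since speculative decoding is designed to preserve the target model's outputs. Also repeat the test with a draft/target pair of equal hidden size to confirm the existing path still works.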

Extra Tips

*   Update the documentation to describe the new behavior and any remaining shape requirements.
*   Consider validating the hidden sizes at startup and raising a clear error when the draft and target hidden sizes differ and no projection is available, instead of crashing on the first request.
*   You can also explore other approaches, such as a more sophisticated projection method or additional adapter layers, to improve how the draft handles different hidden sizes.
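For the error-handling tip, and for the issue's alternative of documenting the shape requirement, a check at startup would turn the crash on the first /v1/chat/completions request into an immediate, readable error. A minimal sketch; the function name `check_hidden_size_compat` and its `has_projection` flag are hypothetical and are not existing vLLM or vllm_ascend APIs.

```python
def check_hidden_size_compat(draft_hidden_size: int,
                             target_hidden_size: int,
                             has_projection: bool = False) -> None:
    """Hypothetical startup check for the nonparallel draft path.

    Raises ValueError if target hidden states cannot be copied into the
    draft-sized buffer and no projection is configured to bridge them.
    """
    if draft_hidden_size == target_hidden_size or has_projection:
        return
    raise ValueError(
        "Nonparallel speculative decoding copies target hidden states into a "
        f"draft-sized buffer, but the draft hidden size ({draft_hidden_size}) "
        f"differs from the target hidden size ({target_hidden_size}). Use a "
        "draft/target pair with equal hidden sizes or configure a projection."
    )


# Example: a 1024-wide draft with a 2048-wide target fails fast at startup.
try:
    check_hidden_size_compat(1024, 2048)
except ValueError as exc:
    print(exc)
```

Running the check where the draft and target configs are first both available (rather than in the per-request copy) is what moves the failure from the first request to server startup.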

