vllm - 💡 How to fix [Feature]: ROCm Kimi K2.5 EAGLE3 MTP heads [1 comment, 2 participants]

GitHub stats
vllm-project/vllm#38851 (fetched 2026-04-08 02:34:32)
Comments: 1 · Participants: 2 · Timeline events: 19 · Reactions: 0
Timeline (top): mentioned ×7, subscribed ×7, labeled ×2, added_to_project_v2 ×1

🚀 The feature, motivation and pitch

Hi @hongxiayang,

+vis @powderluv @chunfangamd @andyluo7

Speculative decoding is a common technique in production, but unfortunately the Kimi team did not release their MTP heads. NVIDIA and production inference API providers such as Baseten have trained their own MTP heads for Kimi K2.5.

NVIDIA has open-sourced theirs: https://huggingface.co/nvidia/Kimi-K2.5-Thinking-Eagle3

There is also one trained by the community with torchspec: https://huggingface.co/lightseekorg/kimi-k2.5-eagle3. If this is the MTP head architecture that AMD chooses to support, please let me know.

Has AMD not open-sourced its own EAGLE3 MTP heads? When should we expect AMD to offer production features such as MTP for Kimi K2.5?

https://huggingface.co/amd/models?search=kimi


Alternatives

Do not use speculative decoding; performance per dollar is still not bad without it.

Additional context

No response


Extended analysis

TL;DR

Consider using the MTP heads that NVIDIA and the community have published on Hugging Face as a workaround, since AMD has not open-sourced EAGLE3 MTP heads for Kimi K2.5.

Guidance

  • Explore NVIDIA's open-sourced Kimi-K2.5-Thinking-Eagle3 draft model on Hugging Face (https://huggingface.co/nvidia/Kimi-K2.5-Thinking-Eagle3) as a potential alternative.
  • Investigate the community model trained with torchspec, available at https://huggingface.co/lightseekorg/kimi-k2.5-eagle3.
  • Benchmark these draft heads on your own workload and hardware (acceptance rate, throughput, latency) to determine whether they are suitable for production use.
  • Check the AMD models page on Hugging Face (https://huggingface.co/amd/models?search=kimi) for any updates on Kimi K2.5 models.
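A minimal sketch of wiring one of these draft heads into vLLM, assuming the `--speculative-config` JSON flag and the `"eagle3"` method name used by recent vLLM releases. Whether the ROCm backend accepts these heads for Kimi K2.5 is exactly what this issue asks, so treat this as something to try, not a confirmed recipe. The target id `moonshotai/Kimi-K2.5`, the default `num_speculative_tokens=3`, and the helper names are illustrative assumptions; deployment flags such as tensor parallelism are omitted.

```python
import json
import shlex

# Draft-head repositories named in the issue. Neither is confirmed by AMD
# to work with vLLM on ROCm; treat both as candidates to test.
CANDIDATE_DRAFTS = {
    "nvidia": "nvidia/Kimi-K2.5-Thinking-Eagle3",
    "community": "lightseekorg/kimi-k2.5-eagle3",
}


def build_speculative_config(draft_repo: str, num_speculative_tokens: int = 3) -> dict:
    """Return a vLLM-style speculative config dict for an EAGLE3 draft head.

    The default of 3 draft tokens is illustrative, not a tuned value.
    """
    if num_speculative_tokens < 1:
        raise ValueError("num_speculative_tokens must be >= 1")
    return {
        "method": "eagle3",
        "model": draft_repo,
        "num_speculative_tokens": num_speculative_tokens,
    }


def serve_command(target_model: str, draft_repo: str, num_speculative_tokens: int = 3) -> str:
    """Build a hypothetical `vllm serve` command line passing the config as JSON."""
    spec = json.dumps(build_speculative_config(draft_repo, num_speculative_tokens))
    return " ".join(
        ["vllm", "serve", shlex.quote(target_model), "--speculative-config", shlex.quote(spec)]
    )


if __name__ == "__main__":
    # "moonshotai/Kimi-K2.5" is an assumed target id; substitute your checkpoint.
    for name, repo in CANDIDATE_DRAFTS.items():
        print(name, serve_command("moonshotai/Kimi-K2.5", repo))
```

Running the script prints one command per candidate head. Only the `model` field changes between them, which keeps a side-by-side benchmark of the two heads straightforward.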

Notes

It is unclear whether or when AMD will open-source EAGLE3 MTP heads for Kimi K2.5, so third-party heads may be necessary in the meantime.

Recommendation

Apply a workaround: use the NVIDIA or community MTP heads, which are available now and may be suitable for production use once validated on your ROCm deployment.
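To judge whether a given head is worth deploying, a back-of-the-envelope model helps. Under the idealized assumption from Leviathan et al. (2023) that each of `k` draft tokens is accepted independently with probability α, one target-model pass emits on average (1 − α^(k+1)) / (1 − α) tokens, and the walltime speedup over plain decoding is that value divided by (1 + k·c), where c is the cost of one draft step relative to one target step. Real EAGLE3 acceptance is not independent per token, so this sketch is only for comparing heads using measured acceptance rates; the function names are illustrative.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the k draft tokens is accepted independently with
    probability alpha (the idealized speculative decoding model).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be >= 1")
    if alpha == 1.0:
        # Every draft token accepted, plus the one token the target adds.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def idealized_speedup(alpha: float, k: int, draft_cost_ratio: float) -> float:
    """Walltime speedup vs. plain decoding.

    draft_cost_ratio is the cost of one draft step divided by the cost of
    one target step; the k draft steps are added to each target pass.
    """
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost_ratio)


if __name__ == "__main__":
    # Illustrative inputs only; replace with acceptance rates measured per head.
    for alpha in (0.5, 0.7, 0.9):
        print(
            f"alpha={alpha}: {expected_tokens_per_step(alpha, 3):.2f} tokens/pass, "
            f"speedup ~{idealized_speedup(alpha, 3, 0.05):.2f}x"
        )
```

For example, α = 0.7, k = 3 and c = 0.05 give about 2.53 tokens per pass and an idealized speedup of about 2.2x. If measured numbers push the speedup toward 1, the issue's own alternative of skipping speculative decoding is the better choice.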
