vllm - 💡(How to fix) Fix [Bug]: 0.17.0rc1: tool calls malfunction after enabling MTP when deploying GLM-4.7 on A2 [1 comment, 1 participant]

GitHub stats
vllm-project/vllm#37846 · Fetched 2026-04-08 01:17:43
Comments: 1 · Participants: 1 · Timeline events: 3 (commented ×1, cross-referenced ×1, labeled ×1) · Reactions: 0

Your current environment

0.17.0rc1, A2 (910B1 through B3), GLM-4.7, HDK 25.2.3

🐛 Describe the bug

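Issue 1:

  • With MTP disabled, tool calls work normally. With MTP3 enabled, tool calls always produce malformed JSON. Error message: JSON parsing failed, reporting a missing }.
  • We tried the fix referenced in the upstream issue and verified that it works: GitHub Issue #34449: [Bug]: GLM-5-FP8 malformed tool calls
  • Root cause analysis: when MTP (Multi-Token Prediction) is enabled, vLLM predicts multiple tokens in parallel. The GLM-series tool parser, however, uses partial_json_parser to autocomplete incomplete JSON, which leads to the following:
      • The autocomplete result does not match the actual output: with MTP generating in parallel, token boundaries can become misaligned.
      • remaining_call is computed incorrectly: subtracting the already-sent content from the autocompleted full JSON can yield duplicated, truncated, or malformed JSON.
      • The JSON finally sent to the client is wrong, so client-side parsing fails.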
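The remaining_call failure mode described in the root cause analysis can be reproduced in a toy setting. The sketch below is not vLLM's parser: `naive_autocomplete` is a hypothetical stand-in for the completion done by partial_json_parser, and `stream_with_autocomplete` and `stream_raw` are illustrative names. It assumes the parser sees growing snapshots of the argument text and sends the difference against what it has already sent.

```python
import json


def naive_autocomplete(partial: str) -> str:
    """Hypothetical stand-in: close an open string and any open braces."""
    in_string = False
    escaped = False
    depth = 0
    for ch in partial:
        if escaped:
            escaped = False
        elif in_string and ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string and ch == "{":
            depth += 1
        elif not in_string and ch == "}":
            depth -= 1
    return partial + ('"' if in_string else "") + "}" * depth


def stream_with_autocomplete(snapshots):
    """Send deltas computed from the autocompleted text (the risky approach)."""
    sent = ""
    for snapshot in snapshots:
        completed = naive_autocomplete(snapshot)
        # The closing '"}' added by autocompletion gets sent too early,
        # so later deltas are cut at the wrong offset.
        sent += completed[len(sent):]
    return sent


def stream_raw(snapshots):
    """Send deltas computed from the raw model text only."""
    sent = ""
    for snapshot in snapshots:
        sent += snapshot[len(sent):]
    return sent


def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Two snapshots; with MTP each step may advance by several tokens at once.
snapshots = ['{"city": "Bei', '{"city": "Beijing", "unit": "c"}']
print(stream_with_autocomplete(snapshots))  # {"city": "Bei"}ng", "unit": "c"}
print(is_valid_json(stream_with_autocomplete(snapshots)))  # False
print(is_valid_json(stream_raw(snapshots)))  # True
```

The malformed output here differs in detail from the missing } reported in the issue; the point is only that any delta derived from autocompleted text can disagree with what the model actually generates next.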

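Issue 2:

  • After fixing Issue 1, tool calls still fail intermittently, with the same symptom: JSON parsing failed, reporting a missing }.
  • According to user feedback (1000+ users; about 10 users reported the problem within 3 hours on Monday morning, which is fairly serious), the problem essentially never occurred before MTP was enabled and only started appearing afterwards.
  • The error is probabilistic and can be worked around by retrying, but it significantly degrades the user experience.
  • We currently run GLM-4.7 as a W8A8 model quantized with the latest official msmodelslim tool, with the float MTP weights spliced in manually. The deployment uses mooncake V1 with PD (prefill/decode) disaggregation.
  • Root cause analysis: this feels more like a model precision problem?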

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To address the JSON parsing errors when MTP is enabled, the GLM tool parser's streaming logic needs to handle steps that deliver several tokens at once.

Step-by-Step Solution:

  1. Update the GLM tool parser: make its streaming logic account for multiple tokens arriving per step when MTP is enabled, rather than deriving what to send from partial_json_parser autocompletion. One option is to buffer the output and emit a tool call only once its JSON is complete.
  2. Implement a retry mechanism: the issue reports that retrying works around the probabilistic errors, so re-issue the request a limited number of times when the tool-call JSON fails to parse.
  3. Investigate model precision: the reporter suspects a precision problem with the W8A8-quantized model and its manually spliced float MTP weights. Consider adjusting the model precision or exploring alternative models to reduce the occurrence of probabilistic errors.

Example Code Snippet (Python):

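import json

class MTPJsonParser:
    """Buffer streamed text chunks until they form one complete JSON value."""

    def __init__(self):
        self.buffer = []

    def parse(self, chunk):
        # With MTP, one chunk may carry several tokens, so accumulate the
        # text and only return a value once the whole buffer parses as JSON.
        self.buffer.append(chunk)
        try:
            json_output = json.loads(''.join(self.buffer))
        except json.JSONDecodeError:
            # Still incomplete (or malformed): keep buffering
            return None
        self.buffer = []
        return json_output

    def retry_parse(self, generate, max_retries=3):
        # `generate` is a zero-argument callable that re-issues the request
        # and returns an iterable of text chunks. Re-parsing the same chunks
        # cannot succeed, so every attempt starts from an empty buffer.
        for _ in range(max_retries):
            self.buffer = []
            for chunk in generate():
                parsed_output = self.parse(chunk)
                if parsed_output is not None:
                    return parsed_output
        # If all retries fail, raise an error
        raise ValueError("Failed to parse JSON output after retries")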

Verification

To verify the fix, test the tool calls with MTP enabled and disabled, ensuring that the JSON output is correctly parsed in both cases. Monitor the error rates and user feedback to confirm that the probabilistic errors are significantly reduced.
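One way to make the comparison above concrete is to collect the tool-call argument strings returned under each configuration and compute the fraction that fails to parse. This is a minimal sketch: `malformed_rate` is a hypothetical helper, and the two sample lists are placeholder data for illustration, not measurements from the issue.

```python
import json


def malformed_rate(argument_strings):
    """Return the fraction of tool-call argument strings that are not valid JSON."""
    total = len(argument_strings)
    if total == 0:
        return 0.0
    bad = 0
    for arguments in argument_strings:
        try:
            json.loads(arguments)
        except json.JSONDecodeError:
            bad += 1
    return bad / total


# Placeholder samples; in practice, collect these from real responses
# gathered with MTP disabled and with MTP enabled.
mtp_off_samples = ['{"city": "Beijing"}', '{"city": "Shanghai"}']
mtp_on_samples = ['{"city": "Beijing"}', '{"city": "Shanghai"']

print("MTP off:", malformed_rate(mtp_off_samples))
print("MTP on: ", malformed_rate(mtp_on_samples))
```

Running the same prompts under both settings keeps the comparison fair, and a large enough sample is needed because the remaining errors are intermittent.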

Extra Tips

  • Regularly review and update the model to ensure the best possible precision and reduce errors.
  • Consider implementing additional logging and monitoring to quickly identify and address any recurring issues.
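As an illustration of the logging tip, the sketch below records every malformed tool-call payload together with enough context to correlate failures with MTP. The function name `audit_tool_arguments`, the logger name, and the `request_id` and `mtp_enabled` parameters are assumptions made for this example and are not part of vLLM.

```python
import json
import logging

logger = logging.getLogger("tool_call_audit")  # hypothetical logger name


def audit_tool_arguments(request_id: str, arguments: str, mtp_enabled: bool) -> bool:
    """Return True if `arguments` is valid JSON; otherwise log it and return False."""
    try:
        json.loads(arguments)
        return True
    except json.JSONDecodeError as exc:
        # Keep the raw payload so truncated or duplicated JSON can be inspected later.
        logger.warning(
            "malformed tool call: request_id=%s mtp_enabled=%s error=%s raw=%r",
            request_id, mtp_enabled, exc, arguments,
        )
        return False


logging.basicConfig(level=logging.WARNING)
audit_tool_arguments("req-1", '{"city": "Beijing"}', mtp_enabled=True)
audit_tool_arguments("req-2", '{"city": "Beijing"', mtp_enabled=True)
```

Logging the raw string rather than only the error message makes it possible to tell a parser-side truncation from output the model itself generated incorrectly.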
