vllm - ✅(Solved) Fix [Bug]: GLM-4.7-Flash does not return tool_calls field in vLLM 0.16.0 even with --tool-call-parser glm47 [1 pull requests, 1 comments, 2 participants]

GitHub stats
vllm-project/vllm#36833 · Fetched 2026-04-08 00:34:21
Comments: 1 · Participants: 2 · Timeline events: 2 · Reactions: 1
Timeline (top): commented ×1, labeled ×1

PR fix notes

PR #37385: [Bugfix] Fix GLM-4 tool parser double serialization in streaming

Description (problem / solution / changelog)

Summary

Fix double serialization bug in GLM-4 tool parser streaming where prev_tool_call_arr stored arguments as a parsed dict instead of a JSON string, causing malformed final delta chunks in streaming tool call responses.
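To see why a spurious cumulative final delta matters to a client, here is a minimal sketch of how streamed argument fragments are commonly reassembled. The fragment values and the helper names (accumulate_arguments, is_valid_json) are hypothetical and only illustrate the symptom; they are not taken from vLLM or from the PR.

```python
import json


def accumulate_arguments(deltas):
    """Concatenate streamed argument fragments, as a typical client would."""
    return "".join(deltas)


def is_valid_json(text):
    """Return True if text parses as a single JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical fragments for one tool call, streamed in two chunks.
good_deltas = ['{"location":', '"Beijing"}']

# The same stream followed by a final chunk that repeats the whole
# argument object instead of being empty.
bad_deltas = good_deltas + ['{"location": "Beijing"}']

good = accumulate_arguments(good_deltas)
bad = accumulate_arguments(bad_deltas)

print(is_valid_json(good))
print(is_valid_json(bad))
```

In this sketch the first stream reassembles into one JSON object, while the second ends up as two objects back to back, which a JSON parser rejects.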

Root Cause

In glm4_moe_tool_parser.py, the streaming finalization code stored arguments as a dict:

args_dict = json.loads(full_args_str)
self.prev_tool_call_arr[self.current_tool_id] = {
    "arguments": args_dict,  # stored as dict -- BUG
}

The serving layer (chat_completion/serving.py) then called json.dumps(args) on this dict, which could produce different JSON formatting than streamed_args_for_tool, causing the replace() logic to fail and emit a spurious cumulative final delta.
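The formatting mismatch described above can be reproduced with the standard library alone. This is a sketch under assumptions: remaining_delta is a hypothetical stand-in for the serving layer's replace() step, and the compact argument string is an invented example of what the parser might already have streamed.

```python
import json


def remaining_delta(expected_call, already_streamed):
    """Hypothetical stand-in: strip the already-streamed text from the
    expected full call and return whatever is left to send."""
    return expected_call.replace(already_streamed, "", 1)


# Invented example: arguments streamed to the client without spaces.
already_streamed = '{"location":"Beijing"}'

# The buggy path parses the string into a dict, and the dict is later
# re-serialized. json.dumps uses ", " and ": " separators by default.
args_dict = json.loads(already_streamed)
reserialized = json.dumps(args_dict, ensure_ascii=False)

# The streamed text is no longer a substring, so nothing is removed.
leftover = remaining_delta(reserialized, already_streamed)

print(reserialized)
print(leftover == reserialized)
```

Because the re-serialized text contains a space after the colon, the replace finds no match and the whole argument object is left over as the final delta.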

Fix

Store arguments as the raw JSON string (matching other tool parsers like Qwen3Coder, DeepSeek-V3.2, Seed-OSS):

json.loads(full_args_str)  # validate only
self.prev_tool_call_arr[self.current_tool_id] = {
    "arguments": full_args_str,  # stored as string
}

The serving layer already handles both str and dict types at lines 1143-1146:

if isinstance(args, str):
    expected_call = args  # use directly, no double serialization
else:
    expected_call = json.dumps(args, ensure_ascii=False)
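Putting the two snippets together, the following sketch shows the validate-then-store pattern and the string passthrough. finalize_tool_call and expected_call_for are hypothetical helper names that wrap the lines quoted above; the surrounding parser and serving code is not reproduced here.

```python
import json


def finalize_tool_call(full_args_str):
    """Validate the accumulated arguments, then keep the raw string."""
    json.loads(full_args_str)  # raises ValueError on malformed JSON
    return {"arguments": full_args_str}


def expected_call_for(args):
    """Mirror the str/dict branch: strings pass through unchanged."""
    if isinstance(args, str):
        return args
    return json.dumps(args, ensure_ascii=False)


# Invented example with compact formatting and non-ASCII text.
raw = '{"location":"北京"}'
entry = finalize_tool_call(raw)

expected = expected_call_for(entry["arguments"])
# The stored string matches what was streamed, so nothing is left over.
leftover = expected.replace(raw, "", 1)

print(expected == raw)
print(repr(leftover))
```

Storing the string keeps the model's original formatting byte for byte, so the comparison against the already-streamed text no longer depends on how json.dumps chooses its separators.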

Test Plan

pytest tests/tool_parsers/test_glm4_moe_tool_parser.py -v
pytest tests/tool_parsers/test_glm47_moe_tool_parser.py -v

Fixes #36833

Changed files

  • tests/tool_parsers/test_glm47_moe_tool_parser.py (modified, +4/-1)
  • tests/tool_parsers/test_glm4_moe_tool_parser.py (modified, +17/-11)
  • vllm/tool_parsers/glm4_moe_tool_parser.py (modified, +3/-3)


Your current environment

Hi,

I am trying to deploy GLM-4.7-Flash using vLLM 0.16.0 and enable tool calling.
However, the response does not include the tool_calls field even though the tool parser is configured.

Environment:

  • vLLM version: 0.16.0
  • Model: GLM-4.7-Flash
  • GPU: NVIDIA RTX A6000
  • CUDA version: 12.4

Startup command:

vllm serve /path/to/GLM-4.7-Flash \
  --port 8000 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice

Request example:

POST /v1/chat/completions

{
  "model": "GLM-4.7-Flash",
  "messages": [
    {"role": "user", "content": "What is the weather in Beijing today?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
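For reproducing the report from a script, the same request can be built with Python's standard library. This is a sketch under assumptions: the base URL http://localhost:8000 is inferred from --port 8000 in the startup command, and build_request and send are hypothetical helper names.

```python
import json
import urllib.request

# Assumed from "--port 8000" in the startup command; adjust for your host.
BASE_URL = "http://localhost:8000"

PAYLOAD = {
    "model": "GLM-4.7-Flash",
    "messages": [
        {"role": "user", "content": "What is the weather in Beijing today?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
}


def build_request(base_url=BASE_URL, payload=PAYLOAD):
    """Build the POST request for the chat completions endpoint."""
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling send(build_request()) against a running server returns the response as a dict, which can then be inspected for a tool_calls entry.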

Expected behavior: The response should include a tool_calls field indicating that the model wants to call the function.

Actual behavior: The response only returns normal text and does not contain the tool_calls field.

Is there any additional configuration required for GLM-4.7-Flash to enable tool calling in vLLM?

Thanks!

extent analysis

Fix Plan

The startup command in the report already passes the two flags vLLM uses to enable automatic tool calling, --enable-auto-tool-choice and --tool-call-parser glm47. No additional flag or configuration-file section is needed to turn the parser on. The fix linked to this issue is a code change in the GLM-4 tool parser (PR #37385), so the plan is to keep the configuration and pick up that change.

  • Keep the serve command as reported, written with line continuations so the shell treats it as a single command:
vllm serve /path/to/GLM-4.7-Flash \
  --port 8000 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice
  • Upgrade to a vLLM build that includes PR #37385. It changes vllm/tool_parsers/glm4_moe_tool_parser.py so that the streaming path stores tool-call arguments as a JSON string rather than a parsed dict.
  • Confirm that the tools array in the request follows the OpenAI function-calling schema. The request body in the report above already does: it defines a get_weather function with one required string parameter, location.

Verification

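After applying the fix, send the same request to the /v1/chat/completions endpoint and verify that the assistant message in the response includes the tool_calls field. In the OpenAI-compatible response format, it sits under choices[0].message and carries the arguments as a JSON string (abbreviated here, with the id elided):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "...",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"Beijing\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }]
}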
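The check can also be scripted. The sketch below assumes the OpenAI-compatible chat completion layout, in which tool calls appear under choices[0].message.tool_calls and each call's arguments field is a JSON string. extract_tool_calls is a hypothetical helper, and both sample responses are invented for illustration.

```python
import json


def extract_tool_calls(response):
    """Return (name, parsed_arguments) pairs from a chat completion dict.

    Returns an empty list when the assistant replied with plain text.
    """
    choices = response.get("choices") or []
    if not choices:
        return []
    message = choices[0].get("message") or {}
    calls = []
    for call in message.get("tool_calls") or []:
        function = call.get("function") or {}
        arguments = json.loads(function.get("arguments") or "{}")
        calls.append((function.get("name"), arguments))
    return calls


# Invented sample: a response that contains one tool call.
with_call = {
    "choices": [{
        "message": {
            "role": "assistant",
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"Beijing\"}",
                },
            }],
        },
    }],
}

# Invented sample: the behavior described in the report, text only.
text_only = {
    "choices": [{"message": {"role": "assistant", "content": "It is sunny."}}],
}

print(extract_tool_calls(with_call))
print(extract_tool_calls(text_only))
```

An empty list from a request that clearly calls for a tool suggests the server is still returning plain text, as described in the report.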

Extra Tips

  • Tool calling needs both --enable-auto-tool-choice and a --tool-call-parser value that matches the model family. The tool calling page of the vLLM documentation lists the supported parsers.
  • If the response still lacks tool_calls, restart the server with the VLLM_LOGGING_LEVEL=DEBUG environment variable set to get more detailed log output.
