vllm - ✅(Solved) Fix [Bug]: GLM-4.7-Flash does not return tool_calls field in vLLM 0.16.0 even with --tool-call-parser glm47 [1 pull requests, 1 comments, 2 participants]

vllm · 2026-03-12 00:56:59


GitHub stats

vllm-project/vllm#36833 • Fetched 2026-04-08 00:34:21


Author

CodeofGame

Participants

CodeofGame

flutist

Timeline (top)

commented ×1, labeled ×1

PR fix notes

PR #37385: [Bugfix] Fix GLM-4 tool parser double serialization in streaming

Repository: vllm-project/vllm
Author: karanb192
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/37385

Description (problem / solution / changelog)

Summary

Fix double serialization bug in GLM-4 tool parser streaming where prev_tool_call_arr stored arguments as a parsed dict instead of a JSON string, causing malformed final delta chunks in streaming tool call responses.

Root Cause

In glm4_moe_tool_parser.py, the streaming finalization code stored arguments as a dict:

args_dict = json.loads(full_args_str)
self.prev_tool_call_arr[self.current_tool_id] = {
    "arguments": args_dict,  # stored as dict -- BUG
}

The serving layer (chat_completion/serving.py) then called json.dumps(args) on this dict, which could produce different JSON formatting than streamed_args_for_tool, causing the replace() logic to fail and emit a spurious cumulative final delta.

Fix

Store arguments as the raw JSON string (matching other tool parsers like Qwen3Coder, DeepSeek-V3.2, Seed-OSS):

json.loads(full_args_str)  # validate only
self.prev_tool_call_arr[self.current_tool_id] = {
    "arguments": full_args_str,  # stored as string
}

The serving layer already handles both str and dict types at lines 1143-1146:

if isinstance(args, str):
    expected_call = args  # use directly, no double serialization
else:
    expected_call = json.dumps(args, ensure_ascii=False)
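
The effect of the two storage choices can be reproduced with the standard library alone. This is a minimal sketch, not vLLM's code: expected_call mirrors the branch quoted above, remaining_delta is a hypothetical stand-in for the serving layer's replace() step, and the compact argument string is an assumed example of what the parser might have streamed.

```python
import json
from typing import Union


def expected_call(args: Union[str, dict]) -> str:
    """Mirror of the quoted branch: strings pass through, dicts are dumped."""
    if isinstance(args, str):
        return args
    return json.dumps(args, ensure_ascii=False)


def remaining_delta(expected: str, already_streamed: str) -> str:
    """Hypothetical stand-in for the replace() step: strip the text that was
    already streamed and return whatever is left to send."""
    return expected.replace(already_streamed, "", 1)


# Assumed example: the argument text exactly as it was streamed to the client.
streamed_args = '{"location":"Beijing"}'

# Old behavior: the parser stored a dict, so the text is re-serialized.
# json.dumps defaults to ", " and ": " separators, so the formatting changes.
reserialized = expected_call(json.loads(streamed_args))

# The streamed text is no longer a substring, nothing is stripped, and the
# whole argument string is emitted again as a final delta.
buggy_final = remaining_delta(reserialized, streamed_args)

# Fixed behavior: the raw string is stored, it matches, and nothing is left.
fixed_final = remaining_delta(expected_call(streamed_args), streamed_args)

print(repr(buggy_final))  # '{"location": "Beijing"}'
print(repr(fixed_final))  # ''
```

Storing the raw string keeps the comparison byte-for-byte, which is why the fix validates with json.loads but discards the parsed result.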

Test Plan

pytest tests/tool_parsers/test_glm4_moe_tool_parser.py -v
pytest tests/tool_parsers/test_glm47_moe_tool_parser.py -v

Fixes #36833

Changed files

tests/tool_parsers/test_glm47_moe_tool_parser.py (modified, +4/-1)
tests/tool_parsers/test_glm4_moe_tool_parser.py (modified, +17/-11)
vllm/tool_parsers/glm4_moe_tool_parser.py (modified, +3/-3)

Code Example

{
  "model": "GLM-4.7-Flash",
  "messages": [
    {"role": "user", "content": "What is the weather in Beijing today?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}


Your current environment

Hi,

I am trying to deploy GLM-4.7-Flash using vLLM 0.16.0 and enable tool calling.
However, the response does not include the tool_calls field even though the tool parser is configured.

Environment:

vLLM version: 0.16.0
Model: GLM-4.7-Flash
GPU: NVIDIA RTX A6000
CUDA version: 12.4

Startup command:

vllm serve /path/to/GLM-4.7-Flash \
  --port 8000 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice

Request example:

POST /v1/chat/completions

{
  "model": "GLM-4.7-Flash",
  "messages": [
    {"role": "user", "content": "What is the weather in Beijing today?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Expected behavior: The response should include a tool_calls field indicating that the model wants to call the function.

Actual behavior: The response only returns normal text and does not contain the tool_calls field.

Is there any additional configuration required for GLM-4.7-Flash to enable tool calling in vLLM?

Thanks!

extent analysis

Fix Plan

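The startup command in the report already passes the two options vLLM uses for automatic tool calling, --enable-auto-tool-choice and --tool-call-parser. vLLM has no separate --tool-calling flag, so nothing needs to be added to the command:

vllm serve /path/to/GLM-4.7-Flash \
  --port 8000 \
  --tool-call-parser glm47 \
  --enable-auto-tool-choice

If tool_calls is still missing with these options, the flags are not the cause. PR #37385 above changes the GLM-4 tool parser's streaming path; it was still open and unmerged when this page was fetched.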

vLLM does not read a tool_call_parsers section from a configuration file. If the server is started from a YAML file passed with --config, the keys are the same option names as on the command line:

port: 8000
tool-call-parser: glm47
enable-auto-tool-choice: true

Ensure that the get_weather function is correctly defined in the tools section of the request:

{
  "model": "GLM-4.7-Flash",
  "messages": [
    {"role": "user", "content": "What is the weather in Beijing today?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}

Verification

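Send the same request to the /v1/chat/completions endpoint and check that the first choice's message carries a tool_calls array. In the OpenAI-compatible response format the call sits under choices[0].message, and arguments is a JSON-encoded string; the id and the exact argument text will vary:

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "tool_calls": [{
        "id": "...",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"Beijing\"}"}
      }]
    }
  }]
}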
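
The check can also be scripted. The sketch below uses only the Python standard library and assumes the OpenAI-compatible response layout, where tool calls sit under choices[0].message.tool_calls. The helper names build_payload, post_chat_completion and extract_tool_calls are hypothetical, chosen for this illustration.

```python
import json
import urllib.request


def build_payload() -> dict:
    """Request body from the report: one user message and one tool."""
    return {
        "model": "GLM-4.7-Flash",
        "messages": [
            {"role": "user", "content": "What is the weather in Beijing today?"}
        ],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }],
    }


def post_chat_completion(base_url: str, payload: dict) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and decode the JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_tool_calls(response: dict) -> list:
    """Return the first choice's tool calls, or [] when the field is absent or null."""
    message = response["choices"][0]["message"]
    return message.get("tool_calls") or []
```

With the server from the report listening on port 8000, extract_tool_calls(post_chat_completion("http://localhost:8000", build_payload())) returns an empty list when the reported behavior occurs and a non-empty list when the parser emits a call.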

Extra Tips

Check the tool calling section of the vLLM documentation for the parser and chat template expected by the model you are serving.
If the problem persists, raise the server log level, for example by setting the VLLM_LOGGING_LEVEL=DEBUG environment variable, to get more detailed messages.
