vllm - ✅(Solved) Fix [Bug] GLM-5.1-FP8: tool result content replaced with `<tools>\n</tools>` in /v1/chat/completions when --chat-template-content-format is auto [2 pull requests, 1 participant]

GitHub stats
vllm-project/vllm#39614 · Fetched 2026-04-12 13:24:21
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

When using /v1/chat/completions with --tool-call-parser glm47 and --chat-template-content-format auto (the default), GLM-5.1-FP8 completely ignores tool results in multi-turn conversations. The model always responds as if the tool returned no data.

Using vLLM's /tokenize + /detokenize endpoints, we confirmed that the actual prompt sent to the model contains:

<|observation|><tool_response><tools>\n</tools></tool_response>

The tool result content ("15°C, partly cloudy") is entirely absent. In its place is the literal string <tools>\n</tools> — the empty tools XML wrapper from the system prompt section.

Root Cause

In vllm/entrypoints/chat_utils.py, _parse_chat_message_content():

content = message.get("content")  # "15°C, partly cloudy" (string)

# All string content is converted to content-part list format:
elif isinstance(content, str):
    content = [ChatCompletionContentPartTextParam(type="text", text=content)]
    # → [{"type": "text", "text": "15°C, partly cloudy"}]

result = _parse_chat_message_content_parts(
    role,           # "tool"
    content,        # [{"type": "text", "text": "15°C, partly cloudy"}]
    mm_tracker,
    wrap_dicts=(content_format == "openai"),  # ← True when format="openai"
    ...
)

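The "auto" content format resolves to "openai" for GLM-4-MoE models. The issue author suspected this happens because glm4_moe is registered as a multimodal architecture; PR #39630 instead traces it to auto-detection classifying the template as openai because it iterates over message.content. Either way, wrap_dicts becomes True, and the ConversationMessage.content for tool role messages becomes a list [{"type": "text", "text": "15°C, partly cloudy"}] instead of a plain string.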
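The effect of wrap_dicts can be illustrated with a small stand-alone sketch. This is not vLLM's code: render_content is a hypothetical helper, and joining text parts with a newline on the string path is an assumption made for illustration. It only shows that the same tool result reaches the template as a list in one mode and as a plain string in the other.

```python
from typing import Union

ContentPart = dict[str, str]


def render_content(content: str, content_format: str) -> Union[str, list[ContentPart]]:
    """Hypothetical, simplified stand-in for how string content reaches the template."""
    # Step 1: string content is normalised to a list of text parts.
    parts = [{"type": "text", "text": content}]
    # Step 2: wrap_dicts decides whether the template sees the list or plain text.
    wrap_dicts = content_format == "openai"
    if wrap_dicts:
        return parts
    # Assumption: on the string path the text parts are joined into one string.
    return "\n".join(part["text"] for part in parts)


as_openai = render_content("15°C, partly cloudy", "openai")
as_string = render_content("15°C, partly cloudy", "string")
print(type(as_openai).__name__, as_openai)
print(type(as_string).__name__, as_string)
```

With "openai" the value is a list of dicts, which is what later fails the template's string check; with "string" it stays a plain str.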

The GLM-5.1-FP8 chat template (chat_template.jinja) then checks:

{%- if m.content is string -%}
  {{- '<tool_response>' + m.content + '</tool_response>' -}}
{%- else -%}
  {{- '\n' -}}
  {% for tr in m.content %}
    {%- for tool in tools -%}
      {%- if tool.name == tr.name -%}
        {{- tool_to_json(tool) + '\n' -}}
      {%- endif -%}
    {%- endfor -%}
  {%- endfor -%}
  {{- '</tool_response>' -}}
{%- endif -%}

When m.content is a list (not a string):

  • m.content is string → False
  • Falls to the else branch, which is designed for structured tool result objects (matching results by tool name)
  • tr = {"type": "text", "text": "15°C, partly cloudy"} — has no .name attribute
  • Template iterates with no matches → produces <tools>\n</tools> artifact instead of actual result
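The branch logic above can be paraphrased in plain Python to make the failure easier to see. This is a sketch under stated assumptions: render_tool_message is a hypothetical function that mirrors only the template excerpt shown here, json.dumps stands in for the template's tool_to_json, and the tools list uses a flat "name" key for simplicity. It does not reproduce the exact <tools>\n</tools> artifact, only the fact that the result text never reaches the output when content is a list.

```python
import json


def render_tool_message(content, tools):
    """Plain-Python paraphrase of the template's tool-message branch (hypothetical)."""
    if isinstance(content, str):  # corresponds to `m.content is string`
        return "<tool_response>" + content + "</tool_response>"
    out = "\n"
    for tr in content:  # each item is expected to carry a tool name
        for tool in tools:
            if tool.get("name") == tr.get("name"):
                out += json.dumps(tool) + "\n"
    return out + "</tool_response>"


tools = [{"name": "get_weather"}]
print(repr(render_tool_message("15°C, partly cloudy", tools)))
print(repr(render_tool_message([{"type": "text", "text": "15°C, partly cloudy"}], tools)))
```

The first call keeps the weather text; the second never reads the "text" field, so the result is dropped regardless of what the tool returned.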

Fix Action

Workaround

Add --chat-template-content-format string to the vLLM server command:

docker run --gpus all -p 8000:8000 --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:glm51-cu130 zai-org/GLM-5.1-FP8 \
  --tensor-parallel-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --chat-template-content-format string \
  --served-model-name glm-5.1-fp8

With --chat-template-content-format string, wrap_dicts is False, so tool message content stays a plain string. The m.content is string check is then True, and the template emits the correct <tool_response>15°C, partly cloudy</tool_response> output.

PR fix notes

PR #39630: Fix GLM tool results with auto chat template content format

Description (problem / solution / changelog)

Summary

  • special-case GLM-style chat templates during auto content-format detection
  • keep these templates on the string path when they require raw string tool responses
  • add renderer regressions for format detection and prompt rendering of tool results

Fixes #39614.

Why this is not duplicating an existing PR

  • I checked open PRs that reference #39614 and did not find another open PR for this issue.
  • I also checked nearby GLM-related PRs; the open PR #39253 is about GLM tool parser streaming and does not address prompt rendering of tool result content.

Root cause

resolve_chat_template_content_format(..., given_format="auto", ...) classified certain GLM templates as openai because they iterate over message.content for regular messages. That causes tool-role messages to be converted into OpenAI-style structured content items before rendering. For GLM templates that explicitly expect m.content is string for tool responses, the rendered prompt then falls into the wrong branch and emits an empty <tools> wrapper instead of the actual tool result text.
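To make the detection problem concrete, here is a deliberately naive sketch of a format guess with a GLM-style special case. It is not the code from this PR and not how vLLM inspects templates: guess_content_format and both regular expressions are hypothetical, chosen only to show why "iterates over content" alone is an insufficient signal.

```python
import re


def guess_content_format(template_source: str) -> str:
    """Hypothetical heuristic: guess 'openai' or 'string' from template text."""
    # Signal 1: the template loops over some message's content.
    iterates_content = (
        re.search(r"for\s+\w+\s+in\s+\w+\.content\b", template_source) is not None
    )
    # Signal 2: the template wraps tool results and tests content for string-ness.
    string_tool_branch = (
        "<tool_response>" in template_source
        and re.search(r"\.content\s+is\s+string", template_source) is not None
    )
    if iterates_content and not string_tool_branch:
        return "openai"
    return "string"


glm_like = (
    "{%- if m.content is string -%}"
    "{{- '<tool_response>' + m.content + '</tool_response>' -}}"
    "{%- else -%}{% for tr in m.content %}{% endfor %}{%- endif -%}"
)
list_only = "{% for part in message.content %}{{ part.text }}{% endfor %}"
print(guess_content_format(glm_like))
print(guess_content_format(list_only))
```

A text-matching heuristic like this is brittle; it is shown only to illustrate the idea of keeping templates that require raw string tool responses on the string path.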

Tests run

  • python -m py_compile vllm/renderers/hf.py tests/renderers/test_hf.py
  • python -m pytest --noconftest tests/renderers/test_hf.py -k "glm_tool_response_template" -q
    • blocked by local environment mismatch during import: the machine's installed torch is older than current vLLM expects (torch.library.infer_schema is missing)

AI assistance

This PR was prepared with AI assistance. I reviewed the changed files, commit, and PR description before submission.

Changed files

  • tests/renderers/test_hf.py (modified, +140/-1)
  • vllm/renderers/hf.py (modified, +19/-0)

PR #25993: fix(zai): flatten list-format content in tool/assistant messages before sending to GLM

Description (problem / solution / changelog)

Fixes #25868

Problem

GLM's Jinja chat template checks m.content is string and silently drops list-format content (same root cause as vllm-project/vllm#39614). When a client sends tool result content as an OpenAI-format list of content parts — which the official Go openai-go client always does — the tool result is lost and the model responds as if no data was returned.

Working (string format):

{"role": "tool", "tool_call_id": "c1", "content": "22.5°C, partly cloudy."}

Broken (list format — silently dropped by GLM):

{"role": "tool", "tool_call_id": "c1", "content": [{"type": "text", "text": "22.5°C, partly cloudy."}]}

Solution

Add ZAIChatConfig._transform_messages() that normalises list-format content in tool and assistant messages to plain strings before delegating to the parent OpenAIGPTConfig transformer. User messages (which may contain images) are intentionally left untouched.

The new _flatten_content_parts() helper joins text parts with \n and is a no-op for plain strings and None values.
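A client-side normalisation along these lines might look like the sketch below. It is not the litellm implementation: flatten_content_parts and normalise_messages are hypothetical names, and skipping non-text parts is an assumption. The behaviour follows the description above: text parts are joined with a newline, strings and None pass through, and only tool and assistant messages are touched.

```python
from typing import Any, Optional, Union


def flatten_content_parts(
    content: Union[str, list[dict[str, Any]], None],
) -> Optional[str]:
    """Join text parts with newlines; leave plain strings and None unchanged."""
    if content is None or isinstance(content, str):
        return content
    # Assumption: parts that are not of type "text" are skipped.
    texts = [part.get("text", "") for part in content if part.get("type") == "text"]
    return "\n".join(texts)


def normalise_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten list content for tool/assistant messages; leave user messages as-is."""
    result = []
    for msg in messages:
        if msg.get("role") in ("tool", "assistant"):
            msg = {**msg, "content": flatten_content_parts(msg.get("content"))}
        result.append(msg)
    return result


tool_msg = {
    "role": "tool",
    "tool_call_id": "c1",
    "content": [{"type": "text", "text": "22.5°C, partly cloudy."}],
}
print(normalise_messages([tool_msg])[0]["content"])
```

Copying each message with {**msg, ...} avoids mutating the caller's list, which keeps the helper safe to apply just before a request is sent.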

Testing

Added TestZAIMessageTransformation with 7 unit tests covering:

  • Tool message with list-format content → flattened to string
  • Assistant message with list-format content → flattened to string
  • String content → passes through unchanged
  • Multi-part list → joined with newline
  • Empty list → empty string
  • None → None

All 16 ZAI provider tests pass.

Changed files

  • litellm/llms/zai/chat/transformation.py (modified, +43/-1)
  • tests/test_litellm/llms/zai/test_zai_provider.py (modified, +114/-0)


Bug: Tool Result Content Corrupted in /v1/chat/completions with --tool-call-parser glm47

Environment

vLLM: 0.19.1.dev1+g43a9b1afb
transformers: 5.4.0
Docker image: vllm/vllm-openai:glm51-cu130
Model: zai-org/GLM-5.1-FP8
Hardware: 8× NVIDIA B300 GPUs
Startup flags: --tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice

Reproduction

Test 1: Outbound tool call — ✅ PASSES

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "messages": [{"role":"user","content":"What is the weather in Vancouver?"}],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "tool_choice": "auto",
    "chat_template_kwargs": {"enable_thinking": false}
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['message']['tool_calls'])"

Result: [{'id': '...', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"city": "Vancouver"}'}}]


Test 2: Round-trip with tool result — ❌ FAILS

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "messages": [
      {"role":"user","content":"What is the weather in Vancouver?"},
      {"role":"assistant","content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"get_weather","arguments":"{\"city\":\"Vancouver\"}"}}]},
      {"role":"tool","tool_call_id":"call_1","content":"15°C, partly cloudy"}
    ],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "chat_template_kwargs": {"enable_thinking": false}
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['message']['content'])"

Expected: "The weather in Vancouver is 15°C, partly cloudy."
Actual: "I'm sorry, it seems the weather data for Vancouver couldn't be retrieved at the moment..."

Same failure with enable_thinking: true, content: "", or any other variant.


Test 3: Same conversation via /v1/completions — ✅ PASSES

curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "prompt": "[gMASK]<sop><|system|>\n# Tools\nYou may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"name\": \"get_weather\", \"description\": \"Get weather\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\"}}, \"required\": [\"city\"]}}\n</tools>\nFor each function call, output the function name and arguments within the following XML format:\n<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value>...</tool_call><|user|>What is the weather in Vancouver?<|assistant|><tool_call>get_weather<arg_key>city</arg_key><arg_value>Vancouver</arg_value></tool_call><|observation|><tool_response>15°C, partly cloudy</tool_response><|assistant|>",
    "max_tokens": 80,
    "temperature": 0.1
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['text'])"

Result: "The weather in Vancouver is currently **15°C** and **partly cloudy**."

Smoking Gun: Token Count Mismatch

Using /tokenize + /detokenize to inspect the actual prompt vLLM sends to the model:

Request type | prompt_tokens | Decoded <tool_response> content
/v1/chat/completions (Test 2) | 173 | <tools>\n</tools>
/v1/completions (Test 3) | 116 | 15°C, partly cloudy

The token IDs [27, 15906, 397, 522, 15906, 29] were confirmed to decode to <tools>\n</tools> via /detokenize. The actual tool result tokens [99082, 30811, 11, 26949, 72978] (= 15°C, partly cloudy) are completely absent from the 173-token prompt.
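Once the prompt token IDs are in hand, the presence check can be automated. The sketch below assumes the IDs have already been fetched (for example from /tokenize); contains_subsequence is a hypothetical helper, and the prompt list is a placeholder in which 0 stands in for the token IDs the report does not list.

```python
def contains_subsequence(tokens: list[int], needle: list[int]) -> bool:
    """Return True if needle occurs as a contiguous run inside tokens."""
    n = len(needle)
    if n == 0:
        return True
    return any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1))


# Token IDs quoted in the report above.
ARTIFACT_IDS = [27, 15906, 397, 522, 15906, 29]    # <tools>\n</tools>
RESULT_IDS = [99082, 30811, 11, 26949, 72978]      # 15°C, partly cloudy

# Placeholder prompt: 0 stands in for the surrounding tokens not listed in the report.
prompt_tokens = [0, 0] + ARTIFACT_IDS + [0]

print(contains_subsequence(prompt_tokens, ARTIFACT_IDS))  # True
print(contains_subsequence(prompt_tokens, RESULT_IDS))    # False
```

Matching on token IDs can miss text when the tokenizer merges characters across a boundary, so decoding the prompt and searching the string is a useful cross-check.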

Proposed Fixes

Option A — chat_utils.py (most robust, affects all models)

Preserve tool-role message string content as a plain string regardless of content_format:

elif role == "tool":
    parsed_msg = _ToolParser(message)
    if "tool_call_id" in parsed_msg:
        result_msg["tool_call_id"] = parsed_msg["tool_call_id"]
    # Always preserve string tool result content as-is
    if isinstance(message.get("content"), str):
        result_msg["content"] = message["content"]

Option B — chat_template.jinja (model-side fix)

Handle OpenAI-format list content in the tool message else branch:

{%- else -%}
  {#- A namespace is needed: a plain set inside a for loop does not persist after the loop. -#}
  {%- set ns = namespace(text='') -%}
  {%- for part in m.content -%}
    {%- if part.type == 'text' -%}{%- set ns.text = ns.text + part.text -%}{%- endif -%}
  {%- endfor -%}
  {{- '<tool_response>' + ns.text + '</tool_response>' -}}
{%- endif -%}

Option C — Change default for GLM-4-MoE

Change "auto" to resolve to "string" for glm4_moe architecture in the model registry, since these models use a text-based tool format rather than OpenAI multimodal content lists.

extent analysis

TL;DR

The issue can be worked around by adding --chat-template-content-format string to the vLLM server command, which keeps tool message content as a plain string. PR #39630 addresses the underlying auto-detection behavior.

Guidance

  • Verify that the --chat-template-content-format string flag is applied correctly by checking the vLLM server command and ensuring it includes this flag.
  • Test the fix using the provided Test 2 reproduction steps to confirm that the tool result content is correctly included in the prompt.
  • Consider implementing one of the proposed fixes (Option A, B, or C) for a more robust solution that affects all models or handles OpenAI-format list content.
  • Review the model registry to determine if changing the default content format for GLM-4-MoE models is necessary.

Example

No code snippet is provided as the issue is resolved through a command-line flag or proposed code changes.

Notes

The provided workaround and proposed fixes assume that the issue is specific to the GLM-5.1-FP8 model and the --tool-call-parser glm47 configuration. Additional testing may be necessary to ensure the fix does not introduce regressions in other models or configurations.

Recommendation

Apply the workaround by adding --chat-template-content-format string to the vLLM server command, as it is a simple and effective solution to the issue. If a more robust fix is desired, consider implementing Option A, which preserves tool-role message string content as a plain string regardless of content_format.
