vllm - ✅(Solved) Fix [Bug] GLM-5.1-FP8: tool result content replaced with `<tools>\n</tools>` in /v1/chat/completions when --chat-template-content-format is auto [2 pull requests, 1 participant]

vllm • 2026-04-12 07:01:35

vllm-project/vllm#39614•Fetched 2026-04-12 13:24:21

Author

codoublez

Participants

codoublez

When using /v1/chat/completions with --tool-call-parser glm47 and --chat-template-content-format auto (the default), GLM-5.1-FP8 completely ignores tool results in multi-turn conversations. The model always responds as if the tool returned no data.

Using vLLM's /tokenize + /detokenize endpoints, we confirmed that the actual prompt sent to the model contains:

<|observation|><tool_response><tools>\n</tools></tool_response>

The tool result content ("15°C, partly cloudy") is entirely absent. In its place is the literal string <tools>\n</tools> — the empty tools XML wrapper from the system prompt section.
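The inspection described above can be reproduced with a short script. The following is a minimal sketch, assuming the server from this report is reachable at http://localhost:8000, that /tokenize accepts a chat-style body with messages and returns a tokens list, and that /detokenize accepts tokens and returns a prompt string. Whether /tokenize also honors a tools field depends on the vLLM build, so treat that field as an assumption. The helper names are illustrative and not part of vLLM.

```python
import json
import re
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server from this report
MODEL = "glm-5.1-fp8"               # served model name used in this report


def post_json(path, payload):
    """POST a JSON body to the server and return the decoded JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def extract_tool_responses(prompt):
    """Return the text found inside each <tool_response>...</tool_response>."""
    return re.findall(r"<tool_response>(.*?)</tool_response>", prompt, re.DOTALL)


def rendered_prompt(messages, tools=None):
    """Tokenize a chat request, then detokenize to see the rendered prompt."""
    body = {"model": MODEL, "messages": messages}
    if tools is not None:
        body["tools"] = tools  # assumption: supported by this vLLM build
    tokens = post_json("/tokenize", body)["tokens"]
    return post_json("/detokenize", {"model": MODEL, "tokens": tokens})["prompt"]
```

Passing the messages from the failing round-trip request to rendered_prompt() and then to extract_tool_responses() shows what the model actually receives between the tool_response tags.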

Root Cause

In vllm/entrypoints/chat_utils.py, _parse_chat_message_content():

content = message.get("content")  # "15°C, partly cloudy" (string)

# All string content is converted to content-part list format:
elif isinstance(content, str):
    content = [ChatCompletionContentPartTextParam(type="text", text=content)]
    # → [{"type": "text", "text": "15°C, partly cloudy"}]

result = _parse_chat_message_content_parts(
    role,           # "tool"
    content,        # [{"type": "text", "text": "15°C, partly cloudy"}]
    mm_tracker,
    wrap_dicts=(content_format == "openai"),  # ← True when format="openai"
    ...
)

The "auto" content format resolves to "openai" for GLM-4-MoE models (likely because glm4_moe is registered as a multimodal architecture). This causes wrap_dicts=True, and the ConversationMessage.content for tool role messages becomes a list [{"type": "text", "text": "15°C, partly cloudy"}] instead of a plain string.
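The effect of the content format on a tool message can be illustrated in isolation. The function below is a simplified model of the behavior described above, not vLLM's implementation; shape_content and its arguments are hypothetical names, and joining text parts with a newline in the "string" case is an assumption.

```python
def shape_content(content, content_format):
    """Model how message content reaches the chat template.

    Simplified sketch: with "openai" the text stays wrapped in content-part
    dicts; with "string" the text parts are joined back into one string.
    """
    # String input is first normalised to a list of text parts.
    if isinstance(content, str):
        parts = [{"type": "text", "text": content}]
    else:
        parts = list(content)

    if content_format == "openai":
        return [dict(p) for p in parts]  # the template sees a list
    if content_format == "string":
        return "\n".join(p["text"] for p in parts if p.get("type") == "text")
    raise ValueError(f"unknown content format: {content_format!r}")


as_list = shape_content("15°C, partly cloudy", "openai")
as_text = shape_content("15°C, partly cloudy", "string")
print(type(as_list).__name__, type(as_text).__name__)
```

Only the second shape satisfies a template test such as `m.content is string`.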

The GLM-5.1-FP8 chat template (chat_template.jinja) then checks:

{%- if m.content is string -%}
  {{- '<tool_response>' + m.content + '</tool_response>' -}}
{%- else -%}
  {{- '\n' -}}
  {% for tr in m.content %}
    {%- for tool in tools -%}
      {%- if tool.name == tr.name -%}
        {{- tool_to_json(tool) + '\n' -}}
      {%- endif -%}
    {%- endfor -%}
  {%- endfor -%}
  {{- '</tool_response>' -}}
{%- endif -%}

When m.content is a list (not a string):

m.content is string → False
Falls to the else branch, which is designed for structured tool result objects (matching results by tool name)
tr = {"type": "text", "text": "15°C, partly cloudy"} — has no .name attribute
Template iterates with no matches → produces <tools>\n</tools> artifact instead of actual result
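The branch logic above can be mirrored in plain Python to see why the result text disappears. This is a sketch of the quoted template fragment only, not the full chat template. It reproduces the loss of the result text, not the exact `<tools>` artifact, which depends on parts of the template that are not quoted here. render_tool_message is a hypothetical name, and the tools are assumed to be in the flat {"name": ...} form that appears in the rendered system prompt.

```python
import json


def render_tool_message(content, tools):
    """Mirror the quoted template branch for a tool-role message."""
    if isinstance(content, str):  # corresponds to: m.content is string
        return "<tool_response>" + content + "</tool_response>"

    out = "\n"
    for tr in content:  # content is a list of content parts
        for tool in tools:
            # A text part carries no "name", so this comparison never
            # succeeds (assumes every tool has a non-empty name).
            if tool["name"] == tr.get("name"):
                out += json.dumps(tool) + "\n"
    return out + "</tool_response>"


tools = [{"name": "get_weather", "description": "Get weather"}]
as_string = render_tool_message("15°C, partly cloudy", tools)
as_parts = render_tool_message(
    [{"type": "text", "text": "15°C, partly cloudy"}], tools
)
print("15°C" in as_string, "15°C" in as_parts)
```

The string input keeps the tool result, while the list input falls through the loop without ever reading the part's text field.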

Fix Action

Workaround

Add --chat-template-content-format string to the vLLM server command:

docker run --gpus all -p 8000:8000 --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:glm51-cu130 zai-org/GLM-5.1-FP8 \
  --tensor-parallel-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --chat-template-content-format string \
  --served-model-name glm-5.1-fp8

With --chat-template-content-format string, wrap_dicts is False, so the tool message's string content is preserved as a plain string. The template test m.content is string then evaluates to True, and the prompt contains the correct <tool_response>15°C, partly cloudy</tool_response>.

PR fix notes

PR #39630: Fix GLM tool results with auto chat template content format

Repository: vllm-project/vllm
Author: Yonsun-w
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39630

Description (problem / solution / changelog)

Summary

special-case GLM-style chat templates during auto content-format detection
keep these templates on the string path when they require raw string tool responses
add renderer regressions for format detection and prompt rendering of tool results

Fixes #39614.

Why this is not duplicating an existing PR

I checked open PRs that reference #39614 and did not find another open PR for this issue.
I also checked nearby GLM-related PRs; the open PR #39253 is about GLM tool parser streaming and does not address prompt rendering of tool result content.

Root cause

resolve_chat_template_content_format(..., given_format="auto", ...) classified certain GLM templates as openai because they iterate over message.content for regular messages. That causes tool-role messages to be converted into OpenAI-style structured content items before rendering. For GLM templates that explicitly expect m.content is string for tool responses, the rendered prompt then falls into the wrong branch and emits an empty <tools> wrapper instead of the actual tool result text.

Tests run

python -m py_compile vllm/renderers/hf.py tests/renderers/test_hf.py ✅
python -m pytest --noconftest tests/renderers/test_hf.py -k "glm_tool_response_template" -q ❌
- blocked by local environment mismatch during import: the machine's installed torch is older than current vLLM expects (torch.library.infer_schema is missing)

AI assistance

This PR was prepared with AI assistance. I reviewed the changed files, commit, and PR description before submission.

Changed files

tests/renderers/test_hf.py (modified, +140/-1)
vllm/renderers/hf.py (modified, +19/-0)

PR #25993: fix(zai): flatten list-format content in tool/assistant messages before sending to GLM

Repository: BerriAI/litellm
Author: octo-patch
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25993

Description (problem / solution / changelog)

Fixes #25868

Problem

GLM's Jinja chat template checks m.content is string and silently drops list-format content (same root cause as vllm-project/vllm#39614). When a client sends tool result content as an OpenAI-format list of content parts — which the official Go openai-go client always does — the tool result is lost and the model responds as if no data was returned.

Working (string format):

{"role": "tool", "tool_call_id": "c1", "content": "22.5°C, partly cloudy."}

Broken (list format — silently dropped by GLM):

{"role": "tool", "tool_call_id": "c1", "content": [{"type": "text", "text": "22.5°C, partly cloudy."}]}

Solution

Add ZAIChatConfig._transform_messages() that normalises list-format content in tool and assistant messages to plain strings before delegating to the parent OpenAIGPTConfig transformer. User messages (which may contain images) are intentionally left untouched.

The new _flatten_content_parts() helper joins text parts with \n and is a no-op for plain strings and None values.
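A helper with the behavior described here might look like the following. This is an illustrative sketch written from the PR description, not the code in the PR; flatten_content_parts is a hypothetical stand-in for the PR's _flatten_content_parts(), and skipping non-text parts is an assumption.

```python
def flatten_content_parts(content):
    """Join the text parts of list-format content into one string.

    Plain strings and None are returned unchanged.
    """
    if content is None or isinstance(content, str):
        return content
    texts = [
        part.get("text", "")
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    return "\n".join(texts)


message = {
    "role": "tool",
    "tool_call_id": "c1",
    "content": [{"type": "text", "text": "22.5°C, partly cloudy."}],
}
# Normalise only the content field; the other keys are left as they are.
flattened = {**message, "content": flatten_content_parts(message["content"])}
print(flattened["content"])
```

Applying this on the client side, before the request is sent, gives the template the plain string it tests for.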

Testing

Added TestZAIMessageTransformation with 7 unit tests covering:

Tool message with list-format content → flattened to string
Assistant message with list-format content → flattened to string
String content → passes through unchanged
Multi-part list → joined with newline
Empty list → empty string
None → None

All 16 ZAI provider tests pass.

Changed files

litellm/llms/zai/chat/transformation.py (modified, +43/-1)
tests/test_litellm/llms/zai/test_zai_provider.py (modified, +114/-0)

Bug: Tool Result Content Corrupted in `/v1/chat/completions` with `--tool-call-parser glm47`

Summary

Using vLLM's /tokenize + /detokenize endpoints, we confirmed that the actual prompt sent to the model contains:

<|observation|><tool_response><tools>\n</tools></tool_response>

The tool result content ("15°C, partly cloudy") is entirely absent. In its place is the literal string <tools>\n</tools> — the empty tools XML wrapper from the system prompt section.

Environment


vLLM	`0.19.1.dev1+g43a9b1afb`
transformers	`5.4.0`
Docker image	`vllm/vllm-openai:glm51-cu130`
Model	`zai-org/GLM-5.1-FP8`
Hardware	8× NVIDIA B300 GPUs
Startup flags	`--tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice`

Reproduction

Test 1: Outbound tool call — ✅ PASSES

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "messages": [{"role":"user","content":"What is the weather in Vancouver?"}],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "tool_choice": "auto",
    "chat_template_kwargs": {"enable_thinking": false}
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['message']['tool_calls'])"

Result: [{'id': '...', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"city": "Vancouver"}'}}] ✅

Test 2: Round-trip with tool result — ❌ FAILS

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "messages": [
      {"role":"user","content":"What is the weather in Vancouver?"},
      {"role":"assistant","content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"get_weather","arguments":"{\"city\":\"Vancouver\"}"}}]},
      {"role":"tool","tool_call_id":"call_1","content":"15°C, partly cloudy"}
    ],
    "tools": [{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}],
    "chat_template_kwargs": {"enable_thinking": false}
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['message']['content'])"

Expected: "The weather in Vancouver is 15°C, partly cloudy."
Actual: "I'm sorry, it seems the weather data for Vancouver couldn't be retrieved at the moment..." ❌

Same failure with enable_thinking: true, content: "", or any other variant.

Test 3: Same conversation via /v1/completions — ✅ PASSES

curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1-fp8",
    "prompt": "[gMASK]<sop><|system|>\n# Tools\nYou may call one or more functions to assist with the user query.\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{\"name\": \"get_weather\", \"description\": \"Get weather\", \"parameters\": {\"type\": \"object\", \"properties\": {\"city\": {\"type\": \"string\"}}, \"required\": [\"city\"]}}\n</tools>\nFor each function call, output the function name and arguments within the following XML format:\n<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value>...</tool_call><|user|>What is the weather in Vancouver?<|assistant|><tool_call>get_weather<arg_key>city</arg_key><arg_value>Vancouver</arg_value></tool_call><|observation|><tool_response>15°C, partly cloudy</tool_response><|assistant|>",
    "max_tokens": 80,
    "temperature": 0.1
  }' | python3 -c "import json,sys; r=json.load(sys.stdin); print(r['choices'][0]['text'])"

Result: "The weather in Vancouver is currently **15°C** and **partly cloudy**." ✅

Smoking Gun: Token Count Mismatch

Using /tokenize + /detokenize to inspect the actual prompt vLLM sends to the model:

Request type	`prompt_tokens`	Decoded `<tool_response>` content
`/v1/chat/completions` (Test 2)	173	`<tools>\n</tools>` ❌
`/v1/completions` (Test 3)	116	`15°C, partly cloudy` ✅

The token IDs [27, 15906, 397, 522, 15906, 29] were confirmed to decode to <tools>\n</tools> via /detokenize. The actual tool result tokens [99082, 30811, 11, 26949, 72978] (= 15°C, partly cloudy) are completely absent from the 173-token prompt.
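The absence check described above amounts to a contiguous-run search over the prompt's token IDs. A minimal sketch follows, using the token IDs quoted in this report; the other IDs in the example prompt are made-up placeholders, and contains_run is a hypothetical helper.

```python
def contains_run(tokens, needle):
    """Return True if `needle` occurs as a contiguous run inside `tokens`."""
    n = len(needle)
    if n == 0:
        return True
    return any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1))


EMPTY_TOOLS = [27, 15906, 397, 522, 15906, 29]   # "<tools>\n</tools>" per the report
TOOL_RESULT = [99082, 30811, 11, 26949, 72978]   # "15°C, partly cloudy" per the report

# Placeholder prompt: 1, 2 and 3 stand in for unrelated tokens.
prompt_tokens = [1, 2] + EMPTY_TOOLS + [3]
print(contains_run(prompt_tokens, EMPTY_TOOLS))  # True
print(contains_run(prompt_tokens, TOOL_RESULT))  # False
```

Running the same check against the real token list returned by /tokenize distinguishes a correctly rendered prompt from the corrupted one without reading the decoded text by eye.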

Proposed Fixes

Option A — `chat_utils.py` (most robust, affects all models)

Preserve tool-role message string content as a plain string regardless of content_format:

elif role == "tool":
    parsed_msg = _ToolParser(message)
    if "tool_call_id" in parsed_msg:
        result_msg["tool_call_id"] = parsed_msg["tool_call_id"]
    # Always preserve string tool result content as-is
    if isinstance(message.get("content"), str):
        result_msg["content"] = message["content"]

Option B — `chat_template.jinja` (model-side fix)

Handle OpenAI-format list content in the tool message else branch:

{%- else -%}
  {#- A plain set inside a for loop does not persist after the loop, so accumulate in a namespace. -#}
  {%- set ns = namespace(text='') -%}
  {%- for part in m.content -%}
    {%- if part.type == 'text' -%}{%- set ns.text = ns.text + part.text -%}{%- endif -%}
  {%- endfor -%}
  {{- '<tool_response>' + ns.text + '</tool_response>' -}}
{%- endif -%}

Option C — Change default for GLM-4-MoE

Change "auto" to resolve to "string" for glm4_moe architecture in the model registry, since these models use a text-based tool format rather than OpenAI multimodal content lists.

extent analysis

TL;DR

The issue can be worked around by adding --chat-template-content-format string to the vLLM server command, which keeps tool message content as a plain string. A permanent fix is still pending: both linked pull requests are open and unmerged.

Guidance

Confirm that the running server was actually started with --chat-template-content-format string, for example by inspecting the launch command or container arguments.
Test the fix using the provided Test 2 reproduction steps to confirm that the tool result content is correctly included in the prompt.
Consider implementing one of the proposed fixes (Option A, B, or C) for a more robust solution that affects all models or handles OpenAI-format list content.
Review the model registry to determine if changing the default content format for GLM-4-MoE models is necessary.

Example

No code snippet is provided as the issue is resolved through a command-line flag or proposed code changes.

Notes

The provided workaround and proposed fixes assume that the issue is specific to the GLM-5.1-FP8 model and the --tool-call-parser glm47 configuration. Additional testing may be necessary to ensure the fix does not introduce regressions in other models or configurations.

Recommendation

Apply the workaround by adding --chat-template-content-format string to the vLLM server command, as it is a simple and effective solution to the issue. If a more robust fix is desired, consider implementing Option A, which preserves tool-role message string content as a plain string regardless of content_format.
