openclaw - 💡(How to fix) Fix Qwen 2.5 Coder 32B via llama.cpp: tool calls emitted as plain text, not structured tool_calls [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#60601Fetched 2026-04-08 02:49:18
View on GitHub
Comments
0
Participants
1
Timeline
0
Reactions
0
Participants

Root Cause

Qwen 2.5 Coder 32B (at 32B param size) does not reliably wrap tool calls in <tool_call> XML tags when tool_choice is unset or "auto". Instead it outputs:

  • Bare JSON: {"name": "read", "arguments": {"path": "..."}}
  • Wrong XML tags: <tools>{"name": "read", ...}</tools> (instead of <tool_call>)

Because llama-server's tool call parser looks for <tool_call> tags specifically, these variants are not converted into structured tool_calls in the API response. They end up in message.content as plain text, and OpenClaw's buildAssistantMessage only processes response.message.tool_calls.

With "tool_choice": "required" the model does produce proper <tool_call> tags and llama-server returns structured tool_calls correctly. But OpenClaw does not send tool_choice unless explicitly configured.

Fix Action

Workaround

I wrote a reverse proxy that sits between OpenClaw and llama-server. It buffers streaming responses, detects tool call patterns in the accumulated text content (bare JSON, <tools>, <tool_call> tags), and re-emits a corrected SSE stream with proper tool_calls structure. This works but shouldn't be necessary.

Code Example

Sure, let's try again to check the config file.

{"name": "read", "arguments": {"path": "cow_trader/config.py"}}
RAW_BUFFERClick to expand / collapse

Qwen 2.5 Coder 32B via llama.cpp: tool calls emitted as plain text, not structured tool_calls

Bug Description

When using Qwen 2.5 Coder 32B (Q4_K_M GGUF) via llama.cpp's OpenAI-compatible API (openai-completions), OpenClaw does not detect or execute tool calls. The model outputs tool calls as plain JSON text in the content field instead of OpenClaw receiving them as structured tool_calls objects.

Environment

  • OpenClaw: 2026.4.1
  • Model: qwen2.5-coder-32b-instruct-q4_k_m.gguf via llama.cpp (build 8638)
  • API: openai-completions
  • llama-server flags: --jinja enabled
  • OS: Linux (Ubuntu 24.04, Docker)

Steps to Reproduce

  1. Configure a llamacpp provider in openclaw.json with "api": "openai-completions" pointing at llama-server
  2. Start llama-server with --jinja and the Qwen 2.5 Coder 32B GGUF
  3. Send a message that requires tool use (e.g., "read cow_trader/config.py")

Expected Behavior

OpenClaw should detect the tool call and execute the read tool.

Actual Behavior

The model's response appears as plain text in the chat:

Sure, let's try again to check the config file.

{"name": "read", "arguments": {"path": "cow_trader/config.py"}}

No tool is executed. The tool call JSON is rendered as text to the user.

Root Cause

Qwen 2.5 Coder 32B (at 32B param size) does not reliably wrap tool calls in <tool_call> XML tags when tool_choice is unset or "auto". Instead it outputs:

  • Bare JSON: {"name": "read", "arguments": {"path": "..."}}
  • Wrong XML tags: <tools>{"name": "read", ...}</tools> (instead of <tool_call>)

Because llama-server's tool call parser looks for <tool_call> tags specifically, these variants are not converted into structured tool_calls in the API response. They end up in message.content as plain text, and OpenClaw's buildAssistantMessage only processes response.message.tool_calls.

With "tool_choice": "required" the model does produce proper <tool_call> tags and llama-server returns structured tool_calls correctly. But OpenClaw does not send tool_choice unless explicitly configured.

Workaround

I wrote a reverse proxy that sits between OpenClaw and llama-server. It buffers streaming responses, detects tool call patterns in the accumulated text content (bare JSON, <tools>, <tool_call> tags), and re-emits a corrected SSE stream with proper tool_calls structure. This works but shouldn't be necessary.

Suggested Fix

One or more of these could address it:

  1. Allow per-model toolChoice config -- Add a compat.defaultToolChoice field to the model definition schema so users can set "required" for models that need it.
  2. Content-based tool call fallback -- When a model returns finish_reason: "stop" but the content contains JSON matching the {"name": "...", "arguments": {...}} tool call pattern (especially wrapped in <tool_call> or <tools> tags), parse and promote them to tool_calls. This would help with many small/local models.
  3. Send tool_choice: "auto" by default -- Some models behave better when this is explicit rather than omitted, though this alone doesn't fix Qwen 2.5 Coder 32B.

Option 2 would be the most broadly useful, as many local models (especially smaller ones) have inconsistent tool call formatting.

extent analysis

TL;DR

Implement a content-based tool call fallback to parse and promote JSON tool calls in the response content to structured tool_calls.

Guidance

  • Investigate adding a compat.defaultToolChoice field to the model definition schema to allow per-model toolChoice configuration.
  • Consider implementing a fallback to parse JSON tool calls in the response content when the model returns finish_reason: "stop".
  • Evaluate sending tool_choice: "auto" by default to improve model behavior, although this may not fix the issue with Qwen 2.5 Coder 32B.

Example

No explicit code example is provided, but the suggested fix involves modifying the model definition schema or the tool call parsing logic.

Notes

The issue is specific to Qwen 2.5 Coder 32B and may not apply to other models. The suggested fixes aim to improve the robustness of tool call detection and parsing.

Recommendation

Apply a workaround, specifically implementing a content-based tool call fallback, as it would be the most broadly useful solution, helping with many small/local models.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING