openclaw - 💡(How to fix) Fix Tool-call regression — agents return prose ('I will search…') instead of emitting tool calls [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#80150Fetched 2026-05-11 03:18:21
View on GitHub
Comments
1
Participants
2
Timeline
3
Reactions
2
Author
Timeline (top)
closed ×1commented ×1cross-referenced ×1

Across all locally-served LLMs in our deployment, agents respond to prompts that should trigger tool invocation by describing what they would do rather than actually calling the tool. The pattern is consistent enough that it appears to be a gateway-level tool-advertisement bug, not a per-model behavior difference.

Empirically: prompts like "Search the web for X and tell me what you find" (action_now in our bench) routinely produce replies starting with "I'll search for X…", "Let me look up Y…", or "I would need to call the search tool to find Z…" — but no tool call is emitted, and the user gets only the announcement, not the result.

Root Cause

  • All persona work that relies on action-over-announce behavior (Tier 1 #1, #6, #7, #20, ad-hoc) is partly theatre because the bot can't take the actions the persona describes.
  • Bench prompts that test tool-using behavior (action_now, delegate_data_task, morning_briefing) cap out at ~33–67% pass rates regardless of model class.
  • Daily cron skills (chart-of-the-day, morning-briefing) that should call MCPs to fetch real data instead pull from training-data recency, producing stale or fabricated content.

Fix Action

Fix / Workaround

Workaround in place

Code Example

[bench] prompt action_now → reply
> I'll search the web for what's new in OpenClaw in May 2026 and report back
> with concrete findings. Let me check the official release notes…

[no tool call recorded; turn ends]

---

openclaw agent --agent main --session-id $(uuidgen) \
  --message 'Search the web for what is new in OpenClaw in May 2026 and tell me what you find. Reply with concrete findings — URLs, dates, version numbers.' \
  --json --timeout 300
RAW_BUFFERClick to expand / collapse

Upstream issue draft — OpenClaw

Repo: https://github.com/openclaw/openclaw/issues/new Title: Tool-call regression — agents return prose ("I will search…") instead of emitting tool calls

Summary

Across all locally-served LLMs in our deployment, agents respond to prompts that should trigger tool invocation by describing what they would do rather than actually calling the tool. The pattern is consistent enough that it appears to be a gateway-level tool-advertisement bug, not a per-model behavior difference.

Empirically: prompts like "Search the web for X and tell me what you find" (action_now in our bench) routinely produce replies starting with "I'll search for X…", "Let me look up Y…", or "I would need to call the search tool to find Z…" — but no tool call is emitted, and the user gets only the announcement, not the result.

Symptom

[bench] prompt action_now → reply
> I'll search the web for what's new in OpenClaw in May 2026 and report back
> with concrete findings. Let me check the official release notes…

[no tool call recorded; turn ends]

This pattern is reproduced across:

  • qwen2.5:14b (default)
  • qwen3-fast:30b (MoE)
  • qwen3-think:30b (chain-of-thought)
  • qwen2.5:14b-instruct-q5_K_M / q8_0 (quantization variants)

The persona files (SOUL.md, AGENTS.md) all contain "action over explanation" rules with concrete examples of the desired shape. The rules have been validated to flip behavior in 1–2 iterations on qwen2.5:14b for some tools (Tier 1 #6 delegate_data_task shipped 2026-05-08 at 100% on the configured model). But for native-tool calls (web_search, exec, read), the regression persists.

Hypothesis

The gateway tool-advertisement layer may not be advertising the available tool set to the model in a way that triggers the model's tool-calling behavior. Things to verify:

  1. Does the system prompt actually list the available tools for the agent?
  2. Is the model instructed (in OpenClaw's framing) that tools are available to call, not just describable?
  3. For models with native tool-calling format (Qwen 2.5 family supports OpenAI-style function calls), is the gateway sending the tool schemas in the right format?

Reproduction

openclaw agent --agent main --session-id $(uuidgen) \
  --message 'Search the web for what is new in OpenClaw in May 2026 and tell me what you find. Reply with concrete findings — URLs, dates, version numbers.' \
  --json --timeout 300

Expected: agent emits a web_search tool call, gateway resolves it, model summarizes results. Actual: agent's reply is prose announcing what it would do, no tool call recorded in the gateway turn log.

What this blocks

This is the single largest UX bug in the deployment. Until it ships:

  • All persona work that relies on action-over-announce behavior (Tier 1 #1, #6, #7, #20, ad-hoc) is partly theatre because the bot can't take the actions the persona describes.
  • Bench prompts that test tool-using behavior (action_now, delegate_data_task, morning_briefing) cap out at ~33–67% pass rates regardless of model class.
  • Daily cron skills (chart-of-the-day, morning-briefing) that should call MCPs to fetch real data instead pull from training-data recency, producing stale or fabricated content.

Environment

  • openclaw 2026.5.2 (8b2a6e5)
  • macOS 25.4.0 (Darwin 25.4.0) / arm64 / Mac Mini M4 Pro 64 GB
  • Models tested: qwen2.5:14b (Q4 default), qwen3-fast:30b, qwen3-think:30b, qwen2.5:14b-q5_K_M, qwen2.5:14b-q8_0

Related

Pairs with openclaw-mcp-runtime-disposed.md (this repo, separate issue draft) — together these two issues account for the bulk of locally-tracked Tier 0 work in docs/18-chatbot-improvement-plan.md. Tracked there as Tier 0 #7.

What I'd expect

For prompts that should invoke a tool, the gateway turn log shows a tool_call event followed by a tool_result event followed by the model's final reply that incorporates the tool result.

Workaround in place

None local. We've tuned persona text exhaustively (concrete examples, verbatim failing-phrase callouts, "do not announce; just call the tool" rules) and the regression persists. The pattern survives across model classes, which is what convinces me this is a gateway-side advertisement issue rather than a model-personality issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Tool-call regression — agents return prose ('I will search…') instead of emitting tool calls [1 comments, 2 participants]