openclaw - 💡(How to fix) Fix [Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase [6 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#70456Fetched 2026-04-24 05:57:51
View on GitHub
Comments
6
Participants
3
Timeline
15
Reactions
0
Author
Timeline (top)
commented ×6mentioned ×4subscribed ×4closed ×1

Between v2026.4.20 (working) and current origin/main (1c91f3f, broken), OpenClaw's agent has stopped invoking skill tools. The regression is not model-specific, not provider-specific, and not skill-specific — it reproduces across every provider/model we tested on current main, for any skill that exposes shell-based tools emitting structured output blocks.

Root Cause

ModeLatencySourceDescription
empty_output5–13 sOpenClaw wrapperpi-embedded-runner/run.ts:1832 emits ⚠️ Agent couldn't generate a response. Please try again. when the turn completes with reasoningOnlyRetriesExhausted && !finalAssistantVisibleTextProvider call is made, model streams tokens, but no assistant-visible text and no tool_use block arrives by turn end. OpenClaw's incomplete-turn handler emits the sentinel. Seen on empty or thinking-only responses.
narr_refusal5–50 sModel output — the "Exec is blocked on this session" phrasing. Not in OpenClaw source. Something in the model's input context.Model fabricates a constraint that was never set in the environment and uses it to decline tool-use.
gateway_failure170+ sOpenClaw gatewayGatewayClientRequestError: FallbackSummaryError: All models failed (2)Seen on Llama-4-Maverick. Full model-fallback chain exhausted. Possibly a separate compounding issue; included here because it landed at the same position in our test and also produced zero tool_use.

Fix Action

Fix / Workaround

This is not fixed by today's tool-use merge spree

Our test build (1c91f3f) includes every tool-use fix merged into main on 2026-04-22: #70281 (restore Pi embedded tool allowlist for GPT-5), #70294 (repair malformed tool-call args on openai-completions), #70030 (preserve native Kimi tool_call IDs on Moonshot), #70307 (tool hook parity for Codex), #69551 (avoid vision image tool loops on Codex), #69816 (Ollama vision provider registration), #70210 (honor sessions_spawn ACP model overrides), #70269 (gateway restart acknowledgement defaults). None of them target the cold-start tool_use-emission pathway that our repro exercises.

Code Example

MODELS=(
  "together/MiniMaxAI/MiniMax-M2.7"
  "together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
  "google/gemini-3.1-pro-preview"
  "xai/grok-4-1-fast"
  "openai/gpt-5.4"
  "groq/openai/gpt-oss-120b"
)
PROMPT="/<skill-route> <tool>"   # a deterministic tool-requiring prompt; also tried NL variants

for M in "${MODELS[@]}"; do
  openclaw models set "$M"
  rm -f ~/.openclaw/sessions/*.jsonl
  openclaw agent --session-id "REPRO-$(date +%s)" --agent main -m "$PROMPT"
done

---

[AGENT] OpenClaw — Full Enterprise Test Cycle
  Running 253 test queries...
  Progress:  50/253 |  46 passed,  4 failed
  Progress: 100/253 |  85 passed, 15 failed
  Progress: 150/253 | 130 passed, 20 failed
  Progress: 200/253 | 173 passed, 27 failed
  Progress: 250/253 | 173 passed, 77 failed
  ✗ RESULT: 173/253 (68%) | 6087.5s | Alert: FAIL
  Avg Latency: 24,061 ms
RAW_BUFFERClick to expand / collapse

[Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase

Summary

Between v2026.4.20 (working) and current origin/main (1c91f3f, broken), OpenClaw's agent has stopped invoking skill tools. The regression is not model-specific, not provider-specific, and not skill-specific — it reproduces across every provider/model we tested on current main, for any skill that exposes shell-based tools emitting structured output blocks.

Cross-provider evidence (primary)

Single cold-start prompt (/<skill-route> <tool>, a deterministic tool-requiring prompt), unique --session-id per test, ~/.openclaw/sessions/ cleared before the run, one model per provider, repeated openclaw models set <model> between tests. All on the same host and skill, fresh build of origin/main at 1c91f3f:

ProviderModelLatencyTool invoked?Response head (verbatim)
togetherMiniMaxAI/MiniMax-M2.721,338 msNo**Full Report** (Apr 22 · 23:14 · from file authority — last fetch 11:28) … — renders cached data from disk, no tool_use emitted
togethermeta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8173,177 msNoGateway agent failed; falling back to embedded: GatewayClientRequestError: FallbackSummaryError: All models failed (2)
googlegemini-3.1-pro-preview14,941 msNoExec is blocked on this session, so I cannot fetch fresh live prices right now. Based on the last successful data pull …
xaigrok-4-1-fast49,514 msNo… (Apr 22 · 11:28 data from file — **exec blocked for fresh fetch**) …
openaigpt-5.423,548 msNoSame data as above. No fresher pull available from this session.
groqopenai/gpt-oss-120b19,904 msNoExec is still blocked on this session. Here's the last <resource> data (11:28 EDT): …

0/6 tool invocations across 5 providers. Zero tool_use blocks emitted. Zero ic_result markers in stdout for any of the 6 responses.

The smoking gun: context injection

Four independent model families (Gemini, Grok, Groq-hosted GPT-OSS-120b, MiniMax) converge on near-identical phrasing: "Exec is blocked on this session" / "exec blocked for fresh fetch" / "Exec is still blocked on this session". Models from unrelated training corpora do not independently invent the same specific string. OpenClaw is injecting "exec is blocked" into the agent's model-facing context — system prompt, tool schema description, or tool-result preamble — and the models are faithfully reading and honoring it.

The literal substring Exec is blocked on this session does not appear in OpenClaw TypeScript source (verified via grep -rn "exec.*blocked" src/ and grep -rn "Exec is blocked" src/ over openclaw/openclaw). The closest source match is src/agents/tool-loop-detection.ts:466/482/520 which emits "Session execution blocked by global circuit breaker …" — but that detector requires an accumulated no-progress tool-call streak to fire and should not activate on a cold session with zero prior tool calls. Either (a) the detector is firing spuriously on cold sessions (possibly reading persisted state from a prior session), (b) "blocked" language lives elsewhere we didn't grep (template files, compiled JSON resources, a runtime-assembled system-prompt builder), or (c) the message is being constructed dynamically by a component whose source signature doesn't match our search.

A cache-trace dump of the exact {system, tools, messages} payload being sent to the provider would answer this — but see "cache-trace not firing" below.

This is not fixed by today's tool-use merge spree

Our test build (1c91f3f) includes every tool-use fix merged into main on 2026-04-22: #70281 (restore Pi embedded tool allowlist for GPT-5), #70294 (repair malformed tool-call args on openai-completions), #70030 (preserve native Kimi tool_call IDs on Moonshot), #70307 (tool hook parity for Codex), #69551 (avoid vision image tool loops on Codex), #69816 (Ollama vision provider registration), #70210 (honor sessions_spawn ACP model overrides), #70269 (gateway restart acknowledgement defaults). None of them target the cold-start tool_use-emission pathway that our repro exercises.


Timeline of evidence

All times in UTC. Same Linux x86 host, same test skill, same fleet-standard provider configuration across the pre- and post-regression data points. The v2026.4.20 working baseline was recorded by an older harness run; we have direct observation of 173/253 passes at that version. Cross-provider breadth testing only happened on current main (we didn't realize the regression was all-provider until we tested it — if the breadth test had been run on 4.20 we'd have broader baseline evidence).

TimestampOpenClaw versionEventResult
2026-04-21 13:18tagv2026.4.20-beta.1 cut
2026-04-21 15:44tagv2026.4.20-beta.2 cut
2026-04-21 18:59tagv2026.4.20 cut
2026-04-21 19:29–23:35v2026.4.20 (inferred from tag-then-deploy timing)agent-mediated 253-query harness run173/253 (68%) pass, avg latency 24,061 ms — real tool invocation observed
2026-04-22 02:29tagv2026.4.21 cut (11 commits past 4.20 — docs, Slack aliases, browser act-path, image-generation defaults, release backports; none touch agent or pi-embedded-runner code)
2026-04-22 16:05rebuildhost rebuilt to dd467830/5 pass, avg latency 965 ms — immediate bail, no provider call made
2026-04-23 02:59rebuildhost rebuilt to 1c91f3f (current origin/main, includes today's merge spree)cross-provider test above: 0/6 across 5 providers, provider calls now happen (5–50 s latencies) but tool_use never emitted

Regression range: v2026.4.20..1c91f3f = 337 commits. The 11 commits in the tagged v2026.4.20..v2026.4.21 window are all docs/backport/browser/Slack/images — none touch agent or pi-embedded-runner code paths. The regression is in the 326 commits merged onto main after v2026.4.21.


Reproduction

Environment

  • OpenClaw: built from origin/main at 1c91f3f (2026-04-23 02:52 UTC). Build recipe: pnpm install && npm run build && npm install -g . in the cloned repo directory, Node v22.22.2.
  • Agent default model: varied per test (see cross-provider table above). Switched via openclaw models set <provider/model> between tests.
  • Skill: any skill that exposes shell-based tools producing structured output blocks. Our test skill exposes ~20 tool subcommands, each of which prints a JSON {"ic_result": {...}} marker on successful execution.
  • Pairing: device paired via normal flow; no metadata-upgrade failures observed in this run.
  • Session state: rm -f ~/.openclaw/sessions/*.jsonl before the run to force cold start. Each test also uses a unique --session-id so no session-cache deferral can occur.

Test invocation

MODELS=(
  "together/MiniMaxAI/MiniMax-M2.7"
  "together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
  "google/gemini-3.1-pro-preview"
  "xai/grok-4-1-fast"
  "openai/gpt-5.4"
  "groq/openai/gpt-oss-120b"
)
PROMPT="/<skill-route> <tool>"   # a deterministic tool-requiring prompt; also tried NL variants

for M in "${MODELS[@]}"; do
  openclaw models set "$M"
  rm -f ~/.openclaw/sessions/*.jsonl
  openclaw agent --session-id "REPRO-$(date +%s)" --agent main -m "$PROMPT"
done

Additional NL prompt variants tested on one model: "Show my current <resource> — please fetch the latest data", "What is my current <metric>? Run the analysis.", "Fetch a fresh <report-type> for my <scope>", "Run a full <pipeline> end-to-end". All five tested variants landed in the same two failure modes documented below.

Two observed failure modes

ModeLatencySourceDescription
empty_output5–13 sOpenClaw wrapperpi-embedded-runner/run.ts:1832 emits ⚠️ Agent couldn't generate a response. Please try again. when the turn completes with reasoningOnlyRetriesExhausted && !finalAssistantVisibleTextProvider call is made, model streams tokens, but no assistant-visible text and no tool_use block arrives by turn end. OpenClaw's incomplete-turn handler emits the sentinel. Seen on empty or thinking-only responses.
narr_refusal5–50 sModel output — the "Exec is blocked on this session" phrasing. Not in OpenClaw source. Something in the model's input context.Model fabricates a constraint that was never set in the environment and uses it to decline tool-use.
gateway_failure170+ sOpenClaw gatewayGatewayClientRequestError: FallbackSummaryError: All models failed (2)Seen on Llama-4-Maverick. Full model-fallback chain exhausted. Possibly a separate compounding issue; included here because it landed at the same position in our test and also produced zero tool_use.

Reference working state (v2026.4.20)

From the 2026-04-21 19:29–23:35 UTC harness run:

[AGENT] OpenClaw — Full Enterprise Test Cycle
  Running 253 test queries...
  Progress:  50/253 |  46 passed,  4 failed
  Progress: 100/253 |  85 passed, 15 failed
  Progress: 150/253 | 130 passed, 20 failed
  Progress: 200/253 | 173 passed, 27 failed
  Progress: 250/253 | 173 passed, 77 failed
  ✗ RESULT: 173/253 (68%) | 6087.5s | Alert: FAIL
  Avg Latency: 24,061 ms

68% pass rate, 24 s avg latency — consistent with real tool invocation happening at scale. The drop to 0% / ~8 s avg on current main is both the pass-rate and the latency signature of a regression that eliminates tool_use emission entirely.


Isolation: what works

To prove the regression is specifically at the OpenClaw agent → model → tool_use emission layer and not anywhere below it, we verified every component of the stack works when OpenClaw's agent-mediated path is not in the loop:

PathResult
Skill CLI invoked directly on the host (<skill> <cmd>) — no LLM involved✅ Structured output with ic_result marker, <2 s
Skill's internal narrative LLM call (direct Python client → provider API → back, no OpenClaw)✅ Coherent narrative text returned
Skill's internal secondary-LLM call to a local inference endpoint✅ Synthesis returned and signature-verified
Skill's full tool surface across 4 provider configurations, driven by our harness's direct-leg (no agent)91/92 pass (1 unrelated sanity-token skip)
Same skill, same providers, but invoked through openclaw agent0/6 across 5 providers as documented above

Provider APIs, skill implementation, tool-producing scripts, and tool schema are independently verified as healthy. The regression is narrowly at the agent-mediated translation from user prompt → model-facing context → tool_use emission.


What was tried and did not resolve

  • Rebuild to latest main (1c91f3f) including today's tool-use merge spree. Test results above.
  • Clear session state (rm -f ~/.openclaw/sessions/*.jsonl) before each test. No change.
  • Unique --session-id per test. No change.
  • Vary the prompt phrasing from slash-form to explicit "fetch fresh" NL. No change.
  • Vary the model across providers. No change — all 6 models/5 providers fail identically.

Cache-trace not firing

Per src/agents/cache-trace.ts, setting OPENCLAW_CACHE_TRACE=1 + OPENCLAW_CACHE_TRACE_FILE=<path> + OPENCLAW_CACHE_TRACE_SYSTEM=1 + OPENCLAW_CACHE_TRACE_MESSAGES=1 + OPENCLAW_CACHE_TRACE_PROMPT=1 should dump the outbound provider payload. In our runs, the trace file is created (so the path arg is being honored) but remains 0 bytes — no content is ever written, across every tested failure mode and even when the agent is clearly making provider calls (10–50 s latencies). Env vars propagate correctly to the openclaw child process (verified via env | grep OPENCLAW_CACHE both directly and from a spawned subshell), and parseBooleanValue("1") returns true in utils/boolean.ts.

The trace-writer invocation lives in attempt.ts:1025 (createCacheTrace), with the first write at line 1148 (session:loaded). Speculating: something is short-circuiting before session:loaded — maybe an early-return path in run.ts that triggers before attempt.ts reaches that line. If maintainers can advise on the correct enablement (env var, config flag, alternate trace path) to capture the {system, tools, messages} that was sent on a failing call, that would be the single most useful data point toward pinning the "Exec is blocked" injection source.


Related context

Today's merged tool-use fixes (2026-04-22):

  • #70281 — restore Pi embedded tool allowlist for GPT-5 runs
  • #70294 — repair malformed tool-call args on openai-completions
  • #70030 — preserve native Kimi tool_call IDs on Moonshot
  • #70307 — tool hook parity for Codex
  • #69551 — avoid vision image tool loops on Codex
  • #69816 — Ollama vision provider registration
  • #70210 — honor sessions_spawn ACP model overrides
  • #70269 — gateway restart acknowledgement defaults

None target the cold-start tool_use-emission pathway that our repro exercises.

Related open PRs: #70344 (suppress tool-use prelude text), #66147 (split model routing for tool-call turns).


Ask

  1. Acknowledge that agent-mediated tool invocation is regressed between v2026.4.20 and current main for all providers we tested (together, google, xai, openai, groq), given the evidence above.
  2. Point to whichever in-flight PR addresses this, or confirm it's still unscheduled. If this duplicates a tracked bug, link us and we'll close.
  3. Advise on cache-trace enablement — the OPENCLAW_CACHE_TRACE=1 path produced no output in our runs. If there's a way to dump the exact {system, tools, messages} payload being sent on a failing call, we can narrow whether "Exec is blocked" is in the system prompt, the tool schema, a tool-result preamble, or a prior-turn message.

Happy to run additional reproduction: different skills to rule out skill-manifest peculiarities, other fleet-approved provider/model combos, targeted cache-trace if we can get it firing, or a narrower bisect across the 326-commit post-4.21 range on request.


Filed by: [email protected] (personal capacity OSS). Test artifacts preserved on the originating host — available on request.

extent analysis

TL;DR

The most likely fix for the regression in agent-mediated tool invocation between v2026.4.20 and current main is to identify and remove the "Exec is blocked on this session" phrase injection in the OpenClaw agent's model-facing context.

Guidance

  1. Investigate the "Exec is blocked" phrase source: Search the OpenClaw codebase for any potential sources of the "Exec is blocked on this session" phrase, including template files, compiled JSON resources, and runtime-assembled system-prompt builders.
  2. Enable cache-trace: Work with maintainers to determine the correct enablement for cache-trace to capture the {system, tools, messages} payload sent on a failing call, which may help identify the phrase injection source.
  3. Bisect the 326-commit range: Perform a targeted bisect across the 326-commit range post-v2026.4.21 to narrow down the regression introduction.
  4. Verify tool invocation: Test tool invocation with different skills, provider/model combos, and cache-trace to ensure the fix addresses the regression.

Example

No code snippet is provided as the issue requires investigation and debugging rather than a straightforward code fix.

Notes

The regression is likely caused by a change in the OpenClaw agent's model-facing context, which is injecting the "Exec is blocked on this session" phrase. Identifying and removing this injection should fix the issue.

Recommendation

Apply a workaround by investigating and removing the "Exec is blocked on this session" phrase injection, as this is the most likely cause of the regression.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING