openclaw - 💡(How to fix) Fix [Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase [6 comments, 3 participants]

openclaw2026-04-23 03:23:43

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#70456•Fetched 2026-04-24 05:57:51

View on GitHub

Comments

Participants

Timeline

Reactions

Author

perlowja

Participants

openclaw-barnacle[bot]

perlowja

steipete

Timeline (top)

commented ×6mentioned ×4subscribed ×4closed ×1

Between v2026.4.20 (working) and current origin/main (1c91f3f, broken), OpenClaw's agent has stopped invoking skill tools. The regression is not model-specific, not provider-specific, and not skill-specific — it reproduces across every provider/model we tested on current main, for any skill that exposes shell-based tools emitting structured output blocks.

Root Cause

Mode	Latency	Source	Description
`empty_output`	5–13 s	OpenClaw wrapper — `pi-embedded-runner/run.ts:1832` emits `⚠️ Agent couldn't generate a response. Please try again.` when the turn completes with `reasoningOnlyRetriesExhausted && !finalAssistantVisibleText`	Provider call is made, model streams tokens, but no assistant-visible text and no `tool_use` block arrives by turn end. OpenClaw's incomplete-turn handler emits the sentinel. Seen on empty or thinking-only responses.
`narr_refusal`	5–50 s	Model output — the "Exec is blocked on this session" phrasing. Not in OpenClaw source. Something in the model's input context.	Model fabricates a constraint that was never set in the environment and uses it to decline tool-use.
`gateway_failure`	170+ s	OpenClaw gateway — `GatewayClientRequestError: FallbackSummaryError: All models failed (2)`	Seen on Llama-4-Maverick. Full model-fallback chain exhausted. Possibly a separate compounding issue; included here because it landed at the same position in our test and also produced zero `tool_use`.

Fix Action

Fix / Workaround

This is not fixed by today's tool-use merge spree

Our test build (1c91f3f) includes every tool-use fix merged into main on 2026-04-22: #70281 (restore Pi embedded tool allowlist for GPT-5), #70294 (repair malformed tool-call args on openai-completions), #70030 (preserve native Kimi tool_call IDs on Moonshot), #70307 (tool hook parity for Codex), #69551 (avoid vision image tool loops on Codex), #69816 (Ollama vision provider registration), #70210 (honor sessions_spawn ACP model overrides), #70269 (gateway restart acknowledgement defaults). None of them target the cold-start tool_use-emission pathway that our repro exercises.

Code Example

MODELS=(
  "together/MiniMaxAI/MiniMax-M2.7"
  "together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
  "google/gemini-3.1-pro-preview"
  "xai/grok-4-1-fast"
  "openai/gpt-5.4"
  "groq/openai/gpt-oss-120b"
)
PROMPT="/<skill-route> <tool>"   # a deterministic tool-requiring prompt; also tried NL variants

for M in "${MODELS[@]}"; do
  openclaw models set "$M"
  rm -f ~/.openclaw/sessions/*.jsonl
  openclaw agent --session-id "REPRO-$(date +%s)" --agent main -m "$PROMPT"
done

---

[AGENT] OpenClaw — Full Enterprise Test Cycle
  Running 253 test queries...
  Progress:  50/253 |  46 passed,  4 failed
  Progress: 100/253 |  85 passed, 15 failed
  Progress: 150/253 | 130 passed, 20 failed
  Progress: 200/253 | 173 passed, 27 failed
  Progress: 250/253 | 173 passed, 77 failed
  ✗ RESULT: 173/253 (68%) | 6087.5s | Alert: FAIL
  Avg Latency: 24,061 ms

RAW_BUFFERClick to expand / collapse

[Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase

Summary

Cross-provider evidence (primary)

Single cold-start prompt (/<skill-route> <tool>, a deterministic tool-requiring prompt), unique --session-id per test, ~/.openclaw/sessions/ cleared before the run, one model per provider, repeated openclaw models set <model> between tests. All on the same host and skill, fresh build of origin/main at 1c91f3f:

Provider	Model	Latency	Tool invoked?	Response head (verbatim)
together	`MiniMaxAI/MiniMax-M2.7`	21,338 ms	No	`Full Report (Apr 22 · 23:14 · from file authority — last fetch 11:28) …` — renders cached data from disk, no `tool_use` emitted
together	`meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8`	173,177 ms	No	`Gateway agent failed; falling back to embedded: GatewayClientRequestError: FallbackSummaryError: All models failed (2)`
google	`gemini-3.1-pro-preview`	14,941 ms	No	`Exec is blocked on this session, so I cannot fetch fresh live prices right now. Based on the last successful data pull …`
xai	`grok-4-1-fast`	49,514 ms	No	`… (Apr 22 · 11:28 data from file — exec blocked for fresh fetch) …`
openai	`gpt-5.4`	23,548 ms	No	`Same data as above. No fresher pull available from this session.`
groq	`openai/gpt-oss-120b`	19,904 ms	No	`Exec is still blocked on this session. Here's the last <resource> data (11:28 EDT): …`

0/6 tool invocations across 5 providers. Zero tool_use blocks emitted. Zero ic_result markers in stdout for any of the 6 responses.

The smoking gun: context injection

Four independent model families (Gemini, Grok, Groq-hosted GPT-OSS-120b, MiniMax) converge on near-identical phrasing: "Exec is blocked on this session" / "exec blocked for fresh fetch" / "Exec is still blocked on this session". Models from unrelated training corpora do not independently invent the same specific string. OpenClaw is injecting "exec is blocked" into the agent's model-facing context — system prompt, tool schema description, or tool-result preamble — and the models are faithfully reading and honoring it.

The literal substring Exec is blocked on this session does not appear in OpenClaw TypeScript source (verified via grep -rn "exec.*blocked" src/ and grep -rn "Exec is blocked" src/ over openclaw/openclaw). The closest source match is src/agents/tool-loop-detection.ts:466/482/520 which emits "Session execution blocked by global circuit breaker …" — but that detector requires an accumulated no-progress tool-call streak to fire and should not activate on a cold session with zero prior tool calls. Either (a) the detector is firing spuriously on cold sessions (possibly reading persisted state from a prior session), (b) "blocked" language lives elsewhere we didn't grep (template files, compiled JSON resources, a runtime-assembled system-prompt builder), or (c) the message is being constructed dynamically by a component whose source signature doesn't match our search.

A cache-trace dump of the exact {system, tools, messages} payload being sent to the provider would answer this — but see "cache-trace not firing" below.

This is not fixed by today's tool-use merge spree

Timeline of evidence

All times in UTC. Same Linux x86 host, same test skill, same fleet-standard provider configuration across the pre- and post-regression data points. The v2026.4.20 working baseline was recorded by an older harness run; we have direct observation of 173/253 passes at that version. Cross-provider breadth testing only happened on current main (we didn't realize the regression was all-provider until we tested it — if the breadth test had been run on 4.20 we'd have broader baseline evidence).

Timestamp	OpenClaw version	Event	Result
2026-04-21 13:18	tag	`v2026.4.20-beta.1` cut	—
2026-04-21 15:44	tag	`v2026.4.20-beta.2` cut	—
2026-04-21 18:59	tag	`v2026.4.20` cut	—
2026-04-21 19:29–23:35	v2026.4.20 (inferred from tag-then-deploy timing)	agent-mediated 253-query harness run	173/253 (68%) pass, avg latency 24,061 ms — real tool invocation observed
2026-04-22 02:29	tag	`v2026.4.21` cut (11 commits past 4.20 — docs, Slack aliases, browser act-path, image-generation defaults, release backports; none touch agent or pi-embedded-runner code)	—
2026-04-22 16:05	rebuild	host rebuilt to `dd46783`	0/5 pass, avg latency 965 ms — immediate bail, no provider call made
2026-04-23 02:59	rebuild	host rebuilt to `1c91f3f` (current `origin/main`, includes today's merge spree)	cross-provider test above: 0/6 across 5 providers, provider calls now happen (5–50 s latencies) but `tool_use` never emitted

Regression range: v2026.4.20..1c91f3f = 337 commits. The 11 commits in the tagged v2026.4.20..v2026.4.21 window are all docs/backport/browser/Slack/images — none touch agent or pi-embedded-runner code paths. The regression is in the 326 commits merged onto main after v2026.4.21.

Reproduction

Environment

OpenClaw: built from origin/main at 1c91f3f (2026-04-23 02:52 UTC). Build recipe: pnpm install && npm run build && npm install -g . in the cloned repo directory, Node v22.22.2.
Agent default model: varied per test (see cross-provider table above). Switched via openclaw models set <provider/model> between tests.
Skill: any skill that exposes shell-based tools producing structured output blocks. Our test skill exposes ~20 tool subcommands, each of which prints a JSON {"ic_result": {...}} marker on successful execution.
Pairing: device paired via normal flow; no metadata-upgrade failures observed in this run.
Session state: rm -f ~/.openclaw/sessions/*.jsonl before the run to force cold start. Each test also uses a unique --session-id so no session-cache deferral can occur.

Test invocation

MODELS=(
  "together/MiniMaxAI/MiniMax-M2.7"
  "together/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"
  "google/gemini-3.1-pro-preview"
  "xai/grok-4-1-fast"
  "openai/gpt-5.4"
  "groq/openai/gpt-oss-120b"
)
PROMPT="/<skill-route> <tool>"   # a deterministic tool-requiring prompt; also tried NL variants

for M in "${MODELS[@]}"; do
  openclaw models set "$M"
  rm -f ~/.openclaw/sessions/*.jsonl
  openclaw agent --session-id "REPRO-$(date +%s)" --agent main -m "$PROMPT"
done

Additional NL prompt variants tested on one model: "Show my current <resource> — please fetch the latest data", "What is my current <metric>? Run the analysis.", "Fetch a fresh <report-type> for my <scope>", "Run a full <pipeline> end-to-end". All five tested variants landed in the same two failure modes documented below.

Two observed failure modes

Mode	Latency	Source	Description
`empty_output`	5–13 s	OpenClaw wrapper — `pi-embedded-runner/run.ts:1832` emits `⚠️ Agent couldn't generate a response. Please try again.` when the turn completes with `reasoningOnlyRetriesExhausted && !finalAssistantVisibleText`	Provider call is made, model streams tokens, but no assistant-visible text and no `tool_use` block arrives by turn end. OpenClaw's incomplete-turn handler emits the sentinel. Seen on empty or thinking-only responses.
`narr_refusal`	5–50 s	Model output — the "Exec is blocked on this session" phrasing. Not in OpenClaw source. Something in the model's input context.	Model fabricates a constraint that was never set in the environment and uses it to decline tool-use.
`gateway_failure`	170+ s	OpenClaw gateway — `GatewayClientRequestError: FallbackSummaryError: All models failed (2)`	Seen on Llama-4-Maverick. Full model-fallback chain exhausted. Possibly a separate compounding issue; included here because it landed at the same position in our test and also produced zero `tool_use`.

Reference working state (v2026.4.20)

From the 2026-04-21 19:29–23:35 UTC harness run:

[AGENT] OpenClaw — Full Enterprise Test Cycle
  Running 253 test queries...
  Progress:  50/253 |  46 passed,  4 failed
  Progress: 100/253 |  85 passed, 15 failed
  Progress: 150/253 | 130 passed, 20 failed
  Progress: 200/253 | 173 passed, 27 failed
  Progress: 250/253 | 173 passed, 77 failed
  ✗ RESULT: 173/253 (68%) | 6087.5s | Alert: FAIL
  Avg Latency: 24,061 ms

68% pass rate, 24 s avg latency — consistent with real tool invocation happening at scale. The drop to 0% / ~8 s avg on current main is both the pass-rate and the latency signature of a regression that eliminates tool_use emission entirely.

Isolation: what works

To prove the regression is specifically at the OpenClaw agent → model → tool_use emission layer and not anywhere below it, we verified every component of the stack works when OpenClaw's agent-mediated path is not in the loop:

Path	Result
Skill CLI invoked directly on the host (`<skill> <cmd>`) — no LLM involved	✅ Structured output with `ic_result` marker, <2 s
Skill's internal narrative LLM call (direct Python client → provider API → back, no OpenClaw)	✅ Coherent narrative text returned
Skill's internal secondary-LLM call to a local inference endpoint	✅ Synthesis returned and signature-verified
Skill's full tool surface across 4 provider configurations, driven by our harness's direct-leg (no agent)	✅ 91/92 pass (1 unrelated sanity-token skip)
Same skill, same providers, but invoked through `openclaw agent`	❌ 0/6 across 5 providers as documented above

Provider APIs, skill implementation, tool-producing scripts, and tool schema are independently verified as healthy. The regression is narrowly at the agent-mediated translation from user prompt → model-facing context → tool_use emission.

What was tried and did not resolve

Rebuild to latest main (1c91f3f) including today's tool-use merge spree. Test results above.
Clear session state (rm -f ~/.openclaw/sessions/*.jsonl) before each test. No change.
Unique --session-id per test. No change.
Vary the prompt phrasing from slash-form to explicit "fetch fresh" NL. No change.
Vary the model across providers. No change — all 6 models/5 providers fail identically.

Cache-trace not firing

Per src/agents/cache-trace.ts, setting OPENCLAW_CACHE_TRACE=1 + OPENCLAW_CACHE_TRACE_FILE=<path> + OPENCLAW_CACHE_TRACE_SYSTEM=1 + OPENCLAW_CACHE_TRACE_MESSAGES=1 + OPENCLAW_CACHE_TRACE_PROMPT=1 should dump the outbound provider payload. In our runs, the trace file is created (so the path arg is being honored) but remains 0 bytes — no content is ever written, across every tested failure mode and even when the agent is clearly making provider calls (10–50 s latencies). Env vars propagate correctly to the openclaw child process (verified via env | grep OPENCLAW_CACHE both directly and from a spawned subshell), and parseBooleanValue("1") returns true in utils/boolean.ts.

The trace-writer invocation lives in attempt.ts:1025 (createCacheTrace), with the first write at line 1148 (session:loaded). Speculating: something is short-circuiting before session:loaded — maybe an early-return path in run.ts that triggers before attempt.ts reaches that line. If maintainers can advise on the correct enablement (env var, config flag, alternate trace path) to capture the {system, tools, messages} that was sent on a failing call, that would be the single most useful data point toward pinning the "Exec is blocked" injection source.

Related context

Today's merged tool-use fixes (2026-04-22):

#70281 — restore Pi embedded tool allowlist for GPT-5 runs
#70294 — repair malformed tool-call args on openai-completions
#70030 — preserve native Kimi tool_call IDs on Moonshot
#70307 — tool hook parity for Codex
#69551 — avoid vision image tool loops on Codex
#69816 — Ollama vision provider registration
#70210 — honor sessions_spawn ACP model overrides
#70269 — gateway restart acknowledgement defaults

None target the cold-start tool_use-emission pathway that our repro exercises.

Related open PRs: #70344 (suppress tool-use prelude text), #66147 (split model routing for tool-call turns).

Ask

Acknowledge that agent-mediated tool invocation is regressed between v2026.4.20 and current main for all providers we tested (together, google, xai, openai, groq), given the evidence above.
Point to whichever in-flight PR addresses this, or confirm it's still unscheduled. If this duplicates a tracked bug, link us and we'll close.
Advise on cache-trace enablement — the OPENCLAW_CACHE_TRACE=1 path produced no output in our runs. If there's a way to dump the exact {system, tools, messages} payload being sent on a failing call, we can narrow whether "Exec is blocked" is in the system prompt, the tool schema, a tool-result preamble, or a prior-turn message.

Happy to run additional reproduction: different skills to rule out skill-manifest peculiarities, other fleet-approved provider/model combos, targeted cache-trace if we can get it firing, or a narrower bisect across the 326-commit post-4.21 range on request.

Filed by: [email protected] (personal capacity OSS). Test artifacts preserved on the originating host — available on request.

extent analysis

TL;DR

The most likely fix for the regression in agent-mediated tool invocation between v2026.4.20 and current main is to identify and remove the "Exec is blocked on this session" phrase injection in the OpenClaw agent's model-facing context.

Guidance

Investigate the "Exec is blocked" phrase source: Search the OpenClaw codebase for any potential sources of the "Exec is blocked on this session" phrase, including template files, compiled JSON resources, and runtime-assembled system-prompt builders.
Enable cache-trace: Work with maintainers to determine the correct enablement for cache-trace to capture the {system, tools, messages} payload sent on a failing call, which may help identify the phrase injection source.
Bisect the 326-commit range: Perform a targeted bisect across the 326-commit range post-v2026.4.21 to narrow down the regression introduction.
Verify tool invocation: Test tool invocation with different skills, provider/model combos, and cache-trace to ensure the fix addresses the regression.

Example

No code snippet is provided as the issue requires investigation and debugging rather than a straightforward code fix.

Notes

The regression is likely caused by a change in the OpenClaw agent's model-facing context, which is injecting the "Exec is blocked on this session" phrase. Identifying and removing this injection should fix the issue.

Recommendation

Apply a workaround by investigating and removing the "Exec is blocked on this session" phrase injection, as this is the most likely cause of the regression.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #model download #tokenizer error #prompt formatting #chain error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - 💡(How to fix) Fix [Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase [6 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

This is not fixed by today's tool-use merge spree

Code Example

[Bug] Agent tool-use regressed for all providers between v2026.4.20 and main — 0/6 models invoke tools on a cold-start prompt; multiple model families converge on fabricated "Exec is blocked on this session" phrase

Summary

Cross-provider evidence (primary)

The smoking gun: context injection

This is not fixed by today's tool-use merge spree

Timeline of evidence

Reproduction

Environment

Test invocation

Two observed failure modes

Reference working state (v2026.4.20)

Isolation: what works

What was tried and did not resolve

Cache-trace not firing

Related context

Ask

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING