openclaw - ✅(Solved) Fix Circuit breaker kill run leaves orphaned tool calls in persistent message history [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#81252 (fetched 2026-05-14 03:34:07)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 2
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, mentioned ×1

Error Message

content: [{ type: "text", text: "No result provided" }], isError: true,

Root Cause

The circuit breaker performs a hard abort of the current run without rolling back the message history. The messages table already contains assistant-role messages with toolCall content, but it has no matching tool-role messages with toolResult content.

transform-messages.js (part of the pi runtime) then encounters these unpaired tool calls during message preparation and generates the synthetic error as a defensive measure.

Fix Action

Fixed

PR fix notes

PR #81397: fix(agents): repair persisted tool result pairing

Description (problem / solution / changelog)

Summary

  • Problem: interrupted or killed tool runs can leave the persisted session JSONL with toolResult entries that are separated from their assistant tool-call entry, duplicated, or orphaned.
  • Why it matters: session-file repair runs before OpenClaw loads a transcript. If the durable JSONL keeps invalid tool-result ordering, the same session can keep failing on later turns after restart.
  • What changed: the session-file repair pass now moves matching persisted tool results next to their assistant tool call and drops duplicate or orphan persisted tool results before loading the transcript.
  • What did NOT change: runtime/provider replay can still synthesize missing generic tool results when a provider policy needs it. The durable disk repair does not invent missing generic outputs; it only preserves and reorders real persisted entries or removes invalid persisted entries. The existing Codex-specific session-file repair for aborted tool outputs remains unchanged.
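The repair pass described above can be sketched as follows. This is a minimal illustration, not the code from PR #81397. The `Entry` shape and the `repairToolResultPairing` name are hypothetical, and real session JSONL entries carry more fields.

The sketch works in four steps:
  • It keeps the first persisted result for each issued tool call.
  • It re-emits that result directly after the assistant entry that issued the call.
  • It drops duplicate and orphan results.
  • It never synthesizes a result for a call that has none.

```typescript
// Hypothetical, simplified entry shape for illustration only; real OpenClaw
// session JSONL entries carry more fields (timestamps, nested messages).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string; name: string };

type Entry =
  | { kind: "session"; id: string }
  | { kind: "user"; id: string; text: string }
  | { kind: "assistant"; id: string; content: ContentBlock[] }
  | { kind: "toolResult"; id: string; toolCallId: string; parentId?: string };

type ToolResultEntry = Extract<Entry, { kind: "toolResult" }>;

interface RepairReport {
  entries: Entry[];
  moved: number;
  droppedDuplicates: number;
  droppedOrphans: number;
}

function toolCallIds(e: Entry): string[] {
  if (e.kind !== "assistant") return [];
  return e.content.flatMap((b) => (b.type === "toolCall" ? [b.id] : []));
}

function repairToolResultPairing(input: Entry[]): RepairReport {
  // Every tool-call id that some assistant entry actually issued.
  const knownCalls = new Set(input.flatMap(toolCallIds));

  // Keep the first persisted result per known call; count the rest.
  const firstResult = new Map<string, ToolResultEntry>();
  let droppedDuplicates = 0;
  let droppedOrphans = 0;
  for (const e of input) {
    if (e.kind !== "toolResult") continue;
    if (!knownCalls.has(e.toolCallId)) droppedOrphans++;
    else if (firstResult.has(e.toolCallId)) droppedDuplicates++;
    else firstResult.set(e.toolCallId, e);
  }

  const entries: Entry[] = [];
  const placed = new Set<string>();
  let moved = 0;
  input.forEach((e, i) => {
    if (e.kind === "toolResult") return; // re-emitted next to its call instead
    entries.push(e);
    const calls = toolCallIds(e);
    if (calls.length === 0) return;

    // Results already sitting directly after this assistant entry count as in place.
    const inPlace = new Set<Entry>();
    for (let j = i + 1; j < input.length && input[j].kind === "toolResult"; j++) {
      inPlace.add(input[j]);
    }
    for (const id of calls) {
      const result = firstResult.get(id);
      if (!result || placed.has(id)) continue; // missing results are not synthesized
      placed.add(id);
      if (!inPlace.has(result)) moved++;
      entries.push(result); // same object, so metadata such as parentId is kept
    }
  });

  return { entries, moved, droppedDuplicates, droppedOrphans };
}

// The corruption shape from the PR's proof: late result, duplicate, orphan.
const corrupted: Entry[] = [
  { kind: "session", id: "proof-session" },
  { kind: "user", id: "msg-user", text: "run the tool" },
  { kind: "assistant", id: "msg-assistant", content: [{ type: "toolCall", id: "call_1", name: "exec" }] },
  { kind: "user", id: "msg-user-followup", text: "any news?" },
  { kind: "toolResult", id: "msg-tool-result", toolCallId: "call_1", parentId: "preserved-parent" },
  { kind: "toolResult", id: "dup-result", toolCallId: "call_1" },
  { kind: "toolResult", id: "orphan-result", toolCallId: "call_unknown" },
];

const report = repairToolResultPairing(corrupted);
console.log(report.entries.map((e) => e.id).join(" > "));
// proof-session > msg-user > msg-assistant > msg-tool-result > msg-user-followup
console.log(report.moved, report.droppedDuplicates, report.droppedOrphans); // 1 1 1
```

The moved result is pushed as the same object, so fields such as `parentId` survive the move. This mirrors the "preserved moved result parent" line in the proof output below. Within a span, this sketch emits results in tool-call order. The PR does not say whether the real repair does the same.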

Change Type

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Chore/infra

Scope

  • Gateway / orchestration
  • Agents / runtime
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #58608
  • This PR fixes a bug or regression

Real behavior proof

  • Behavior or issue addressed: persisted session JSONL with a displaced matching toolResult, a duplicate toolResult, and an orphan toolResult.
  • Real environment tested: local macOS OpenClaw checkout, current main JSONL session-file repair path, real temp session file, and production repairSessionFileIfNeeded.
  • Exact steps or command run after this patch: ran a local node --import tsx --input-type=module command that wrote a corrupted session.jsonl, invoked repairSessionFileIfNeeded, then read the durable JSONL back from disk.
  • Evidence after fix: console output from that command:
OpenClaw console output: repair result {
  "repaired": true,
  "movedToolResults": 1,
  "droppedDuplicateToolResults": 1,
  "droppedOrphanToolResults": 1,
  "hasBackup": true
}
OpenClaw console output: repaired role order session > user > assistant > toolResult > user
OpenClaw console output: repaired ids proof-session > msg-user > msg-assistant > msg-tool-result > msg-user-followup
OpenClaw console output: preserved moved result parent preserved-parent
  • Observed result after fix: the durable JSONL transcript is rewritten so the real matching tool result sits immediately after the assistant tool call, the duplicate and orphan tool results are removed, the moved entry metadata is preserved, and the original session file is backed up.
  • What was not tested: a live provider call after killing an active tool run, because this patch is isolated to deterministic durable transcript repair.

Root Cause

  • Root cause: transcript replay already knew how to repair invalid tool-call/tool-result order in memory, but session-file repair did not repair persisted tool-result pairing before loading JSONL transcript entries.
  • Missing detection / guardrail: existing disk repair covered malformed JSONL, malformed messages, blank user text, empty assistant error turns, and Codex missing outputs, but did not cover toolResult entries that were displaced, duplicated, or orphaned in the persisted transcript.
  • Contributing context: interruption paths can persist partial tool-call sequences. Once persisted, the invalid entries survive restart and can poison later context rebuilds for the session.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/session-file-repair.test.ts
  • Scenario the test should lock in: session-file repair moves a real matching late tool result next to its assistant tool call, drops duplicate persisted tool results, and drops orphan persisted tool results without synthesizing missing generic results.
  • Why this is the smallest reliable guardrail: it exercises the actual durable JSONL repair boundary and atomic rewrite/backup behavior without requiring provider scheduling or process-kill timing.
  • Existing test that already covers this: in-memory replay repair coverage exists in src/agents/session-transcript-repair.test.ts, but durable session-file repair did not have persisted-entry coverage for this corruption shape.

User-visible / Behavior Changes

Sessions with persisted tool-call/tool-result pairing corruption can recover on restart instead of repeatedly failing during later context rebuilds.

Diagram

Before:
session.jsonl:
assistant(toolCall call_1) -> user follow-up -> toolResult(call_1) -> duplicate/orphan result
load/replay later -> invalid durable transcript can fail again

After:
session-file repair:
assistant(toolCall call_1) -> toolResult(call_1) -> user follow-up
duplicate/orphan persisted results removed

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 repo test wrapper and JSONL session-file repair
  • Model/provider: N/A
  • Integration/channel: N/A
  • Relevant config: temp session JSONL file

Steps

  1. Write a session JSONL file with an assistant toolCall entry.
  2. Persist a user follow-up before the matching toolResult.
  3. Persist a duplicate matching toolResult and an unrelated orphan toolResult.
  4. Run repairSessionFileIfNeeded.
  5. Read the repaired session JSONL back from disk.

Expected

  • The matching toolResult is moved next to the assistant tool call.
  • Duplicate and orphan persisted toolResult entries are removed.
  • Missing generic tool results are not synthesized into durable state.
  • The original session file is backed up.
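The Expected list amounts to an invariant on the repaired file. A hypothetical checker could enforce it. The names and entry shape below are assumptions, not OpenClaw APIs.

The checker flags displaced, duplicate, and orphan results as violations. It reports tool calls that have no result separately, because the durable repair deliberately leaves those unsynthesized.

```typescript
// Hypothetical minimal entry shape for illustration; not the OpenClaw schema.
type Entry =
  | { kind: "session" | "user"; id: string }
  | { kind: "assistant"; id: string; toolCallIds: string[] }
  | { kind: "toolResult"; id: string; toolCallId: string };

interface PairingCheck {
  violations: string[]; // displaced, duplicate, or orphan results
  unresolvedCalls: string[]; // calls with no result: tolerated on disk, never synthesized
}

function checkPairing(entries: readonly Entry[]): PairingCheck {
  const violations: string[] = [];
  const issued = new Set<string>();
  const answered = new Set<string>();
  // Call ids of the assistant entry we are directly after; cleared as soon
  // as any non-result entry intervenes.
  let openSpan = new Set<string>();

  for (const e of entries) {
    if (e.kind === "assistant") {
      e.toolCallIds.forEach((id) => issued.add(id));
      openSpan = new Set(e.toolCallIds);
      continue;
    }
    if (e.kind !== "toolResult") {
      openSpan = new Set<string>();
      continue;
    }
    if (!issued.has(e.toolCallId)) {
      violations.push(`orphan ${e.id}`);
      continue;
    }
    if (answered.has(e.toolCallId)) {
      violations.push(`duplicate ${e.id}`);
      continue;
    }
    answered.add(e.toolCallId);
    if (!openSpan.has(e.toolCallId)) violations.push(`displaced ${e.id}`);
  }

  const unresolvedCalls = Array.from(issued).filter((id) => !answered.has(id));
  return { violations, unresolvedCalls };
}

// The repaired shape from the PR's diagram passes the check.
const repaired: Entry[] = [
  { kind: "assistant", id: "a1", toolCallIds: ["call_1"] },
  { kind: "toolResult", id: "r1", toolCallId: "call_1" },
  { kind: "user", id: "u1" },
];
console.log(checkPairing(repaired).violations); // []
```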

Actual

  • Before this patch, session-file repair left the invalid tool-result entries in place.
  • After this patch, the persisted JSONL entries are repaired deterministically.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers

Human Verification

  • Verified scenarios: moved displaced matching tool result, duplicate persisted tool result dropped, orphan persisted tool result dropped, moved entry metadata preserved, backup file written.
  • Edge cases checked: malformed-line repair still works, empty assistant error-turn repair still works, blank user repair still works, delivered trailing assistant messages remain untouched, Codex-specific missing-output repair remains intact, in-memory replay repair coverage still passes.
  • What you did not verify: live provider call after killing an active tool run.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: durable repair could accidentally rewrite valid transcript shape.
    • Mitigation: the repair only touches three kinds of toolResult entries: those whose ids match a visible assistant tool-call span, those with duplicate ids, and those with no matching assistant tool call. Regression tests assert that delivered assistant turns and unrelated non-message entries are preserved.
  • Risk: durable repair could invent fake generic tool output.
    • Mitigation: this pass intentionally does not synthesize missing generic tool results; synthetic missing-output repair remains runtime/provider-specific. The existing Codex session-file aborted repair is left as-is.

Validation

  • pnpm docs:list
  • pnpm test src/agents/session-file-repair.test.ts src/agents/session-transcript-repair.test.ts src/agents/pi-embedded-runner.sanitize-session-history.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/reference/transcript-hygiene.md src/agents/session-file-repair.ts src/agents/session-file-repair.test.ts
  • git diff --check
  • pnpm changed:lanes --base upstream/main --json
  • pnpm check:changed --base upstream/main

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/reference/transcript-hygiene.md (modified, +1/-1)
  • src/agents/session-file-repair.test.ts (modified, +176/-0)
  • src/agents/session-file-repair.ts (modified, +180/-13)


Design Issue: Orphaned tool calls persist after circuit breaker kill run

Environment

  • OpenClaw version: 2026.5.7
  • Runtime: pi
  • OS: macOS

Problem Description

When Loop Detection's globalCircuitBreakerThreshold triggers a kill run, the current message run is terminated. However, incomplete tool calls (toolCall without toolResult) are left in the persistent message history (lcm.db).

On every subsequent request in the same conversation, transform-messages.js detects these orphaned tool calls and inserts a synthetic error:

content: [{ type: "text", text: "No result provided" }],
isError: true,

This creates a permanent pollution cycle:

  1. Circuit breaker kills a run that has emitted toolCall messages
  2. The corresponding toolResult messages are never generated
  3. The orphaned toolCalls remain in lcm.db
  4. transform-messages.js synthesizes "No result provided" on every future turn
  5. The model sees this error in its history and may repeat tool calls or report failures
  6. User interprets this as "tools are broken for this model"
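The cycle hinges on the defensive step being applied to a view of the history rather than to the stored history. The sketch below is a hypothetical reconstruction of that pattern, not the actual transform-messages.js source. Its message shapes are simplified from the snippet quoted above.

Each call copies the stored messages. Wherever an assistant tool call has no following result, it inserts the same synthetic "No result provided" error. The error therefore reappears on every turn until the stored history itself is repaired.

```typescript
// Hypothetical message shapes, simplified from the snippet quoted in the issue.
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = { type: "toolCall"; id: string; name: string };

type Message =
  | { role: "user"; content: TextBlock[] }
  | { role: "assistant"; content: (TextBlock | ToolCallBlock)[] }
  | { role: "toolResult"; toolCallId: string; content: TextBlock[]; isError: boolean };

// Builds the provider-bound view of a conversation. The stored history is only
// read, never written, so orphaned calls get a fresh synthetic error every turn.
function fillMissingToolResults(history: readonly Message[]): Message[] {
  const out: Message[] = [];
  for (let i = 0; i < history.length; i++) {
    const msg = history[i];
    out.push(msg);
    if (msg.role !== "assistant") continue;

    // Copy the contiguous run of real results that follows the tool calls.
    const answered = new Set<string>();
    let j = i + 1;
    while (j < history.length) {
      const next = history[j];
      if (next.role !== "toolResult") break;
      answered.add(next.toolCallId);
      out.push(next);
      j++;
    }

    // Defensive placeholder for every call that has no result in that run.
    for (const block of msg.content) {
      if (block.type === "toolCall" && !answered.has(block.id)) {
        out.push({
          role: "toolResult",
          toolCallId: block.id,
          content: [{ type: "text", text: "No result provided" }],
          isError: true,
        });
      }
    }
    i = j - 1; // skip the results already copied above
  }
  return out;
}

const stored: Message[] = [
  { role: "user", content: [{ type: "text", text: "list files" }] },
  { role: "assistant", content: [{ type: "toolCall", id: "call_1", name: "exec" }] },
  { role: "user", content: [{ type: "text", text: "hello?" }] },
];
console.log(fillMissingToolResults(stored).length, stored.length); // 4 3
```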

Evidence from message history (lcm.db)

SELECT message_id, role, seq, substr(content, 1, 100)
FROM messages
WHERE conversation_id = 81
  AND content LIKE '%No result provided%';

The query returns dozens of assistant messages across hundreds of turns, all containing the synthetic error.

Even after:

  • Disabling Loop Detection ("enabled": false)
  • Restarting gateway
  • Changing model provider

…the orphaned tool calls and synthetic errors remain in the persistent history.


Expected behavior

When a run is killed by circuit breaker (or any hard abort), the message history should be restored to a consistent state — either:

  1. Transactionally remove the incomplete assistant messages that contain tool calls without results, OR
  2. Mark the orphaned tool calls as cancelled/aborted with a distinct status, rather than the generic and misleading "No result provided", OR
  3. Provide an operator command to "sanitize" a conversation's message history by removing or repairing orphaned tool call pairs.
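Option 2 could look roughly like this. At abort time, the runtime would persist an explicit result for every tool call the killed run left open, with wording and a status that say why there is no output.

This is a sketch under stated assumptions. `abortedResultsFor`, the `abortReason` field, and the message text are all hypothetical and are not part of OpenClaw.

```typescript
// Hypothetical shapes; the abortReason field is invented for this sketch and
// is not an OpenClaw API.
type AbortReason = "circuit_breaker" | "user_cancel";
type TextBlock = { type: "text"; text: string };
type ToolCallBlock = { type: "toolCall"; id: string; name: string };

type Message =
  | { role: "user"; content: TextBlock[] }
  | { role: "assistant"; content: (TextBlock | ToolCallBlock)[] }
  | {
      role: "toolResult";
      toolCallId: string;
      content: TextBlock[];
      isError: boolean;
      abortReason?: AbortReason;
    };

// Given the messages a killed run already persisted, returns explicit results
// for every tool call left open, so the stored history is consistent and the
// model can tell an aborted call apart from a tool that returned nothing.
function abortedResultsFor(runMessages: readonly Message[], reason: AbortReason): Message[] {
  const answered = new Set<string>();
  for (const m of runMessages) {
    if (m.role === "toolResult") answered.add(m.toolCallId);
  }

  const results: Message[] = [];
  for (const m of runMessages) {
    if (m.role !== "assistant") continue;
    for (const b of m.content) {
      if (b.type !== "toolCall" || answered.has(b.id)) continue;
      results.push({
        role: "toolResult",
        toolCallId: b.id,
        content: [{ type: "text", text: `Tool call "${b.name}" was not executed: run aborted (${reason}).` }],
        isError: true,
        abortReason: reason,
      });
    }
  }
  return results;
}

const killedRun: Message[] = [
  { role: "assistant", content: [{ type: "toolCall", id: "call_7", name: "exec" }] },
];
console.log(abortedResultsFor(killedRun, "circuit_breaker"));
```

Appending these results after the run's last message keeps them adjacent to their calls only when the run ended on the assistant's tool-call message. Otherwise, they would need to be placed next to their calls, which is what the session-file repair in PR #81397 does for persisted results.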

Steps to reproduce

  1. Enable Loop Detection with aggressive thresholds (e.g., warningThreshold: 4, criticalThreshold: 5, globalCircuitBreakerThreshold: 6)
  2. Use a model prone to tool/thinking loops (e.g., accounts/fireworks/routers/kimi-k2p6-turbo)
  3. Trigger the circuit breaker (model repeats tool calls 6+ times)
  4. Disable Loop Detection and restart gateway
  5. Continue the same conversation — observe "No result provided" persists across turns
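A simplified model of the three thresholds helps show why step 3 leaves an orphan. This sketch rests on a strong assumption: it counts consecutive identical tool calls (same name and arguments). OpenClaw's actual Loop Detection may instead count globally, per tool, or by argument hash.

With the thresholds from step 1, the sixth identical call is the one that trips the breaker in this model.

```typescript
// Hypothetical model of escalating loop-detection thresholds. It counts
// consecutive identical tool calls; OpenClaw's real detector may count
// differently (globally, per tool, or by argument hash).
interface LoopThresholds {
  warningThreshold: number;
  criticalThreshold: number;
  globalCircuitBreakerThreshold: number;
}

type Verdict = "ok" | "warning" | "critical" | "kill";

function classify(streak: number, t: LoopThresholds): Verdict {
  if (streak >= t.globalCircuitBreakerThreshold) return "kill";
  if (streak >= t.criticalThreshold) return "critical";
  if (streak >= t.warningThreshold) return "warning";
  return "ok";
}

// Returns one verdict per call, based on how many times in a row the same
// name + arguments signature has been seen.
function verdicts(calls: { name: string; args: string }[], t: LoopThresholds): Verdict[] {
  let last = "";
  let streak = 0;
  return calls.map((c) => {
    const signature = `${c.name}:${c.args}`;
    streak = signature === last ? streak + 1 : 1;
    last = signature;
    return classify(streak, t);
  });
}

const thresholds: LoopThresholds = { warningThreshold: 4, criticalThreshold: 5, globalCircuitBreakerThreshold: 6 };
const repeated = Array.from({ length: 6 }, () => ({ name: "exec", args: "ls" }));
console.log(verdicts(repeated, thresholds).join(",")); // ok,ok,ok,warning,critical,kill
```

Suppose the kill fires after that sixth call has been persisted but before the call executes. The call then has no toolResult, which is exactly the orphan described in this issue.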

Additional context

  • This was initially misdiagnosed as a Fireworks/Kimi model-specific bug because the synthetic error only becomes visible when using models that generate frequent tool calls.
  • The actual tool execution layer (exec, read, write) works correctly when tested in isolation; the failure is caused by the polluted history feeding back into the model.
  • External subprocesses (e.g., user-launched gbrain dream via Bun) can compound the issue by creating resource contention that makes new tool executions fail, adding new orphaned tool calls on top of the historical ones.
