Feature request: make tool-call failure recovery a native OpenClaw run-loop capability

Summary

OpenClaw should support a native, configurable "tool failure recovery step" inside the agent run loop.

When a tool call fails, OpenClaw should preserve the structured failure result and give the same agent one bounded continuation step with that failure in context, so the agent can decide whether to retry, repair arguments, inspect state, compensate for partial mutation, or report the failure clearly.

This is different from blind retry. The requested behavior is "reflect, then continue": let the LLM make the recovery decision with complete tool failure context, while OpenClaw enforces attempt limits, loop prevention, and transcript semantics.

Problem

Tool failures can currently surface as user-visible warnings or detached follow-up events even when the active agent could probably repair the action immediately if it saw the structured failure reason at the right time.

This shows up in real operator workflows:

  • a tool call fails because of a malformed inline script, bad args, missing command, transient timeout, stale ref, or provider/runtime mismatch;
  • the human sees a warning or tool failure notification;
  • the agent's final answer may still say the work is done, or the agent never gets a chance to correct the failed call;
  • the human has to manually relay the failure back to the agent.

For a system whose core job is autonomous tool use, that is the wrong recovery boundary. The tool failure should remain inside the agent control loop whenever possible.

Why this should be native, not plugin-only

OpenClaw already has enough plugin primitives to build a useful proof of concept:

  • after_tool_call can observe tool results and errors.
  • plugin runtime exposes enqueueSystemEvent.
  • plugin runtime exposes requestHeartbeat / requestHeartbeatNow.
  • structured exec results include details like exitCode, timedOut, and aggregated.

That makes a plugin-based V1 possible: observe the failed tool call, enqueue a system event into the same session, wake the agent, and ask it to decide what to do.

However, that is still a follow-up wake, not true run-loop recovery. A plugin cannot confidently provide the fully correct behavior because it does not own the active model/tool loop.
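
The plugin-based V1 described above could look roughly like the following sketch. It is illustrative only: the method names `enqueueSystemEvent` and `requestHeartbeat` come from this issue, but the parameter lists, the `sessionKey` argument, and the `source`/`intent` values are assumptions, and `FollowUpRuntimeSketch` and `requestFollowUpOnFailure` are hypothetical names.

```typescript
// Placeholder runtime surface. The method names appear in the issue text, but
// these parameter lists are assumptions, not the real plugin SDK signatures.
interface FollowUpRuntimeSketch {
  enqueueSystemEvent(text: string, opts: { sessionKey: string }): void;
  requestHeartbeat(opts: { source: string; intent: string; reason: string }): void;
}

// Local copy of the after_tool_call event fields quoted from hook-types.ts.
type AfterToolCallEventSketch = {
  toolName: string;
  params: Record<string, unknown>;
  runId?: string;
  toolCallId?: string;
  result?: unknown;
  error?: string;
  durationMs?: number;
};

// Returns true when a follow-up turn was requested for a failed call.
function requestFollowUpOnFailure(
  runtime: FollowUpRuntimeSketch,
  sessionKey: string,
  event: AfterToolCallEventSketch,
): boolean {
  if (event.error === undefined) return false; // success: nothing to do

  const summary = JSON.stringify({
    toolName: event.toolName,
    toolCallId: event.toolCallId,
    error: event.error,
    result: event.result,
  });
  // Queue the failure for the session, then wake the agent in a LATER turn.
  runtime.enqueueSystemEvent(`Tool call failed; decide how to recover: ${summary}`, { sessionKey });
  runtime.requestHeartbeat({
    source: "tool-failure-followup", // hypothetical value
    intent: "wake", // hypothetical value
    reason: `${event.toolName} failed`,
  });
  return true;
}
```

Note that nothing in this sketch can hold back the original turn: by the time the wake is processed, the first turn may already have finalized.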

Plugin limitations:

  • It cannot splice a continuation step into the original still-running tool loop.
  • It cannot prevent the original turn from finalizing with stale or misleading success text before the recovery step happens.
  • It cannot guarantee all native/Codex/embedded tool paths are covered.
  • It depends on hook dispatch coverage, and there are still open hook-coverage bugs.
  • It can race with session ownership, channel delivery, heartbeat coalescing, cron lanes, and other wake producers.
  • It must reconstruct retry budgets, failure fingerprints, and idempotency context outside the core run state.
  • It cannot reliably distinguish "mutating tool definitely failed before side effects" from "mutating tool may have partially succeeded" unless core supplies standardized attempt and result metadata.
  • For Codex-native records, current docs explicitly say OpenClaw observes/mirrors selected events but cannot rewrite the native Codex thread unless the native/app-server layer supports that operation.

The native implementation should live where OpenClaw already decides how tool results are returned to the model, how turns continue, how retry loops are bounded, and how final answers are selected.

Current plugin-first answer seen in related work

The closest previous issue, #7297, was closed as implemented because the hook infrastructure now lets plugins observe failed tool calls and enqueue follow-up turns:

  • #7297: "Feature: Wire up after_tool_call hook + exec auto-retry on failure"

That closure is reasonable for the original ask's preferred plugin path, but it does not close the deeper product gap. The implemented hook path makes external recovery possible; it does not make failure recovery part of the active run loop.

There is also aligned work for cron failures:

  • #56545: "feat(cron): add agent-turn mode for failure alerts"

That PR has the same philosophy: route failure information into an agent turn rather than only notifying a human. This request generalizes that idea to tool failures while the agent turn is still recoverable.

Related issues and PRs

  • #7297 closed: plugin hook + exec auto-retry discussion. This is the closest prior tracker, but it stopped at "plugins can implement a follow-up turn."

  • #56545 open PR: cron failureAlert.mode: "agent-turn". Same recovery philosophy for cron job failures.

  • #13364 open: expose before_tool_call / after_tool_call in managed internal hooks. Shows that hook ergonomics and hook coverage remain active product/API work.

  • #76201 open: plugin before_tool_call hook does not fire for native exec on some harness paths. This is direct evidence that a plugin-only answer is not high-confidence enough for core recovery behavior.

  • #14024 closed PR: structured tool reflection for error recovery. Useful prior art for classification, repeated-failure tracking, and diagnostic hints.

  • #73781 closed: runtime tool-call replay loop after failure. Important negative prior art: recovery must be agent-judged and bounded, not runtime blind replay.

  • #84798 merged: disabled nested Pi SDK auto-retry to prevent tool-call replay loops. This reinforces that retry ownership should be explicit and native to OpenClaw, not accidental nested runtime behavior.

  • #75923 open: run-scoped cross-tool consecutive error cascade detection. Adjacent loop prevention surface.

  • #49876 open: cron sessions can deliver hallucinated output instead of failing cleanly after tool failures. This is adjacent to why failed tool state needs structured delivery/finalization policy, not just human-visible warnings.

Source evidence from current main

Reviewed against origin/main at c38a9a883a.

Plugin hooks can observe, but after_tool_call is fire-and-forget

handleToolExecutionEnd emits the tool result output, then invokes after_tool_call as fire-and-forget:

// src/agents/pi-embedded-subscribe.handlers.tools.ts
// origin/main c38a9a883a, lines 1373-1402
// Run after_tool_call plugin hook (fire-and-forget)
const hookRunnerAfter = ctx.hookRunner ?? (await loadHookRunnerGlobal()).getGlobalHookRunner();
if (hookRunnerAfter?.hasHooks("after_tool_call")) {
  const hookEvent: PluginHookAfterToolCallEvent = {
    toolName,
    params: afterToolCallArgs,
    runId,
    toolCallId,
    result: sanitizedResult,
    error: isToolError ? extractToolErrorMessage(sanitizedResult) : undefined,
    durationMs,
  };
  void hookRunnerAfter
    .runAfterToolCall(hookEvent, { ... })
    .catch((err) => {
      ctx.log.warn(`after_tool_call hook failed: tool=${toolName} error=${String(err)}`);
    });
}

GitHub anchor: https://github.com/openclaw/openclaw/blob/c38a9a883a/src/agents/pi-embedded-subscribe.handlers.tools.ts#L1373-L1402

This is good for observation and plugin-triggered follow-up, but it is not a synchronous "recover before finalization" mechanism.
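
The difference can be shown with a small self-contained sketch. None of these names (`slowHook`, `finalize`, `fireAndForget`, `awaited`) come from the OpenClaw codebase; they only model the timing: a promise dispatched with `void` cannot influence what the caller does next, while an awaited one can.

```typescript
// Hypothetical stand-ins; none of these names come from the OpenClaw codebase.
type Finalization = { text: string; sawFailure: boolean };

// Simulates an async hook that records a failure after yielding once.
async function slowHook(state: { failureSeen: boolean }): Promise<void> {
  await Promise.resolve();
  state.failureSeen = true;
}

function finalize(state: { failureSeen: boolean }): Finalization {
  return {
    text: state.failureSeen ? "Tool failed; work is incomplete." : "Done.",
    sawFailure: state.failureSeen,
  };
}

// Fire-and-forget: finalization runs before the hook has had a chance to act.
function fireAndForget(): Finalization {
  const state = { failureSeen: false };
  void slowHook(state).catch(() => {});
  return finalize(state);
}

// Awaited: the hook's outcome is visible before finalization.
async function awaited(): Promise<Finalization> {
  const state = { failureSeen: false };
  await slowHook(state);
  return finalize(state);
}
```

In the fire-and-forget variant the caller finalizes with the stale "Done." text even though the hook later observes the failure.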

Hook event data has structured result/error fields

PluginHookAfterToolCallEvent already carries structured result, error, duration, run, and tool-call identity:

// src/plugins/hook-types.ts
// origin/main c38a9a883a, lines 500-508
export type PluginHookAfterToolCallEvent = {
  toolName: string;
  params: Record<string, unknown>;
  runId?: string;
  toolCallId?: string;
  result?: unknown;
  error?: string;
  durationMs?: number;
};

GitHub anchor: https://github.com/openclaw/openclaw/blob/c38a9a883a/src/plugins/hook-types.ts#L500-L508

This is enough information to design the native recovery event shape without inventing everything from scratch.

exec already returns useful structured failure metadata

Foreground exec failures return details.exitCode, durationMs, aggregated, timedOut, and cwd:

// src/agents/bash-tools.exec.ts
// origin/main c38a9a883a, lines 70-84
function buildExecForegroundResult(params: {
  outcome: ExecProcessOutcome;
  cwd?: string;
  warningText?: string;
}): AgentToolResult<ExecToolDetails> {
  const warningText = params.warningText?.trim() ? `${params.warningText}\n\n` : "";
  if (params.outcome.status === "failed") {
    return failedTextResult(`${warningText}${params.outcome.reason}`, {
      status: "failed",
      exitCode: params.outcome.exitCode ?? null,
      durationMs: params.outcome.durationMs,
      aggregated: params.outcome.aggregated,
      timedOut: params.outcome.timedOut,
      cwd: params.cwd,
    });
  }

GitHub anchor: https://github.com/openclaw/openclaw/blob/c38a9a883a/src/agents/bash-tools.exec.ts#L70-L84

The recovery step should receive this metadata directly, not as a lossy human-visible warning.
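
As a rough illustration, a recovery step could narrow the untyped hook `result` to these fields with a type guard. The `ExecFailureDetailsSketch` shape is inferred from the excerpt above and the `details` nesting from the `details.exitCode` wording; the real `ExecToolDetails` type may differ, and `readExecFailureDetails` is a hypothetical helper.

```typescript
// Assumed shape, inferred from the excerpt above; the real ExecToolDetails may differ.
interface ExecFailureDetailsSketch {
  status: "failed";
  exitCode: number | null;
  durationMs: number | undefined;
  aggregated: string | undefined;
  timedOut: boolean | undefined;
  cwd: string | undefined;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Narrows an untyped tool result to exec failure details, or returns undefined
// when the result is not a failed exec result.
function readExecFailureDetails(result: unknown): ExecFailureDetailsSketch | undefined {
  if (!isRecord(result) || !isRecord(result.details)) return undefined;
  const d = result.details;
  if (d.status !== "failed") return undefined;
  return {
    status: "failed",
    exitCode: typeof d.exitCode === "number" ? d.exitCode : null,
    durationMs: typeof d.durationMs === "number" ? d.durationMs : undefined,
    aggregated: typeof d.aggregated === "string" ? d.aggregated : undefined,
    timedOut: typeof d.timedOut === "boolean" ? d.timedOut : undefined,
    cwd: typeof d.cwd === "string" ? d.cwd : undefined,
  };
}
```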

Plugin runtime exposes the follow-up-turn primitives

The plugin runtime exposes enqueueSystemEvent and heartbeat wake APIs:

// src/plugins/runtime/types-core.ts
// origin/main c38a9a883a, lines 222-229
system: {
  enqueueSystemEvent: typeof import("../../infra/system-events.js").enqueueSystemEvent;
  requestHeartbeat: typeof import("../../infra/heartbeat-wake.js").requestHeartbeat;
  /**
   * @deprecated Use `requestHeartbeat({ source, intent, reason })` so wake producers declare
   * scheduler intent explicitly.
   */
  requestHeartbeatNow: (opts?: RuntimeRequestHeartbeatNowOptions) => void;

GitHub anchor: https://github.com/openclaw/openclaw/blob/c38a9a883a/src/plugins/runtime/types-core.ts#L222-L229

Docs show the same pattern: https://github.com/openclaw/openclaw/blob/c38a9a883a/docs/plugins/sdk-runtime.md#L459-L470

This supports the plugin workaround, but the workaround wakes a later turn instead of continuing the active run loop.

Codex runtime docs explicitly describe native-thread boundaries

The Codex runtime docs say async native after_tool_call observations are for telemetry/plugin compatibility and "cannot block, delay, or mutate the native tool call":

https://github.com/openclaw/openclaw/blob/c38a9a883a/docs/plugins/codex-harness-runtime.md#L100-L103

They also say Codex owns canonical native thread history and OpenClaw should not mutate unsupported internals:

https://github.com/openclaw/openclaw/blob/c38a9a883a/docs/plugins/codex-harness-runtime.md#L135-L139

That boundary matters here. A plugin cannot make recovery fully reliable across native runtime surfaces unless the runtime provides a first-class recovery continuation API.

There is known hook coverage risk

runBeforeToolCallHook still gates pre-tool policy on hook runner availability:

// src/agents/pi-tools.before-tool-call.ts
// origin/main c38a9a883a, lines 684-689
const hookRunner = getGlobalHookRunner();
try {
  const hasBeforeToolCallHooks = hookRunner?.hasHooks("before_tool_call") === true;
  const shouldRunTrustedPolicies = hasTrustedToolPolicies();
  if (!shouldRunTrustedPolicies && !hasBeforeToolCallHooks) {
    return { blocked: false, params };
  }

GitHub anchor: https://github.com/openclaw/openclaw/blob/c38a9a883a/src/agents/pi-tools.before-tool-call.ts#L684-L689

This is not a criticism of the hook API; it is a reason not to make core recovery depend exclusively on plugin hook dispatch. #76201 is a concrete report where plugin hook coverage did not match expectations for native exec.

Proposed behavior

Add a native recovery policy that runs after a failed tool result is available but before the agent turn is considered done.

High-level flow:

  1. Tool call starts.
  2. Tool call returns a structured failure result.
  3. OpenClaw records the failure with runId, toolCallId, toolName, args, result details, error text, duration, and mutability/idempotency hints where available.
  4. OpenClaw checks a recovery policy:
    • enabled?
    • tool included/excluded?
    • failure category retryable?
    • max attempts exceeded?
    • same failure fingerprint already recovered?
    • run/session still owns the turn?
  5. If recovery is allowed, OpenClaw injects a compact system/developer-context item into the active model loop:
    • "The previous tool call failed."
    • structured fields for the failed call.
    • explicit instruction: inspect state if needed; do not blindly repeat mutating actions; either retry with repaired args, compensate, or report inability.
  6. The same agent gets one bounded continuation step.
  7. If the continuation produces another failing tool call, repeat only until configured attempt limits are reached.
  8. If recovery is exhausted, OpenClaw finalizes with a clear failure state and avoids claiming success from stale assistant text.
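
The policy check in step 4 can be sketched as a single pure decision function. All type and function names below are hypothetical. Field names follow the suggested config, and keying the per-call budget by `toolCallId` is an assumption, not a settled design.

```typescript
// Hypothetical types for illustration; not shipped OpenClaw code.
interface RecoveryPolicy {
  enabled: boolean;
  includeTools: string[];
  excludeTools: string[];
  maxAttemptsPerToolCall: number;
  maxAttemptsPerRun: number;
}

interface RunRecoveryState {
  attemptsThisRun: number;
  attemptsByToolCall: Map<string, number>;
  recoveredFingerprints: Set<string>;
  ownsTurn: boolean;
}

interface FailedCall {
  toolName: string;
  toolCallId: string;
  fingerprint: string;
  retryable: boolean;
}

type RecoveryDecision = { action: "recover" } | { action: "skip"; reason: string };

const skip = (reason: string): RecoveryDecision => ({ action: "skip", reason });

// Walks the checklist in order; records the attempt only when recovery is allowed.
function decideRecovery(
  policy: RecoveryPolicy,
  state: RunRecoveryState,
  call: FailedCall,
): RecoveryDecision {
  if (!policy.enabled) return skip("disabled");
  if (policy.excludeTools.includes(call.toolName)) return skip("tool-excluded");
  if (policy.includeTools.length > 0 && !policy.includeTools.includes(call.toolName)) {
    return skip("tool-not-included");
  }
  if (!call.retryable) return skip("not-retryable");
  if (state.attemptsThisRun >= policy.maxAttemptsPerRun) return skip("run-budget-exhausted");
  const perCall = state.attemptsByToolCall.get(call.toolCallId) ?? 0;
  if (perCall >= policy.maxAttemptsPerToolCall) return skip("call-budget-exhausted");
  if (state.recoveredFingerprints.has(call.fingerprint)) return skip("fingerprint-seen");
  if (!state.ownsTurn) return skip("turn-not-owned");

  state.attemptsThisRun += 1;
  state.attemptsByToolCall.set(call.toolCallId, perCall + 1);
  state.recoveredFingerprints.add(call.fingerprint);
  return { action: "recover" };
}
```

Returning a reason with every skip keeps the decision auditable, which matters when finalization has to explain why no recovery was attempted.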

Suggested config

Global defaults:

{
  "agents": {
    "defaults": {
      "toolFailureRecovery": {
        "enabled": true,
        "mode": "native-continuation",
        "maxAttemptsPerToolCall": 1,
        "maxAttemptsPerRun": 3,
        "recoverBeforeFinalAnswer": true,
        "finalAnswerPolicyOnUnrecoveredFailure": "prefer_failure_summary",
        "includeTools": ["exec", "read", "write", "edit", "browser", "web_fetch"],
        "excludeTools": ["message", "tts"],
        "excludeExitCodes": [126, 127, 130],
        "failureFingerprintTtl": "10m",
        "mutatingToolPolicy": "agent_judgment_with_warning"
      }
    }
  }
}

Per-agent override:

{
  "agents": {
    "list": [
      {
        "id": "main",
        "toolFailureRecovery": {
          "enabled": true,
          "maxAttemptsPerRun": 2
        }
      },
      {
        "id": "high-risk-ops",
        "toolFailureRecovery": {
          "enabled": true,
          "mutatingToolPolicy": "require_state_inspection_first",
          "excludeTools": ["message", "email", "github_create_issue", "github_add_comment"]
        }
      }
    ]
  }
}
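
One way the per-agent override could resolve against the global defaults is a shallow merge in which arrays are replaced rather than concatenated. That merge rule is an assumption made for this sketch, and `ToolFailureRecoveryConfigSketch` and `resolveRecoveryConfig` are hypothetical names covering only a subset of the suggested fields.

```typescript
// Subset of the suggested config fields; hypothetical type for illustration.
interface ToolFailureRecoveryConfigSketch {
  enabled: boolean;
  maxAttemptsPerToolCall: number;
  maxAttemptsPerRun: number;
  excludeTools: string[];
  mutatingToolPolicy: string;
}

// Shallow merge: any field present in the override wins. Arrays are replaced,
// not concatenated (an assumption; the issue does not specify merge rules).
function resolveRecoveryConfig(
  defaults: ToolFailureRecoveryConfigSketch,
  override?: Partial<ToolFailureRecoveryConfigSketch>,
): ToolFailureRecoveryConfigSketch {
  const merged = { ...defaults, ...override };
  // Copy the array so callers cannot mutate the shared defaults.
  return { ...merged, excludeTools: [...merged.excludeTools] };
}
```

Under this rule the "main" agent above would inherit everything except `maxAttemptsPerRun`, while "high-risk-ops" would replace the default `excludeTools` list entirely.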

Cron override:

{
  "cron": {
    "failureAlert": {
      "mode": "agent-turn",
      "after": 1
    },
    "toolFailureRecovery": {
      "enabled": true,
      "maxAttemptsPerRun": 1,
      "finalAnswerPolicyOnUnrecoveredFailure": "fail_closed"
    }
  }
}

Possible enum semantics:

  • mode: "off": no recovery.
  • mode: "plugin-followup": compatibility mode matching what a plugin can do today.
  • mode: "native-continuation": preferred native same-run recovery.
  • mutatingToolPolicy: "agent_judgment_with_warning": agent may retry, but recovery prompt must warn about partial success.
  • mutatingToolPolicy: "require_state_inspection_first": agent must inspect state before retrying or compensating.
  • mutatingToolPolicy: "never_auto_retry": agent can only summarize/report after mutating failures.
  • finalAnswerPolicyOnUnrecoveredFailure: "prefer_failure_summary": suppress stale success-looking output and prefer a concise failure summary.
  • finalAnswerPolicyOnUnrecoveredFailure: "fail_closed": unattended surfaces can deliver nothing or a structured failure, never a success-looking answer.
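
These enum semantics map naturally onto string-literal unions. The sketch below is one possible reading, with hypothetical function names; in particular, `fail_closed` is modeled as "deliver nothing" (`null`), although the text above also allows a structured failure.

```typescript
type MutatingToolPolicy =
  | "agent_judgment_with_warning"
  | "require_state_inspection_first"
  | "never_auto_retry";

type FinalAnswerPolicy = "prefer_failure_summary" | "fail_closed";

interface FinalAnswerInput {
  assistantText: string;
  unrecoveredFailures: { toolName: string; error: string }[];
}

// Whether the agent may retry a mutating tool after it failed.
function mayRetryMutatingTool(policy: MutatingToolPolicy, stateInspected: boolean): boolean {
  switch (policy) {
    case "agent_judgment_with_warning":
      return true; // the recovery prompt must carry the partial-success warning
    case "require_state_inspection_first":
      return stateInspected;
    case "never_auto_retry":
      return false;
  }
}

// Returns the text to deliver, or null to deliver nothing.
function selectFinalAnswer(policy: FinalAnswerPolicy, input: FinalAnswerInput): string | null {
  if (input.unrecoveredFailures.length === 0) return input.assistantText;
  switch (policy) {
    case "prefer_failure_summary":
      return (
        "Unrecovered tool failure: " +
        input.unrecoveredFailures.map((f) => `${f.toolName}: ${f.error}`).join("; ")
      );
    case "fail_closed":
      return null;
  }
}
```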

Recovery prompt shape

The recovery context should be structured, not just a prose warning. Example:

{
  "type": "openclaw.tool_failure_recovery",
  "runId": "run_123",
  "attempt": 1,
  "maxAttemptsPerRun": 3,
  "failedTool": {
    "toolName": "exec",
    "toolCallId": "call_abc",
    "params": {
      "cmd": "node <<'NODE'\n...\nNODE"
    },
    "result": {
      "status": "failed",
      "exitCode": 1,
      "timedOut": false,
      "durationMs": 84,
      "aggregated": "SyntaxError: Unexpected token ..."
    },
    "error": "Command exited with code 1"
  },
  "instructions": [
    "Do not blindly repeat the same tool call.",
    "Use the failure details to decide whether to repair arguments, inspect state, compensate, or report inability.",
    "For mutating tools, consider whether the prior call may have partially succeeded before retrying.",
    "If you cannot recover within the budget, say exactly what failed and what remains undone."
  ]
}
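
A builder for this shape could take the fields already present on the `after_tool_call` event. The sketch below redeclares those fields locally so it stands alone; `buildRecoveryContext` and the `RecoveryContext` type are hypothetical names, not existing OpenClaw APIs.

```typescript
// Local copy of the PluginHookAfterToolCallEvent fields quoted in this issue.
type ToolCallEventSketch = {
  toolName: string;
  params: Record<string, unknown>;
  runId?: string;
  toolCallId?: string;
  result?: unknown;
  error?: string;
  durationMs?: number;
};

interface RecoveryContext {
  type: "openclaw.tool_failure_recovery";
  runId: string | undefined;
  attempt: number;
  maxAttemptsPerRun: number;
  failedTool: {
    toolName: string;
    toolCallId: string | undefined;
    params: Record<string, unknown>;
    result: unknown;
    error: string | undefined;
  };
  instructions: string[];
}

const RECOVERY_INSTRUCTIONS: readonly string[] = [
  "Do not blindly repeat the same tool call.",
  "Use the failure details to decide whether to repair arguments, inspect state, compensate, or report inability.",
  "For mutating tools, consider whether the prior call may have partially succeeded before retrying.",
  "If you cannot recover within the budget, say exactly what failed and what remains undone.",
];

// Packages a failed tool call as structured context for one continuation step.
function buildRecoveryContext(
  event: ToolCallEventSketch,
  attempt: number,
  maxAttemptsPerRun: number,
): RecoveryContext {
  return {
    type: "openclaw.tool_failure_recovery",
    runId: event.runId,
    attempt,
    maxAttemptsPerRun,
    failedTool: {
      toolName: event.toolName,
      toolCallId: event.toolCallId,
      params: event.params,
      result: event.result,
      error: event.error,
    },
    instructions: [...RECOVERY_INSTRUCTIONS],
  };
}
```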

Implementation sketch

This does not prescribe the final architecture; it is one plausible PR path:

  1. Add config types and schema for toolFailureRecovery.
    • likely files: src/config/types.*, src/config/zod-schema.ts, src/config/schema.help.ts, and docs under docs/gateway/configuration-reference.md.
  2. Add a small recovery policy module.
    • classify tool result failure status;
    • fingerprint failures by run, tool name, normalized params, and error category;
    • enforce max attempts;
    • decide whether to recover, fail closed, or continue normally.
  3. Integrate into the embedded Pi tool completion path.
    • likely near handleToolExecutionEnd(...) after sanitizedResult and isToolError are known;
    • native recovery should occur before final answer selection/finalization, not only as fire-and-forget plugin work.
  4. Integrate into Codex/native harness paths where OpenClaw can request another model step.
    • If a native runtime cannot support same-turn continuation yet, represent that honestly as a runtime capability check and fall back to plugin-style follow-up mode.
  5. Preserve plugin hooks.
    • after_tool_call should still fire.
    • Plugins should be able to observe whether native recovery was attempted/exhausted, but core should own the recovery loop.
  6. Add transcript/final-answer guardrails.
    • If a failed tool remains unrecovered, stale "Done" / success-looking final text should not win over the failure state.
    • This overlaps with #49876, but this issue can start narrower: when native recovery is enabled and exhausted, finalization must not hide the unrecovered failed tool.
  7. Add docs.
    • Explain the difference between blind retry, plugin follow-up, and native continuation.
    • Explain mutating tool behavior and why agent judgment is used with bounded recovery budgets.
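
For the fingerprinting part of step 2, a minimal sketch could normalize params by sorting object keys and then hash the tuple. The helper names are hypothetical, the hash is an FNV-1a style function over UTF-16 code units chosen only to keep the example dependency-free, and the choice of inputs is an assumption.

```typescript
// Recursively sorts object keys so logically equal params serialize identically.
function normalizeParams(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(normalizeParams);
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = normalizeParams(source[key]);
    }
    return sorted;
  }
  return value;
}

// 32-bit FNV-1a style hash over UTF-16 code units.
// Adequate for dedup keys within a run; not suitable for security purposes.
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Combines run, tool name, normalized params, and error category into one key.
function failureFingerprint(input: {
  runId: string;
  toolName: string;
  params: unknown;
  errorCategory: string;
}): string {
  const canonical = JSON.stringify([
    input.runId,
    input.toolName,
    normalizeParams(input.params),
    input.errorCategory,
  ]);
  return hashText(canonical);
}
```

Because keys are sorted before hashing, two calls that differ only in param key order produce the same fingerprint, which is what loop prevention needs.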

Tests / acceptance criteria

Suggested focused tests:

  • A failed non-mutating exec result triggers one native continuation before final answer delivery.
  • The continuation receives structured failure data including toolName, toolCallId, params, error, and details.exitCode.
  • The agent can repair args and succeed on the second call.
  • Same failure fingerprint is not retried indefinitely.
  • maxAttemptsPerToolCall and maxAttemptsPerRun are enforced.
  • Mutating-tool failure recovery includes explicit partial-success warning context.
  • excludeTools prevents recovery for delivery tools such as message.
  • Unrecovered failure suppresses or replaces stale success-looking final text according to config.
  • after_tool_call plugin hooks still run exactly once per real tool completion.
  • Plugin hook failures do not disable native recovery.
  • Codex/native runtime paths either support native continuation or explicitly fall back with a documented capability flag.
  • Cron/unattended mode can choose fail_closed on unrecovered failure.

Likely useful test surfaces from adjacent issues:

node scripts/run-vitest.mjs \
  src/agents/pi-embedded-subscribe.handlers.tools*.test.ts \
  src/agents/openclaw-owned-tool-runtime-contract.test.ts \
  src/agents/pi-tools.before-tool-call*.test.ts

Additional tests should be added for the new recovery policy module and final-answer behavior.
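
The budget and loop-prevention criteria can be exercised against a toy model of the loop before any integration work. In this sketch `Tool` stands in for a real tool and `Repair` stands in for the model's recovery decision; both, along with `runWithRecovery`, are hypothetical and exist only to make the bounded behavior testable.

```typescript
type ToolResult = { ok: true; output: string } | { ok: false; error: string };
type Tool = (args: string) => ToolResult;
// Stand-in for the agent: returns repaired args, or null to give up.
type Repair = (args: string, error: string) => string | null;

interface LoopOutcome {
  status: "succeeded" | "failed";
  calls: number;
  recoveryAttempts: number;
  lastError?: string;
}

// Runs the tool, giving the "agent" a bounded number of recovery steps.
// A repeated (args, error) pair ends the loop even if budget remains.
function runWithRecovery(
  tool: Tool,
  repair: Repair,
  initialArgs: string,
  maxAttemptsPerRun: number,
): LoopOutcome {
  let args = initialArgs;
  let calls = 0;
  let recoveryAttempts = 0;
  const seen = new Set<string>();
  for (;;) {
    calls += 1;
    const result = tool(args);
    if (result.ok) return { status: "succeeded", calls, recoveryAttempts };

    const fingerprint = `${args}\u0000${result.error}`;
    if (recoveryAttempts >= maxAttemptsPerRun || seen.has(fingerprint)) {
      return { status: "failed", calls, recoveryAttempts, lastError: result.error };
    }
    seen.add(fingerprint);

    const repaired = repair(args, result.error);
    if (repaired === null) {
      return { status: "failed", calls, recoveryAttempts, lastError: result.error };
    }
    recoveryAttempts += 1;
    args = repaired;
  }
}
```

A real test suite would replace the stand-ins with the embedded run loop, but the same three assertions apply: repair succeeds within budget, identical failures are not retried, and the run budget is a hard ceiling.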

Non-goals

  • Do not reintroduce blind SDK/runtime auto-replay of failed tool calls. #73781 and #84798 are the cautionary prior art.
  • Do not make every failure retryable.
  • Do not force mutating tools to auto-retry without agent-visible partial-success context.
  • Do not replace plugin hooks; keep them useful for policies, telemetry, and optional follow-up behavior.
  • Do not require managed/internal hooks (#13364) before core recovery can work.

Why this matters

The current plugin answer is useful, but it is not enough for reliable product behavior. A plugin can wake the agent after a failure; native OpenClaw can keep the failure inside the active reasoning loop, before finalization, with consistent coverage and bounded retry ownership.

That is the gap this issue asks to close.
