openclaw - [Feature]: Bayesian upstream retry escalation: re-run the previous step when the same tool fails repeatedly

GitHub stats
openclaw/openclaw#62377 · Fetched 2026-04-08 03:05:15
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): labeled ×1

Summary

In multi-step agentic flows, the same tool can fail repeatedly because a previous step produced bad output, not because the failing step itself is broken. OpenClaw has no way to detect this: it retries the failing step with the same input, rotates auth profiles, or gives up. The result is silent waste: retries that can never succeed, no signal about where the real fault is, and an abort that forces the user to restart manually. This feature would detect the pattern (an identical failure signature across K retries) and escalate responsibility upstream by instructing the agent to regenerate the prior step's output before trying again.

Problem to solve

When a tool call fails and retries are exhausted, OpenClaw currently either rotates auth profiles or falls back to a different model. It does not consider that the failure may be caused by bad input from a previous step rather than a problem with the failing step itself.

In a multi-step agentic flow (for example, generate_image receiving output from build_prompt, which in turn received output from analyze_context), repeated identical failures at step N are Bayesian evidence that the input from step N-1 is at fault. Two retries of the same step with the same input and the same error are strong evidence that further retries are pointless unless the upstream output is regenerated.

Today there is no mechanism to act on this. The agent loop either keeps retrying the failing step or aborts entirely.
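
To make the "Bayesian evidence" argument concrete, here is a small worked update. It is only an illustration under stated assumptions, none of which come from OpenClaw: \pi is an assumed prior that the upstream output is bad, an upstream fault is assumed to make step N fail deterministically, and a transient fault is assumed to repeat the same error on each attempt independently with probability q.

```latex
% U   = "input from step N-1 is at fault"
% F_k = "k identical failures observed at step N"
P(U \mid F_k)
  = \frac{P(F_k \mid U)\,\pi}{P(F_k \mid U)\,\pi + P(F_k \mid \neg U)\,(1-\pi)}
  = \frac{\pi}{\pi + (1-\pi)\,q^{k}}

% Illustrative numbers (assumed): \pi = 0.3,\; q = 0.2,\; k = 2
P(U \mid F_2) = \frac{0.3}{0.3 + 0.7 \cdot 0.04} = \frac{0.3}{0.328} \approx 0.915
```

Under these assumed numbers, two identical failures move suspicion of the upstream step from 30% to roughly 91%. That is strong evidence rather than proof, which is why a small threshold is plausible.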

Proposed solution

After K identical failures at a tool call (same tool, same error class), instead of retrying again, the agent should be instructed to re-run the preceding step and produce fresh input before attempting the failing tool again.
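
The detection rule above can be sketched as a pure function. This is a minimal sketch, not OpenClaw code: the `ToolCallOutcome` shape and the `shouldEscalate` name are hypothetical, and "identical" is taken to mean same tool plus same error class, as described in the proposal.

```typescript
// Hypothetical record of one tool call; OpenClaw's real event types may differ.
interface ToolCallOutcome {
  tool: string;        // e.g. "slack_post"
  ok: boolean;         // true when the call succeeded
  errorClass?: string; // e.g. "message_too_long"; set only on failure
}

// Returns true when the last `k` outcomes are all failures of the same tool
// with the same error class (the "identical failure" pattern).
function shouldEscalate(outcomes: ToolCallOutcome[], k: number): boolean {
  if (k < 1 || outcomes.length < k) return false;
  const tail = outcomes.slice(-k);
  const first = tail[0];
  if (!first || first.ok || first.errorClass === undefined) return false;
  return tail.every(
    (o) => !o.ok && o.tool === first.tool && o.errorClass === first.errorClass,
  );
}

// Usage: two identical failures with K = 2 cross the threshold.
const recentCalls: ToolCallOutcome[] = [
  { tool: "slack_post", ok: false, errorClass: "message_too_long" },
  { tool: "slack_post", ok: false, errorClass: "message_too_long" },
];
const escalateNow = shouldEscalate(recentCalls, 2); // true
```

Only the trailing K outcomes are compared, so a success or a different error in between resets the pattern.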

A simple, non-breaking way to implement this: in the after_tool_call plugin hook, track per-session failure signatures. When the threshold is crossed, inject a directive via before_prompt_build on the next turn. The behavior could be controlled by a config along these lines:

// pseudo-config
agentLoop: {
  upstreamRetry: {
    enabled: true,
    threshold: 2,        // identical failures before escalating
    lookback: 1          // how many steps back to re-run
  }
}

This does not require changes to the core retry policy (which is correctly scoped to per-request HTTP retries). It operates at the agent loop level and can be opt-in initially.
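
A rough sketch of how the two hooks might cooperate follows. Everything here is an assumption made for illustration: the issue names only the after_tool_call and before_prompt_build hooks, so the event shape, the class, its method names, and the directive wording are hypothetical and do not reflect OpenClaw's actual plugin API.

```typescript
// Mirrors the pseudo-config from the proposal.
interface UpstreamRetryConfig {
  enabled: boolean;
  threshold: number; // identical failures before escalating
  lookback: number;  // how many steps back to re-run
}

// Hypothetical payload an after_tool_call style hook might receive.
interface ToolResultEvent {
  sessionId: string;
  tool: string;
  errorClass?: string; // undefined means the call succeeded
}

class UpstreamRetryTracker {
  private readonly config: UpstreamRetryConfig;
  // Per session: the current failure signature and how often it has repeated.
  private streaks = new Map<string, { signature: string; count: number }>();
  // Per session: directive waiting to be injected on the next turn.
  private pending = new Map<string, string>();

  constructor(config: UpstreamRetryConfig) {
    this.config = config;
  }

  // Intended to be called from an after_tool_call style hook.
  onAfterToolCall(ev: ToolResultEvent): void {
    if (!this.config.enabled) return;
    if (ev.errorClass === undefined) {
      this.streaks.delete(ev.sessionId); // a success resets the streak
      return;
    }
    const signature = `${ev.tool}:${ev.errorClass}`;
    const prev = this.streaks.get(ev.sessionId);
    const count = prev && prev.signature === signature ? prev.count + 1 : 1;
    this.streaks.set(ev.sessionId, { signature, count });
    if (count >= this.config.threshold) {
      this.pending.set(
        ev.sessionId,
        `Tool "${ev.tool}" failed ${count} times with "${ev.errorClass}". ` +
          `Do not retry it with the same input. Re-run the step ` +
          `${this.config.lookback} step(s) back to produce fresh input first.`,
      );
      this.streaks.delete(ev.sessionId); // count afresh after escalating
    }
  }

  // Intended to be called from a before_prompt_build style hook.
  // Returns the directive at most once, then clears it.
  onBeforePromptBuild(sessionId: string): string | undefined {
    const directive = this.pending.get(sessionId);
    this.pending.delete(sessionId);
    return directive;
  }
}
```

Keeping the state inside a plugin-side tracker like this matches the point above: the core per-request retry policy stays untouched, and nothing runs unless a failure is actually recorded.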

Alternatives considered

Considered implementing this purely via prompt engineering (instructing the agent in the system prompt to self-diagnose upstream causes), but this is unreliable and adds token overhead on every run regardless of whether failures occur. A hook-based approach only activates on the failure path.

Impact

Affected users and systems

  • Anyone running multi-step tool chains via the agent loop — image generation pipelines, multi-agent sessions_send flows, cron-driven automations, and any skill that chains more than 2–3 tool calls sequentially.
  • Disproportionately affects users with external dependencies in the chain (image generation APIs, web tools, third-party SaaS via plugins) where transient vs. deterministic failure is genuinely ambiguous.

Severity

Blocks workflow. When it occurs, the chain produces no useful output. The user receives either a vague error or a degraded result with no indication that the root cause lies in an earlier step. There is no graceful degradation; it is a hard stop.

Frequency

Intermittent; scales with chain length. For short 2–3 step chains: rare. For chains of 6+ steps with external service calls: near-certain over time. If each step independently has some chance of producing bad output, the probability that at least one step does so rises quickly with chain length, so power users building longer automations will encounter it regularly.
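
The scaling claim can be checked with a few lines of arithmetic. This sketch assumes, purely for illustration, that every step independently produces bad output with the same probability `p`; the 5% figure is made up and is not a measurement from OpenClaw.

```typescript
// Probability that at least one of `steps` independent steps produces bad
// output, given an assumed per-step probability `p` of bad output.
function chainFaultProbability(p: number, steps: number): number {
  return 1 - Math.pow(1 - p, steps);
}

// With an assumed 5% per-step rate:
console.log(chainFaultProbability(0.05, 3).toFixed(3)); // "0.143"
console.log(chainFaultProbability(0.05, 6).toFixed(3)); // "0.265"
```

Per run the risk roughly doubles from 3 to 6 steps under this assumption, and for a nightly job the chance of never hitting it shrinks further with every run.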

Consequences

  • Wasted retries and API quota consumed on a step that cannot succeed with its current input.
  • No diagnostic signal: the user sees a failure at step N with no indication that step N-1 is the actual cause.
  • Manual restart required — the user must re-trigger the entire flow from scratch, losing any intermediate work already completed.
  • Undermines confidence in longer automations, pushing users toward shorter, manually supervised chains instead of the autonomous workflows OpenClaw is designed for.

Evidence/examples

A cron-triggered skill runs nightly: it fetches the day's top stories, the agent summarizes them into a digest, then a subagent posts the digest to a Slack channel. The summarization step produces output that consistently exceeds Slack's message size limit. The posting subagent fails every time with the same explicit message_too_long API error — same tool, same error class, two retries, no change. No amount of retrying the posting step with the same input can succeed. The fault is entirely in the prior step producing oversized output. If the upstream step — the summarization — is re-run with an instruction to produce a shorter output, the posting succeeds on the first attempt.
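
The scenario above can be reproduced in a toy simulation. All names here are hypothetical stand-ins: `summarize` and `postDigest` are not real OpenClaw or Slack APIs, and `ASSUMED_LIMIT` is an illustrative size limit, not Slack's actual one.

```typescript
const ASSUMED_LIMIT = 40; // illustrative size limit (an assumption)

type PostResult = { ok: true } | { ok: false; errorClass: string };

// Stand-in for the summarization step; `maxChars` models the "be shorter" instruction.
function summarize(stories: string[], maxChars?: number): string {
  const digest = stories.join(" | ");
  return maxChars === undefined ? digest : digest.slice(0, maxChars);
}

// Stand-in for the posting subagent: fails deterministically on oversized input.
function postDigest(text: string): PostResult {
  return text.length <= ASSUMED_LIMIT
    ? { ok: true }
    : { ok: false, errorClass: "message_too_long" };
}

function runNightly(
  stories: string[],
  threshold: number,
): { posted: boolean; postAttempts: number; upstreamReruns: number } {
  let digest = summarize(stories);
  let postAttempts = 0;
  let upstreamReruns = 0;
  let identicalFailures = 0;
  let lastError: string | undefined;

  while (postAttempts < 10) { // hard cap so the sketch always terminates
    postAttempts++;
    const result = postDigest(digest);
    if (result.ok) return { posted: true, postAttempts, upstreamReruns };

    identicalFailures = result.errorClass === lastError ? identicalFailures + 1 : 1;
    lastError = result.errorClass;

    if (identicalFailures >= threshold) {
      // Escalate upstream: regenerate the digest with a size constraint.
      digest = summarize(stories, ASSUMED_LIMIT);
      upstreamReruns++;
      identicalFailures = 0;
      lastError = undefined;
    }
  }
  return { posted: false, postAttempts, upstreamReruns };
}
```

With a threshold of 2 and an oversized digest, the post fails twice, the summarizer is re-run once, and the third post succeeds, matching the behavior described in the example.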

Additional information

Related concepts: Spectrum-Based Fault Localization (SBFL), the Saga pattern from distributed systems, and Model-Based Diagnosis (Reiter/de Kleer). The core idea, that repeated identical failures at step N shift Bayesian suspicion toward step N-1, is well established in reliability engineering but appears to have no direct equivalent in current AI agent orchestration frameworks.

Related OpenClaw surfaces: after_tool_call and before_prompt_build plugin hooks are the natural implementation points. No core loop changes required for an initial opt-in implementation.

extent analysis

TL;DR

Implement a mechanism to detect repeated identical failures in multi-step agentic flows and escalate responsibility upstream by re-running the preceding step to produce fresh input.

Guidance

  • Identify the threshold for identical failures (e.g., 2 retries) and implement a counter to track these failures.
  • Use the after_tool_call plugin hook to track per-session failure signatures and detect when the threshold is crossed.
  • Inject a directive via the before_prompt_build hook to re-run the preceding step and produce fresh input when the threshold is crossed.
  • Consider implementing this feature as an opt-in configuration option, allowing users to enable or disable it as needed.
