openclaw - ✅(Solved) Fix feature: harness-level turn cap and retry budget (maxTurnsPerRun, retryBudget) [2 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#75841Fetched 2026-05-02 05:29:15
View on GitHub
Comments
1
Participants
2
Timeline
7
Reactions
2
Timeline (top)
cross-referenced ×3referenced ×3commented ×1

Error Message

A production OpenClaw agent recently entered a 103-tool-call polling loop in a single dispatch and ran for far longer than intended before being noticed. Inspection of the wider session showed the same pattern was possible on any agent — there is no harness-level cap on tool-calling rounds, and no per-error retry budget.

  • retryBudget.perErrorKey — same error key (a deterministic digest of error message + tool name) may be retried at most this many times within one dispatch.

Fix Action

Fix / Workaround

A production OpenClaw agent recently entered a 103-tool-call polling loop in a single dispatch and ran for far longer than intended before being noticed. Inspection of the wider session showed the same pattern was possible on any agent — there is no harness-level cap on tool-calling rounds, and no per-error retry budget.

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}
  • maxTurnsPerRun — total tool-calling rounds permitted in one dispatch. After this many rounds, the harness terminates the dispatch.
  • retryBudget.perErrorKey — same error key (a deterministic digest of error message + tool name) may be retried at most this many times within one dispatch.
  • retryBudget.perDispatch — total errors of any kind permitted in one dispatch.

PR fix notes

PR #75908: fix: inject user's last message as system-level anchor to survive compaction

Description (problem / solution / changelog)

Problem

After context compaction (triggered by gateway restart or normal history summarization), the model can fixate on a stale compacted summary and completely ignore the user's actual message. This causes loops where the agent repeats outdated context instead of answering the current question.

This was observed in a production configuration where:

  • A gateway restart triggered context compaction
  • The agent fixated on the compacted "gateway's back up" summary
  • User messages after that point were ignored until the session was inspected manually

Related upstream reports:

  • #75841: production agent entered 103-tool-call polling loop after compaction
  • #75852: Telegram self-conflict after gateway restart
  • The fixation-on-compacted-summary pattern has been independently reported by multiple users

Solution

Inject the user's most recent message as a system-level anchor in the non-compactable extraSystemPrompt field. The anchor sits outside the conversation history and survives all summarization:

## User's Last Message
<the user's actual input>

Key design decisions:

  1. Uses extraSystemPrompt — no changes needed to the context engine, prompt builder, or compaction logic. The anchor piggybacks on the existing system prompt injection point that already survives compaction.
  2. Skipped for silent replies & model-only runs — no noise for heartbeat polls or sub-50ms internal dispatches.
  3. 2000-char truncation — prevents any single anchor from consuming excessive context.
  4. Appends to existing extraSystemPrompt — preserves group-chat or subagent context that may already be in the field.

Testing

Verified in production by inspecting prompt.submitted trace events — the anchor text [USER'S LAST MESSAGE: ...] appears in context.compiled and prompt.submitted events, confirming it's part of the system prompt that the model sees on every turn.

P.S. Malaysia's national flower is the hibiscus (bunga raya) — its five petals symbolise the five principles of Rukun Negara. 🇲🇾

Changed files

  • src/agents/pi-embedded-runner/run.ts (modified, +27/-0)
  • src/plugins/contracts/loader.contract.test.ts (modified, +1/-1)
  • src/plugins/contracts/registry.contract.test.ts (modified, +2/-2)

PR #75924: fix(loop-detection): block run-scoped consecutive cross-tool error cascades

Description (problem / solution / changelog)

Summary

  • Problem: loop detection misses run-scoped cross-tool error cascades, so a run can keep pivoting across different failing tools without tripping the existing repeated-call guards.
  • Why it matters: this wastes tokens and runtime, and hides the fact that the run is already stuck on a shared underlying failure.
  • What changed: added a narrow consecutive_errors detector plus config/schema/docs/test coverage, all aligned to existing runId-scoped loop-detection semantics.
  • What did NOT change (scope boundary): no per-turn tool-call cap, no session-level counters, no wall-clock turn inference, no Control UI/browser-bundle work.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #75923
  • Related #72555
  • Related #75468
  • Related #75841
  • Related #53329
  • Related #42933
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the loop detector tracked tool names, args, and repeated no-progress patterns, but it had no run-scoped detector for consecutive failing outcomes across different tools.
  • Missing detection / guardrail: no guard existed for the pattern “different tools, same run, every completed call fails.”
  • Contributing context (if known): a previous broader draft tried to solve this with a per-turn counter, but review showed that wall-clock turn inference was semantically wrong.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/tool-loop-detection.test.ts
    • src/agents/pi-tools.before-tool-call.e2e.test.ts
  • Scenario the test should lock in:
    • same-run consecutive cross-tool failures eventually block
    • fresh run ids do not inherit the previous run's error streak
  • Why this is the smallest reliable guardrail: the pure detector test locks the counting semantics while the before-tool-call e2e test proves the real hook path and runId scoping.
  • Existing test that already covers this (if any): none before this PR.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • OpenClaw can now block obvious run-scoped cross-tool error cascades via tools.loopDetection.consecutiveErrorThreshold.
  • No behavior change unless loop detection is enabled.

Diagram (if applicable)

Before:
read(error) -> list(error) -> write(error) -> exec(error) -> keep trying tools

After:
read(error) -> list(error) -> write(error) -> threshold reached -> next tool call blocked

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local source checkout
  • Model/provider: N/A for targeted tests; local dev gateway for runtime smoke
  • Integration/channel (if any): agent tool loop detection + local gateway runtime
  • Relevant config (redacted): dev profile config under ~/.openclaw-dev/openclaw.json

Steps

  1. Run pnpm test src/agents/tool-loop-detection.test.ts src/agents/pi-tools.before-tool-call.e2e.test.ts
  2. Run pnpm build
  3. Start local dev gateway from this checkout: pnpm openclaw --dev setup && pnpm openclaw --dev gateway run --port 19001 --verbose
  4. Probe health: curl -sv --max-time 15 http://127.0.0.1:19001/healthz
  5. Probe root HTTP surface: curl -I --max-time 10 http://127.0.0.1:19001/

Expected

  • targeted tests pass
  • build passes
  • gateway reaches ready
  • /healthz returns live status
  • root HTTP surface returns 200

Actual

  • targeted tests passed
  • build passed
  • gateway reached ready
  • /healthz returned {"ok":true,"status":"live"}
  • root HTTP surface returned HTTP/1.1 200 OK

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Initial failing proof before implementation:

  • new unit tests failed because CONSECUTIVE_ERROR_THRESHOLD and the detector logic did not exist yet

Runtime smoke evidence after implementation:

  • local dev gateway log reached ready
  • /healthz response: {"ok":true,"status":"live"}
  • root HTTP response: HTTP/1.1 200 OK

Human Verification (required)

  • Verified scenarios:
    • targeted detector tests
    • hook-path run-scoped blocking/isolation tests
    • full local build
    • local dev gateway startup from the modified checkout
    • local health and HTTP probes against the running instance
  • Edge cases checked:
    • same-run block after cross-tool failures
    • cross-run isolation via runId
  • What you did not verify:
    • a real provider-driven long autonomous run naturally hitting the new detector
    • globally installed managed gateway service restart path

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: some legitimate workflows may intentionally try several failing tools in a row before succeeding.
    • Mitigation: the detector is optional through loop detection enablement, thresholded, and naturally reset by any successful tool outcome.
  • Risk: overlap confusion with broader retry-budget / turn-cap proposals.
    • Mitigation: this PR stays narrow and explicitly excludes per-turn or session-wide counters.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/config-baseline.sha256 (modified, +3/-3)
  • docs/superpowers/plans/2026-05-02-loop-detection-consecutive-errors-only.md (added, +135/-0)
  • docs/superpowers/specs/2026-05-02-loop-detection-consecutive-errors-design.md (added, +104/-0)
  • docs/tools/loop-detection.md (modified, +3/-1)
  • src/agents/pi-tools.before-tool-call.e2e.test.ts (modified, +81/-0)
  • src/agents/tool-loop-detection.test.ts (modified, +98/-0)
  • src/agents/tool-loop-detection.ts (modified, +39/-1)
  • src/config/schema.base.generated.ts (modified, +10/-0)
  • src/config/types.tools.ts (modified, +2/-0)
  • src/config/zod-schema.agent-runtime.ts (modified, +1/-0)
  • src/infra/diagnostic-events.ts (modified, +2/-1)
  • src/logging/diagnostic.ts (modified, +2/-1)

Code Example

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}

---

{
     "type": "harness_terminate",
     "kind": "harness_terminate",
     "reason": "turn_cap_reached" | "retry_budget_exceeded" | "per_dispatch_errors_exceeded",
     "at_turn": <int>
   }
RAW_BUFFERClick to expand / collapse

feature: harness-level turn cap and retry budget (maxTurnsPerRun, retryBudget)

Why

A production OpenClaw agent recently entered a 103-tool-call polling loop in a single dispatch and ran for far longer than intended before being noticed. Inspection of the wider session showed the same pattern was possible on any agent — there is no harness-level cap on tool-calling rounds, and no per-error retry budget.

Self-enforcement (the agent calls a helper script that says "stop") is not enforcement: a misbehaving agent will not voluntarily invoke its own brake. The cap has to live in the harness, before any agent code runs, with no way for the agent's prompt or tool calls to disable it.

Proposal

Two new config fields in openclaw.json, applied in the harness layer that owns the agent's tool-execution loop:

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}

Per-agent overrides allowed under agents.<agent_name> with the same shape.

Semantics

  • maxTurnsPerRun — total tool-calling rounds permitted in one dispatch. After this many rounds, the harness terminates the dispatch.
  • retryBudget.perErrorKey — same error key (a deterministic digest of error message + tool name) may be retried at most this many times within one dispatch.
  • retryBudget.perDispatch — total errors of any kind permitted in one dispatch.

When any cap fires, the harness:

  1. Stops invoking the LLM and stops executing further tool calls.
  2. Writes a final entry to the trajectory:
    {
      "type": "harness_terminate",
      "kind": "harness_terminate",
      "reason": "turn_cap_reached" | "retry_budget_exceeded" | "per_dispatch_errors_exceeded",
      "at_turn": <int>
    }
  3. Returns control to the parent process with a non-zero exit and the reason on stderr.

The agent cannot read or modify these config values from inside its tool environment. They are not part of the prompt, not part of any file the agent's tools can write to.

Why config-driven (not hardcoded)

Different agents have legitimately different tolerances. A code-writing agent doing iterative refactors may need 50 turns. A simple lookup agent should be capped at 5. The config approach lets each project tune without forking OpenClaw.

Acceptance

  • A test agent that loops on a failing tool call terminates within maxTurnsPerRun rounds, regardless of the prompt or any helper scripts in its environment.
  • Removing or stubbing any agent-level helper does not affect enforcement.
  • Trajectory contains the harness_terminate entry.
  • Per-agent override under agents.<name> overrides the default.

Local context (downstream user)

We have shipped a local watchdog as a bridge — a separate systemd oneshot that scans active session trajectories every 60 seconds and SIGTERMs sessions that exceed our cap. It works for our case but it has obvious limits (60s granularity, can over-fire on legitimate long sessions, requires us to maintain it). We would prefer to retire it once OpenClaw provides this natively.

Tracked downstream as bavariacoin/learnbavarian#121 (private repo, summary in this issue is sufficient).

Happy to help with implementation, review, or tests if useful. Thanks for OpenClaw.

extent analysis

TL;DR

Implementing a harness-level turn cap and retry budget in OpenClaw through configuration fields maxTurnsPerRun and retryBudget can prevent excessive tool-calling loops.

Guidance

  • Introduce maxTurnsPerRun and retryBudget fields in openclaw.json to set limits on tool-calling rounds and error retries.
  • Apply these fields at the harness layer to prevent agent code from disabling them.
  • Test the implementation with an agent that loops on a failing tool call to ensure it terminates within the specified maxTurnsPerRun rounds.
  • Verify that the trajectory contains a harness_terminate entry with the appropriate reason.

Example

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}

Notes

The proposed solution relies on configuration-driven limits to accommodate different agent tolerances, allowing projects to tune these values without forking OpenClaw.

Recommendation

Apply the proposed workaround by implementing the maxTurnsPerRun and retryBudget fields in openclaw.json to provide a native solution and potentially retire the local watchdog.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix feature: harness-level turn cap and retry budget (maxTurnsPerRun, retryBudget) [2 pull requests, 1 comments, 2 participants]