openclaw - ✅(Solved) Fix feature: harness-level turn cap and retry budget (maxTurnsPerRun, retryBudget) [2 pull requests, 1 comment, 2 participants]

openclaw · 2026-05-01 22:57:40

GitHub stats

openclaw/openclaw#75841 • Fetched 2026-05-02 05:29:15

Author: bavariacoin

Participants: bavariacoin, clawsweeper[bot]

Timeline (top): cross-referenced ×3, referenced ×3, commented ×1

Error Message

A production OpenClaw agent recently entered a 103-tool-call polling loop in a single dispatch and ran for far longer than intended before being noticed. Inspection of the wider session showed the same pattern was possible on any agent — there is no harness-level cap on tool-calling rounds, and no per-error retry budget.

Fix Action

Fix / Workaround

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}

maxTurnsPerRun — total tool-calling rounds permitted in one dispatch. After this many rounds, the harness terminates the dispatch.
retryBudget.perErrorKey — same error key (a deterministic digest of error message + tool name) may be retried at most this many times within one dispatch.
retryBudget.perDispatch — total errors of any kind permitted in one dispatch.
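
The three caps above can be sketched as a small per-dispatch counter. This is a minimal illustration, not OpenClaw code: the class name `DispatchBudget` and its methods are hypothetical, and the sketch assumes one reading of the semantics, namely that a cap of N tolerates N occurrences and fires on occurrence N+1.

```typescript
// Hypothetical sketch: only the config field names and the three reason
// strings come from the issue; everything else is an assumption.
type TerminateReason =
  | "turn_cap_reached"
  | "retry_budget_exceeded"
  | "per_dispatch_errors_exceeded";

interface BudgetConfig {
  maxTurnsPerRun: number;
  retryBudget: { perErrorKey: number; perDispatch: number };
}

class DispatchBudget {
  private readonly config: BudgetConfig;
  private turns = 0;
  private totalErrors = 0;
  private readonly errorsByKey = new Map<string, number>();

  constructor(config: BudgetConfig) {
    this.config = config;
  }

  // Call before each tool-calling round. A non-null result means the
  // round must not run and the dispatch should be terminated.
  startTurn(): TerminateReason | null {
    if (this.turns >= this.config.maxTurnsPerRun) return "turn_cap_reached";
    this.turns += 1;
    return null;
  }

  // Call after each failed tool call with that failure's error key.
  recordError(errorKey: string): TerminateReason | null {
    this.totalErrors += 1;
    const seen = (this.errorsByKey.get(errorKey) ?? 0) + 1;
    this.errorsByKey.set(errorKey, seen);
    if (seen > this.config.retryBudget.perErrorKey) return "retry_budget_exceeded";
    if (this.totalErrors > this.config.retryBudget.perDispatch) {
      return "per_dispatch_errors_exceeded";
    }
    return null;
  }
}
```

Because the counter lives in the harness loop rather than in anything the agent can call, a prompt or tool result cannot reset it, which is the property the issue asks for.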

PR fix notes

PR #75908: fix: inject user's last message as system-level anchor to survive compaction

Repository: openclaw/openclaw
Author: baghvn
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75908

Description (problem / solution / changelog)

Problem

After context compaction (triggered by gateway restart or normal history summarization), the model can fixate on a stale compacted summary and completely ignore the user's actual message. This causes loops where the agent repeats outdated context instead of answering the current question.

This was observed in a production configuration where:

A gateway restart triggered context compaction
The agent fixated on the compacted "gateway's back up" summary
User messages after that point were ignored until the session was inspected manually

Related upstream reports:

#75841: production agent entered 103-tool-call polling loop after compaction
#75852: Telegram self-conflict after gateway restart
The fixation-on-compacted-summary pattern has been independently reported by multiple users

Solution

Inject the user's most recent message as a system-level anchor in the non-compactable extraSystemPrompt field. The anchor sits outside the conversation history and survives all summarization:

## User's Last Message
<the user's actual input>

Key design decisions:

Uses extraSystemPrompt — no changes needed to the context engine, prompt builder, or compaction logic. The anchor piggybacks on the existing system prompt injection point that already survives compaction.
Skipped for silent replies & model-only runs — no noise for heartbeat polls or sub-50ms internal dispatches.
2000-char truncation — prevents any single anchor from consuming excessive context.
Appends to existing extraSystemPrompt — preserves group-chat or subagent context that may already be in the field.
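
The design decisions above can be sketched as a single pure function. This is an illustration under stated assumptions, not the code in the PR: the function name, the input shape, and the `silentReply` and `modelOnlyRun` flags are hypothetical. Only the `extraSystemPrompt` field name, the heading text, and the 2000-character limit come from the description.

```typescript
// Hypothetical helper; the real change lives in
// src/agents/pi-embedded-runner/run.ts and may look different.
const MAX_ANCHOR_CHARS = 2000;

interface AnchorInput {
  userMessage: string;
  existingExtraSystemPrompt?: string;
  silentReply?: boolean; // assumed flag for silent replies
  modelOnlyRun?: boolean; // assumed flag for model-only runs
}

function buildExtraSystemPrompt(input: AnchorInput): string | undefined {
  const existing = input.existingExtraSystemPrompt;
  const message = input.userMessage.trim();

  // Skip the anchor for silent replies, model-only runs, and empty input.
  if (input.silentReply || input.modelOnlyRun || message.length === 0) {
    return existing;
  }

  // slice() counts UTF-16 code units, which is close enough for a size guard.
  const truncated = message.slice(0, MAX_ANCHOR_CHARS);
  const anchor = `## User's Last Message\n${truncated}`;

  // Append rather than replace, so group-chat or subagent context survives.
  return existing ? `${existing}\n\n${anchor}` : anchor;
}
```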

Testing

Verified in production by inspecting prompt.submitted trace events — the anchor text [USER'S LAST MESSAGE: ...] appears in context.compiled and prompt.submitted events, confirming it's part of the system prompt that the model sees on every turn.

P.S. Malaysia's national flower is the hibiscus (bunga raya) — its five petals symbolise the five principles of Rukun Negara. 🇲🇾

Changed files

src/agents/pi-embedded-runner/run.ts (modified, +27/-0)
src/plugins/contracts/loader.contract.test.ts (modified, +1/-1)
src/plugins/contracts/registry.contract.test.ts (modified, +2/-2)

PR #75924: fix(loop-detection): block run-scoped consecutive cross-tool error cascades

Repository: openclaw/openclaw
Author: Shockang
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75924

Description (problem / solution / changelog)

Summary

Problem: loop detection misses run-scoped cross-tool error cascades, so a run can keep pivoting across different failing tools without tripping the existing repeated-call guards.
Why it matters: this wastes tokens and runtime, and hides the fact that the run is already stuck on a shared underlying failure.
What changed: added a narrow consecutive_errors detector plus config/schema/docs/test coverage, all aligned to existing runId-scoped loop-detection semantics.
What did NOT change (scope boundary): no per-turn tool-call cap, no session-level counters, no wall-clock turn inference, no Control UI/browser-bundle work.

Linked Issue/PR

Closes #75923
Related #72555
Related #75468
Related #75841
Related #53329
Related #42933
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: the loop detector tracked tool names, args, and repeated no-progress patterns, but it had no run-scoped detector for consecutive failing outcomes across different tools.
Missing detection / guardrail: no guard existed for the pattern “different tools, same run, every completed call fails.”
Contributing context (if known): a previous broader draft tried to solve this with a per-turn counter, but review showed that wall-clock turn inference was semantically wrong.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- src/agents/tool-loop-detection.test.ts
- src/agents/pi-tools.before-tool-call.e2e.test.ts
Scenario the test should lock in:
- same-run consecutive cross-tool failures eventually block
- fresh run ids do not inherit the previous run's error streak
Why this is the smallest reliable guardrail: the pure detector test locks the counting semantics while the before-tool-call e2e test proves the real hook path and runId scoping.
Existing test that already covers this (if any): none before this PR.
If no new test is added, why not: N/A

User-visible / Behavior Changes

OpenClaw can now block obvious run-scoped cross-tool error cascades via tools.loopDetection.consecutiveErrorThreshold.
No behavior change unless loop detection is enabled.

Diagram (if applicable)

Before:
read(error) -> list(error) -> write(error) -> exec(error) -> keep trying tools

After:
read(error) -> list(error) -> write(error) -> threshold reached -> next tool call blocked
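
The before/after diagram can be sketched as a run-scoped streak counter. This is a minimal illustration of the counting semantics described in the PR (streak keyed by run id, reset by any success), not the detector in src/agents/tool-loop-detection.ts; the class and method names are hypothetical.

```typescript
// Hypothetical sketch of a consecutive-error detector scoped by run id.
class ConsecutiveErrorDetector {
  private readonly threshold: number;
  private readonly streaks = new Map<string, number>();

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record the outcome of a completed tool call. The tool name is
  // deliberately ignored: the streak spans different tools.
  recordOutcome(runId: string, ok: boolean): void {
    if (ok) {
      this.streaks.delete(runId); // any success resets the streak
      return;
    }
    this.streaks.set(runId, (this.streaks.get(runId) ?? 0) + 1);
  }

  // Check before the next tool call in the same run.
  shouldBlock(runId: string): boolean {
    return (this.streaks.get(runId) ?? 0) >= this.threshold;
  }
}

// Mirrors the "After" line with an assumed threshold of 3.
const detector = new ConsecutiveErrorDetector(3);
for (const tool of ["read", "list", "write"]) {
  detector.recordOutcome("run-1", false); // each of these tools failed
  void tool;
}
const blocked = detector.shouldBlock("run-1"); // true: the next call is blocked
const freshRun = detector.shouldBlock("run-2"); // false: new runs start clean
```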

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS
Runtime/container: local source checkout
Model/provider: N/A for targeted tests; local dev gateway for runtime smoke
Integration/channel (if any): agent tool loop detection + local gateway runtime
Relevant config (redacted): dev profile config under ~/.openclaw-dev/openclaw.json

Steps

Run pnpm test src/agents/tool-loop-detection.test.ts src/agents/pi-tools.before-tool-call.e2e.test.ts
Run pnpm build
Start local dev gateway from this checkout: pnpm openclaw --dev setup && pnpm openclaw --dev gateway run --port 19001 --verbose
Probe health: curl -sv --max-time 15 http://127.0.0.1:19001/healthz
Probe root HTTP surface: curl -I --max-time 10 http://127.0.0.1:19001/

Expected

targeted tests pass
build passes
gateway reaches ready
/healthz returns live status
root HTTP surface returns 200

Actual

targeted tests passed
build passed
gateway reached ready
/healthz returned {"ok":true,"status":"live"}
root HTTP surface returned HTTP/1.1 200 OK

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Initial failing proof before implementation:

new unit tests failed because CONSECUTIVE_ERROR_THRESHOLD and the detector logic did not exist yet

Runtime smoke evidence after implementation:

local dev gateway log reached ready
/healthz response: {"ok":true,"status":"live"}
root HTTP response: HTTP/1.1 200 OK

Human Verification (required)

Verified scenarios:
- targeted detector tests
- hook-path run-scoped blocking/isolation tests
- full local build
- local dev gateway startup from the modified checkout
- local health and HTTP probes against the running instance
Edge cases checked:
- same-run block after cross-tool failures
- cross-run isolation via runId
What you did not verify:
- a real provider-driven long autonomous run naturally hitting the new detector
- globally installed managed gateway service restart path

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: some legitimate workflows may intentionally try several failing tools in a row before succeeding.
- Mitigation: the detector is optional through loop detection enablement, thresholded, and naturally reset by any successful tool outcome.
Risk: overlap confusion with broader retry-budget / turn-cap proposals.
- Mitigation: this PR stays narrow and explicitly excludes per-turn or session-wide counters.

Changed files

CHANGELOG.md (modified, +1/-0)
docs/.generated/config-baseline.sha256 (modified, +3/-3)
docs/superpowers/plans/2026-05-02-loop-detection-consecutive-errors-only.md (added, +135/-0)
docs/superpowers/specs/2026-05-02-loop-detection-consecutive-errors-design.md (added, +104/-0)
docs/tools/loop-detection.md (modified, +3/-1)
src/agents/pi-tools.before-tool-call.e2e.test.ts (modified, +81/-0)
src/agents/tool-loop-detection.test.ts (modified, +98/-0)
src/agents/tool-loop-detection.ts (modified, +39/-1)
src/config/schema.base.generated.ts (modified, +10/-0)
src/config/types.tools.ts (modified, +2/-0)
src/config/zod-schema.agent-runtime.ts (modified, +1/-0)
src/infra/diagnostic-events.ts (modified, +2/-1)
src/logging/diagnostic.ts (modified, +2/-1)

feature: harness-level turn cap and retry budget (maxTurnsPerRun, retryBudget)

Why

Self-enforcement (the agent calls a helper script that says "stop") is not enforcement: a misbehaving agent will not voluntarily invoke its own brake. The cap has to live in the harness, before any agent code runs, with no way for the agent's prompt or tool calls to disable it.

Proposal

Two new config fields in openclaw.json, applied in the harness layer that owns the agent's tool-execution loop:

{
  "agents": {
    "defaults": {
      "maxTurnsPerRun": 30,
      "retryBudget": {
        "perErrorKey": 3,
        "perDispatch": 10
      }
    }
  }
}

Per-agent overrides allowed under agents.<agent_name> with the same shape.
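
One way the override lookup might work is sketched below. The issue only says that overrides share the shape of `defaults`; whether a partial `retryBudget` is merged field by field is an assumption of this sketch, and the type and function names are hypothetical.

```typescript
// Hypothetical types mirroring the proposed openclaw.json shape.
interface RetryBudget {
  perErrorKey: number;
  perDispatch: number;
}

interface AgentLimits {
  maxTurnsPerRun: number;
  retryBudget: RetryBudget;
}

interface AgentLimitsOverride {
  maxTurnsPerRun?: number;
  retryBudget?: Partial<RetryBudget>;
}

interface AgentsConfig {
  defaults: AgentLimits;
  [agentName: string]: AgentLimitsOverride | undefined;
}

// Resolve the effective limits for one agent: per-agent values win,
// missing values fall back to agents.defaults.
function resolveLimits(agents: AgentsConfig, agentName: string): AgentLimits {
  const base = agents.defaults;
  const override = agentName === "defaults" ? undefined : agents[agentName];
  return {
    maxTurnsPerRun: override?.maxTurnsPerRun ?? base.maxTurnsPerRun,
    retryBudget: {
      perErrorKey: override?.retryBudget?.perErrorKey ?? base.retryBudget.perErrorKey,
      perDispatch: override?.retryBudget?.perDispatch ?? base.retryBudget.perDispatch,
    },
  };
}
```

With the defaults from the proposal and a hypothetical `lookup` agent that sets only `maxTurnsPerRun: 5`, `resolveLimits(agents, "lookup")` would yield a turn cap of 5 while keeping the default retry budget.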

Semantics

maxTurnsPerRun — total tool-calling rounds permitted in one dispatch. After this many rounds, the harness terminates the dispatch.
retryBudget.perErrorKey — same error key (a deterministic digest of error message + tool name) may be retried at most this many times within one dispatch.
retryBudget.perDispatch — total errors of any kind permitted in one dispatch.
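
The "deterministic digest of error message + tool name" could be computed as below. The issue does not name a hash function or an encoding, so SHA-256 over a JSON-encoded pair is an assumption of this sketch, as is the function name `errorKey`.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable key from the tool name and the
// error message. JSON-encoding the pair keeps ("ab", "c") and ("a", "bc")
// from colliding, which plain string concatenation would allow.
function errorKey(toolName: string, errorMessage: string): string {
  return createHash("sha256")
    .update(JSON.stringify([toolName, errorMessage]))
    .digest("hex");
}
```

Error messages that embed volatile details such as timestamps or request ids would produce a new key on every failure and slip past the per-key cap. Normalizing messages before hashing is a design question the issue leaves open; `retryBudget.perDispatch` still bounds that case.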

When any cap fires, the harness:

Stops invoking the LLM and stops executing further tool calls.

Writes a final entry to the trajectory:

{
  "type": "harness_terminate",
  "kind": "harness_terminate",
  "reason": "turn_cap_reached" | "retry_budget_exceeded" | "per_dispatch_errors_exceeded",
  "at_turn": <int>
}

Returns control to the parent process with a non-zero exit and the reason on stderr.

The agent cannot read or modify these config values from inside its tool environment. They are not part of the prompt, not part of any file the agent's tools can write to.
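
The termination steps above can be sketched as follows. The entry fields and reason strings come from the proposal; the function names, the in-memory `trajectory` array, and the injectable stderr writer are hypothetical stand-ins for whatever the harness actually uses.

```typescript
type TerminateReason =
  | "turn_cap_reached"
  | "retry_budget_exceeded"
  | "per_dispatch_errors_exceeded";

// Shape of the final trajectory entry as given in the proposal.
interface HarnessTerminateEntry {
  type: "harness_terminate";
  kind: "harness_terminate";
  reason: TerminateReason;
  at_turn: number;
}

function buildTerminateEntry(reason: TerminateReason, atTurn: number): HarnessTerminateEntry {
  return { type: "harness_terminate", kind: "harness_terminate", reason, at_turn: atTurn };
}

// Appends the final entry, reports the reason on stderr, and returns the
// exit code the caller should hand back to the parent process.
function terminateDispatch(
  reason: TerminateReason,
  atTurn: number,
  trajectory: object[],
  writeErr: (line: string) => void = (line) => console.error(line),
): number {
  trajectory.push(buildTerminateEntry(reason, atTurn));
  writeErr(`harness_terminate: ${reason} at turn ${atTurn}`);
  return 1; // non-zero, as the proposal requires
}
```

In a Node-based harness the returned code could be assigned to `process.exitCode` once the loop has stopped, so that pending trajectory writes can flush before the process exits.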

Why config-driven (not hardcoded)

Different agents have legitimately different tolerances. A code-writing agent doing iterative refactors may need 50 turns. A simple lookup agent should be capped at 5. The config approach lets each project tune without forking OpenClaw.

Acceptance

A test agent that loops on a failing tool call terminates within maxTurnsPerRun rounds, regardless of the prompt or any helper scripts in its environment.
Removing or stubbing any agent-level helper does not affect enforcement.
Trajectory contains the harness_terminate entry.
Per-agent override under agents.<name> overrides the default.

Local context (downstream user)

We have shipped a local watchdog as a bridge — a separate systemd oneshot that scans active session trajectories every 60 seconds and SIGTERMs sessions that exceed our cap. It works for our case but it has obvious limits (60s granularity, can over-fire on legitimate long sessions, requires us to maintain it). We would prefer to retire it once OpenClaw provides this natively.
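
A single sweep of such a watchdog might look like the sketch below. The issue does not describe the trajectory format or how sessions map to processes, so the `SessionSnapshot` shape and both function names are hypothetical; the kill function is injected so the selection logic can be exercised without signalling real processes.

```typescript
// Hypothetical snapshot of one active session, as the watchdog might
// derive it from a trajectory file.
interface SessionSnapshot {
  pid: number;
  toolCalls: number;
}

// Pure selection step: which sessions have exceeded the cap?
function sessionsOverCap(sessions: SessionSnapshot[], cap: number): SessionSnapshot[] {
  return sessions.filter((s) => s.toolCalls > cap);
}

// One sweep: signal every offending session and return the pids signalled.
function sweep(
  sessions: SessionSnapshot[],
  cap: number,
  kill: (pid: number, signal: "SIGTERM") => void,
): number[] {
  const offenders = sessionsOverCap(sessions, cap);
  for (const s of offenders) kill(s.pid, "SIGTERM");
  return offenders.map((s) => s.pid);
}
```

In Node, `(pid, signal) => { process.kill(pid, signal); }` could be passed as `kill`. Because a sweep only sees state at scan time, a session can overshoot the cap by up to one scan interval, which matches the 60-second granularity limit the author mentions.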

Tracked downstream as bavariacoin/learnbavarian#121 (private repo, summary in this issue is sufficient).

Happy to help with implementation, review, or tests if useful. Thanks for OpenClaw.

extent analysis

TL;DR

Implementing a harness-level turn cap and retry budget in OpenClaw through configuration fields maxTurnsPerRun and retryBudget can prevent excessive tool-calling loops.

Guidance

Introduce maxTurnsPerRun and retryBudget fields in openclaw.json to set limits on tool-calling rounds and error retries.
Apply these fields at the harness layer to prevent agent code from disabling them.
Test the implementation with an agent that loops on a failing tool call to ensure it terminates within the specified maxTurnsPerRun rounds.
Verify that the trajectory contains a harness_terminate entry with the appropriate reason.

Notes

The proposed solution relies on configuration-driven limits to accommodate different agent tolerances, allowing projects to tune these values without forking OpenClaw.

Recommendation

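Until OpenClaw ships maxTurnsPerRun and retryBudget natively, an external watchdog such as the one described in the issue remains the stopgap. Neither linked PR adds these fields: PR #75908 addresses fixation on a compacted summary, and PR #75924 explicitly excludes a per-turn tool-call cap. Once the fields are available in openclaw.json, the local watchdog can be retired.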
