openclaw: [Feature]: Agent Runtime: Configurable stopReason="length" catch-and-continue for mode:"run" sessions

GitHub stats: openclaw/openclaw#63188 (fetched 2026-04-09 07:57:16)
Comments: 0 · Participants: 1 · Timeline events: 4 (cross-referenced ×1, labeled ×1, mentioned ×1, subscribed ×1) · Reactions: 0

When a subagent running in mode: "run" hits the provider's maximum output token limit mid-generation, OpenClaw receives stopReason: "length" from the API but the agent session is immediately terminated. The agent never sees the error, cannot recover, and leaves an incomplete output on disk with no signal that it is partial. There is currently no mechanism for the orchestrator to detect this condition and continue the agent's work.

This is a request for a configurable catch-and-continue behaviour at the orchestrator layer for stopReason: "length" events.

Root Cause

Because the truncated output contains no valid trailing tool call, OpenClaw interprets the turn as complete rather than failed. The session state is wiped, and the output file on disk is left incomplete with no flag or marker indicating truncation.

Fix / Workaround

  1. Prompt-level chunking protocol (implemented as current workaround). Instruct agents to write output in sequential tool calls (e.g. "Write sections 1-3, then call the tool again to write sections 4-6") so that no single generation pass approaches the token ceiling.

  2. Rely on the orchestrating agent to detect incomplete output via wc -w or file size checks and re-dispatch manually. Viable as a last-resort recovery, but it requires the orchestrating agent to implement this logic explicitly, adds orchestration overhead, and does not prevent the data loss (the incomplete session state is already wiped by the time the check runs).

In the three truncated runs reported under Evidence/examples, the orchestrating agent (a separate persistent agent responsible for dispatching and verifying the subagents' output) received no failure signal. Only a downstream wc -w check on the output files revealed the truncation.

Problem to solve

When a subagent running in mode:"run" hits the provider's maximum output token limit mid-generation, OpenClaw receives stopReason:"length" from the API but immediately terminates the session. The agent (LLM) never sees this error — the stopReason metadata exists only in OpenClaw's runtime layer, not in the model's token stream. The agent cannot detect or recover from its own truncation.

Because the truncated output contains no valid trailing tool call, OpenClaw interprets the turn as complete rather than failed. The session state is wiped, and the output file on disk is left incomplete with no flag or marker indicating truncation.

Any orchestrating agent that spawned this subagent receives no actionable failure signal — it may receive a DONE confirmation or observe a file on disk, but has no way to know the file is partial. The failure is silent.

This affects all mode:"run" sessions where the model's generation (including any internal reasoning tokens) approaches or exceeds the provider's output token ceiling.

Proposed solution

Add a configurable onLengthTruncation behaviour to the agent runtime, settable per-agent in openclaw.json:

{
  "agents": {
    "myAgent": {
      "model": { "primary": "google/gemini-3-flash-preview" },
      "runtime": {
        "onLengthTruncation": "continue",
        "maxContinuations": 3
      }
    }
  }
}

Behaviour when onLengthTruncation: "continue": When the runtime detects stopReason:"length" on a mode:"run" session:

  1. Do not terminate the session.
  2. Inject a continuation prompt into the next turn: "Your previous response was cut off due to output length limits. Resume generation exactly where you left off. Do not repeat any content already written. Continue from the last complete sentence."
  3. Allow the agent to continue until it produces a valid tool call or a natural stop (stopReason:"stop" / "end_turn").
  4. Enforce maxContinuations (suggested default: 3) to prevent infinite loops. If the cap is reached without a clean stop, mark the session as failed and surface a hard error to the caller.
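The four steps above amount to a small decision function that the runtime would consult after every turn. The following is a minimal sketch of that logic, not OpenClaw code: the names StopReason, TruncationPolicy, Action and nextAction are hypothetical, and treating a valid tool call as its own "tool_call" stop reason is an assumption made for illustration.

```typescript
// Hypothetical types: none of these exist in OpenClaw today.
type StopReason = "stop" | "end_turn" | "tool_call" | "length";
type TruncationPolicy = "terminate" | "continue" | "fail";

type Action =
  | { kind: "finish" }                    // clean stop or valid tool call
  | { kind: "terminate" }                 // current behaviour: end the session silently
  | { kind: "continue"; prompt: string }  // inject the continuation prompt
  | { kind: "fail"; message: string };    // surface a hard error to the caller

// Continuation prompt text taken from step 2 of the proposal.
const CONTINUATION_PROMPT =
  "Your previous response was cut off due to output length limits. " +
  "Resume generation exactly where you left off. " +
  "Do not repeat any content already written. " +
  "Continue from the last complete sentence.";

function nextAction(
  stopReason: StopReason,
  policy: TruncationPolicy,
  continuationsUsed: number,
  maxContinuations: number = 3, // suggested default from the proposal
): Action {
  // Anything other than a length stop ends the turn normally.
  if (stopReason !== "length") return { kind: "finish" };
  if (policy === "terminate") return { kind: "terminate" };
  if (policy === "fail") {
    return { kind: "fail", message: 'Generation truncated (stopReason: "length").' };
  }
  // policy === "continue": respect the cap to prevent infinite loops.
  if (continuationsUsed >= maxContinuations) {
    return {
      kind: "fail",
      message: `Still truncated after ${maxContinuations} continuations.`,
    };
  }
  return { kind: "continue", prompt: CONTINUATION_PROMPT };
}
```

Keeping the decision pure (no I/O, no model calls) would make the cap and the fallback to a hard error easy to unit-test independently of any provider.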

Proposed config values for onLengthTruncation:

• "terminate": current behaviour (default, no breaking change)
• "continue": inject continuation prompt and resume generation
• "fail": terminate and surface a hard error to the caller

Note: the "fail" mode alone, without "continue", would already be a meaningful improvement over the current silent termination — it gives orchestrating agents actionable signal to trigger a retry or escalate.
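One way the three values and their defaults could be validated when openclaw.json is loaded is sketched below. This is an illustration under stated assumptions: RuntimeOptions and resolveRuntimeOptions are hypothetical names, and rejecting unknown values with an exception is a design choice of this sketch, not something the issue specifies.

```typescript
// Hypothetical config resolver; OpenClaw's real config loader is not shown in the issue.
const POLICIES = ["terminate", "continue", "fail"] as const;
type TruncationPolicy = (typeof POLICIES)[number];

interface RuntimeOptions {
  onLengthTruncation: TruncationPolicy;
  maxContinuations: number;
}

function isPolicy(value: unknown): value is TruncationPolicy {
  return typeof value === "string" && (POLICIES as readonly string[]).indexOf(value) !== -1;
}

function resolveRuntimeOptions(
  raw: { onLengthTruncation?: unknown; maxContinuations?: unknown } = {},
): RuntimeOptions {
  // Default to "terminate" so existing configurations keep their behaviour.
  const policy = raw.onLengthTruncation ?? "terminate";
  if (!isPolicy(policy)) {
    throw new Error(`Invalid onLengthTruncation: ${JSON.stringify(policy)}`);
  }
  // Suggested default cap from the proposal.
  const max = raw.maxContinuations ?? 3;
  if (typeof max !== "number" || !Number.isInteger(max) || max < 0) {
    throw new Error(`Invalid maxContinuations: ${JSON.stringify(max)}`);
  }
  return { onLengthTruncation: policy, maxContinuations: max };
}
```

With this shape, an agent entry that omits the runtime block resolves to the current behaviour, which matches the non-breaking intent of the request.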

Alternatives considered

  1. Prompt-level chunking protocol (implemented as current workaround). Instruct agents to write output in sequential tool calls (e.g. "Write sections 1-3, then call the tool again to write sections 4-6") so that no single generation pass approaches the token ceiling.

    Why it is insufficient on its own:

    • Relies on model compliance. Smaller/faster models under concurrent load may revert to single-pass generation despite explicit instructions.
    • Does not address backend errors (e.g. HTTP 500 from the provider), which also cause silent session termination with no stopReason:"length" trigger.
    • Fragile across model swaps: token ceilings vary by provider and model. A prompt tuned for an 8,192-token ceiling breaks silently if the model is replaced with one that has a 4,096 limit.
    • The chunking protocol and a runtime catch-and-continue are complementary, not mutually exclusive.

  2. Modify OpenClaw core to auto-retry the entire session from scratch on truncation. Rejected: retrying from scratch wastes all tokens already generated, is expensive, and risks loops if the retry also truncates.

  3. Rely on the orchestrating agent to detect incomplete output via wc -w or file size checks and re-dispatch manually. Viable as a last-resort recovery, but it requires the orchestrating agent to implement this logic explicitly, adds orchestration overhead, and does not prevent the data loss (the incomplete session state is already wiped by the time the check runs).
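For reference, the downstream check in alternative 3 can be approximated in a few lines. This is a rough sketch, not the reporter's actual tooling: countWords mimics wc -w by splitting on whitespace, and checkCompleteness, the target word count, and the 0.9 threshold are all assumptions chosen for illustration.

```typescript
// Approximates `wc -w`: counts whitespace-separated tokens.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

interface CompletenessReport {
  words: number;
  ratio: number;     // words / targetWords
  complete: boolean; // true when ratio reaches minRatio
}

// minRatio is a hypothetical threshold; tune it per task.
function checkCompleteness(
  text: string,
  targetWords: number,
  minRatio: number = 0.9,
): CompletenessReport {
  const words = countWords(text);
  const ratio = targetWords > 0 ? words / targetWords : 0;
  return { words, ratio, complete: ratio >= minRatio };
}
```

An orchestrating agent could read each output file (for example with fs.readFileSync from node:fs) and re-dispatch any subagent whose report has complete set to false. As noted above, this only detects the truncation after the fact; it cannot recover the wiped session state.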

Impact

Affected users: Any operator running subagents in mode:"run" for long-form content generation tasks (document writing, research synthesis, report generation, etc.).

Severity: Medium-High. The failure is silent — operators may not notice incomplete output unless they implement explicit downstream word count or structural validation. In automated pipelines with no human review step, incomplete documents reach their final destination undetected.

Scope: Affects all providers that report a length-limit stop through a stopReason / finish_reason value such as "length". Confirmed on google/gemini-3-flash-preview. Expected to affect any provider whose models are used for output-heavy tasks (long documents, structured data generation).
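Because providers name this condition differently, a runtime-level feature would likely need one normalised value to branch on. The sketch below maps a few publicly documented values (OpenAI-style finish_reason "length", Anthropic stop_reason "max_tokens", Gemini finishReason "MAX_TOKENS") to a single "length" category. normalizeStopReason and NormalizedStop are hypothetical names, and whether OpenClaw already performs a similar mapping internally is not stated in the issue.

```typescript
// Hypothetical normaliser; the mapping table is illustrative, not exhaustive.
type NormalizedStop = "stop" | "length" | "tool_call" | "other";

function normalizeStopReason(raw: string | null | undefined): NormalizedStop {
  switch ((raw ?? "").toLowerCase()) {
    case "stop":          // OpenAI-style finish_reason, Gemini "STOP"
    case "end_turn":      // Anthropic stop_reason
    case "stop_sequence": // Anthropic stop_reason
      return "stop";
    case "length":        // OpenAI-style finish_reason
    case "max_tokens":    // Anthropic "max_tokens", Gemini "MAX_TOKENS"
      return "length";
    case "tool_calls":    // OpenAI-style finish_reason
    case "tool_use":      // Anthropic stop_reason
      return "tool_call";
    default:
      return "other";     // unknown or missing values are not treated as truncation
  }
}
```

Mapping unknown values to "other" rather than "stop" keeps an unrecognised provider response from being mistaken for a clean completion.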

The "fail" mode (surface a hard error instead of silent termination) is a low-effort, low-risk subset of this request that would address the silent failure problem immediately, even before "continue" is implemented.
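To make the "fail" mode concrete, the hard error could be a dedicated error type that an orchestrator can recognise and act on. This is a sketch under assumptions: LengthTruncationError, its fields, and handleSubagentFailure are hypothetical names, and the retry-versus-escalate policy is only one possible reaction.

```typescript
// Hypothetical error type for the proposed "fail" mode.
class LengthTruncationError extends Error {
  readonly stopReason = "length";
  readonly sessionId: string;
  readonly partialOutputPath: string | undefined;

  constructor(sessionId: string, partialOutputPath?: string) {
    super(`Session ${sessionId} was truncated (stopReason: "length"); output may be partial.`);
    // Keep instanceof working even when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, LengthTruncationError.prototype);
    this.name = "LengthTruncationError";
    this.sessionId = sessionId;
    this.partialOutputPath = partialOutputPath;
  }
}

// Example orchestrator-side policy: retry truncated runs, escalate anything else.
function handleSubagentFailure(err: unknown): "retry" | "escalate" {
  return err instanceof LengthTruncationError ? "retry" : "escalate";
}
```

Carrying the path of the partial file on the error would let the caller mark or delete the incomplete output instead of leaving it on disk unflagged.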

Evidence/examples

Observed in production: 8 concurrent mode:"run" subagents (all using google/gemini-3-flash-preview) tasked with generating ~3,000-word research documents. Three failed silently in one batch:

SubAgent A: stopReason:"length", output: 0 tokens, input: ~12,300 tokens, cacheRead: ~57,000 tokens. The model's internal reasoning block consumed the entire 8,192-token output budget before any visible content was written. Result: empty output file on disk.

SubAgent B: stopReason:"length". Document truncated at approximately 1,500 words mid-section. Session terminated. File left on disk at 33% of the required length.

SubAgent C: stopReason:"length". Document truncated at approximately 1,900 words. Session terminated. File left on disk at ~63% of required length.

In all three cases, the orchestrating agent (a separate persistent agent responsible for dispatching and verifying the subagents' output) received no failure signal. Only a downstream wc -w check on the output files revealed the truncation.

Environment:

• OpenClaw: latest (self-hosted, npm install, April 2026)
• OS: Ubuntu 24.04
• Node: v22.22.1
• Provider: google/gemini-3-flash-preview via OpenRouter

Additional information

Relevant maintainer: @tyler6204 (Agents/subagents per CONTRIBUTING.md)

This request aligns with the current stated priority area of "Performance: optimizing token usage and compaction logic" — the catch-and-continue mechanism directly reduces wasted tokens by recovering truncated sessions rather than discarding them.

Suggested implementation path:

• Phase 1 (low risk): Implement onLengthTruncation:"fail". Terminate the session but surface a hard error to the caller. No continuation logic required. Eliminates silent failures immediately.
• Phase 2: Implement onLengthTruncation:"continue" with a maxContinuations cap.

The default (onLengthTruncation:"terminate") preserves existing behaviour exactly, so this is a fully opt-in, non-breaking addition.

extent analysis

TL;DR

Implement a configurable onLengthTruncation behavior in the agent runtime to handle stopReason: "length" events, allowing for either termination with a hard error or continuation with a prompt to resume generation.

Guidance

  1. Configure onLengthTruncation behavior: Once the option exists, set onLengthTruncation to "terminate", "continue", or "fail" in openclaw.json to control how the agent runtime handles stopReason: "length" events.
  2. Implement continuation logic: If onLengthTruncation is set to "continue", inject a continuation prompt into the next turn to allow the agent to resume generation from where it left off.
  3. Enforce maxContinuations cap: To prevent infinite loops, set a maxContinuations limit to determine how many times the agent can continue generation before marking the session as failed.
  4. Verify implementation: Test the onLengthTruncation behavior with different configurations to ensure it correctly handles stopReason: "length" events and prevents silent failures.

Example

{
  "agents": {
    "myAgent": {
      "model": { "primary": "google/gemini-3-flash-preview" },
      "runtime": {
        "onLengthTruncation": "continue",
        "maxContinuations": 3
      }
    }
  }
}

Notes

The proposed solution is a non-breaking addition, and the default onLengthTruncation behavior preserves existing functionality. Implementing onLengthTruncation: "fail" as a first phase can immediately eliminate silent failures.

Recommendation

Implement the proposed onLengthTruncation behavior in phases: start with the "fail" mode to eliminate silent failures, then add the "continue" mode to allow session recovery. Until the option is available, the prompt-level chunking protocol and a downstream wc -w check are the only workarounds described in the issue. The phased approach gives the caller a clear, actionable error signal and reduces wasted tokens by recovering truncated sessions.
