openclaw - ✅(Solved) Fix [Feature]: Detect and recover from output truncation (stopReason:"length") in main agent sessions [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#63210 · Fetched 2026-04-09 07:56:57
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

When the model hits its max_output_tokens limit mid-generation (especially during tool_use with large parameters), the agent runtime silently proceeds with truncated output. There is no detection, no recovery, and no signal to the LLM that its output was incomplete.

This is a complementary request to #63188, which covers mode:"run" subagent sessions. This issue covers the main interactive agent session, where truncated tool_use parameters lead to silent data corruption.


PR fix notes

PR #66146: fix(agents): forward model maxTokens as default stream option

Description (problem / solution / changelog)

Summary

When options.maxTokens is not explicitly provided, the OpenAI-compatible transport functions (buildOpenAIResponsesParams and buildOpenAICompletionsParams) now fall back to model.maxTokens instead of omitting the max_tokens / max_completion_tokens / max_output_tokens field entirely.

Problem

Requests sent to OpenRouter (and other OpenAI-compatible providers) without an explicit max_tokens parameter hit provider-side defaults — often 8192 tokens for Anthropic models via OpenRouter. This caused:

  1. Silent output truncation on large generations (HTML documents, long markdown files, comprehensive tool-call arguments)
  2. Infinite retry loops when a write tool call exceeded 8192 output tokens — the content field was truncated away, OpenClaw rejected the tool call with must have required property 'content', and the model retried identically
  3. Sub-agent timeouts — agents gathered research successfully but timed out generating the deliverable because each attempt was truncated

Evidence from session 36425fd6: 9 consecutive write attempts, each hitting exactly output: 8192, stopReason: "length".

Fix

Both transport paths now resolve an effectiveMaxTokens from options?.maxTokens || model.maxTokens before building the API payload. The model object already carries the correct limit from the provider capability cache (e.g. 128K for Claude Opus 4.6 via OpenRouter's top_provider.max_completion_tokens).

This aligns the OpenAI-compatible transport with the existing Anthropic direct transport behavior at anthropic-transport-stream.ts:493, which already defaults to Math.min(model.maxTokens, 32_000).
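
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `ModelInfo` and `StreamOptions` shapes and the `resolveEffectiveMaxTokens` and `buildParams` helpers are hypothetical names introduced here; only the `options?.maxTokens || model.maxTokens` expression and the `max_output_tokens` field come from the PR description.

```typescript
// Hypothetical minimal shapes; the real OpenClaw types carry more fields.
interface ModelInfo {
  id: string;
  maxTokens?: number; // output limit from the provider capability cache
}

interface StreamOptions {
  maxTokens?: number; // explicit per-request override
}

// Mirrors the `options?.maxTokens || model.maxTokens` fallback.
// Because `||` is used rather than `??`, an explicit 0 also falls back.
function resolveEffectiveMaxTokens(
  model: ModelInfo,
  options?: StreamOptions,
): number | undefined {
  return options?.maxTokens || model.maxTokens;
}

// Attach the field only when a limit is known, so requests for models
// without a cached limit keep their previous shape.
function buildParams(
  model: ModelInfo,
  options?: StreamOptions,
): Record<string, unknown> {
  const params: Record<string, unknown> = { model: model.id };
  const effectiveMaxTokens = resolveEffectiveMaxTokens(model, options);
  if (effectiveMaxTokens) {
    params.max_output_tokens = effectiveMaxTokens;
  }
  return params;
}
```

With this shape, a caller that passes no `maxTokens` gets the model's cached limit instead of the provider-side default.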

Changes

  • src/agents/openai-transport-stream.ts, buildOpenAIResponsesParams: fall back to model.maxTokens for max_output_tokens
  • src/agents/openai-transport-stream.ts, buildOpenAICompletionsParams: fall back to model.maxTokens for max_tokens / max_completion_tokens (respecting compat.maxTokensField)

Test Plan

  • Existing 48 transport stream tests pass
  • Build passes
  • Gateway restart + manual verification on live OpenRouter traffic

Related Issues

  • Partially addresses #63210 (output truncation detection) — fixes the root cause of truncation for OpenRouter/OpenAI-compatible providers by sending the correct max_tokens
  • Related to #49173 and #62130 (max_tokens vs max_completion_tokens) — ensures the value is always sent using the correct field name
  • Reduces urgency of #44468 (per-model maxTokens config overrides) — the correct value now flows automatically from the provider capability cache

Changed files

  • src/agents/openai-transport-stream.ts (modified, +11/-7)
  • src/tui/tui-event-handlers.test.ts (modified, +187/-40)
  • src/tui/tui-event-handlers.ts (modified, +55/-30)



Root Cause Analysis (from source code)

1. Provider maps max_tokens → "length", but the agent loop never checks for it.

In pi-ai/dist/providers/amazon-bedrock.js (line 555):

case BedrockStopReason.MAX_TOKENS:
case BedrockStopReason.MODEL_CONTEXT_WINDOW_EXCEEDED:
    return "length";

In the Anthropic provider, Dr() maps "max_tokens" → "length".

But in pi-embedded (pi-embedded-bukGSgEe.js), grep -c length returns 0. The agent loop only handles "error", "aborted", "toolUse", and "stop". When stopReason is "length", it falls through to default behavior and is treated as a normal completion.

2. One code path overwrites the provider stop_reason entirely.

buildAssistantMessageFromResponse() (line 30117):

stopReason: content.some((part) => part.type === "toolCall") ? "toolUse" : "stop"

This derives stopReason from content type, ignoring the provider's actual stop reason. If the model was truncated mid-tool-call but the partial JSON still parsed as a toolCall content block, the stopReason is set to "toolUse" — and the tool executes with truncated parameters.
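
One possible way to keep the provider's signal is sketched below. It assumes the provider's normalized stop reason is available where the message is built; the `deriveStopReason` helper and its `providerStopReason` parameter are hypothetical and not part of the current code.

```typescript
type StopReason = "stop" | "toolUse" | "length" | "error" | "aborted";

interface ContentPart {
  type: "text" | "toolCall";
}

// Hypothetical helper: let a provider-reported "length" take precedence,
// and fall back to the existing content-based derivation otherwise.
function deriveStopReason(
  content: ContentPart[],
  providerStopReason?: StopReason,
): StopReason {
  if (providerStopReason === "length") {
    return "length";
  }
  return content.some((part) => part.type === "toolCall") ? "toolUse" : "stop";
}
```

With this ordering, a response truncated mid-tool-call reports "length" even when the partial JSON happened to parse into a toolCall block, so the agent loop can decide whether to execute it.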

3. Empirical evidence: zero max_tokens events in session history.

Across all agent sessions on our instance (main + subagents), stopReason:"max_tokens" has never appeared. This could mean:

  • The current maxTokens setting (model.maxTokens/3 ≈ 10K for Opus 4.6) is high enough that truncation is rare
  • OR truncation IS happening but is masked by the content-type-based stopReason override (finding #2)

Problem Scenarios

Scenario A: Tool parameter truncation (most dangerous)

LLM generates write(file, <8000 chars of content>). Output hits token limit mid-content. The write tool receives truncated content and writes an incomplete file. No error is raised. The LLM proceeds as if the write succeeded.

This was the root cause of a 20-hour subagent death spiral on 2026-03-31: truncated write → incomplete file → compaction reads corrupt context → empty exec loop.

Scenario B: Text + tool_use truncation

LLM generates explanation text + tool_use in one response. Token limit cuts off the tool_use JSON. If the partial JSON doesn't parse, the toolCall content block is dropped, stopReason becomes "stop" (no toolCalls detected), and the turn ends prematurely without executing the intended tool.

Proposed Solution

Phase 1: Detection (low risk, high value)

When the runtime receives stopReason: "length" from the provider (before content-type override):

  1. Preserve the original provider stop_reason as metadata (e.g., _providerStopReason) alongside the derived stopReason
  2. If the assistant message contains tool_use blocks AND the provider stop_reason was max_tokens/length:
    • Validate JSON completeness of each tool_use parameter
    • If any tool_use has malformed JSON → do not execute tools
    • Inject a system message: "⚠️ Your previous response was truncated (stop_reason=max_tokens, output_tokens=N). Your tool call parameters may be incomplete. Please break your output into smaller parts and retry."
  3. If no tool_use blocks (pure text was truncated):
    • Inject: "⚠️ Your previous response was truncated at the output token limit. Please continue from where you left off."
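
The Phase 1 steps above could look roughly like this. It is a sketch under stated assumptions: `ToolCallBlock`, `TruncationPlan`, and `planTruncationRecovery` are hypothetical names, and `rawArguments` assumes the runtime still has the unparsed JSON text of each tool call. When every tool call parses, the sketch still returns the warning but allows execution, which is one reading of step 2.

```typescript
// Hypothetical shape: `rawArguments` is the unparsed JSON text of the tool call.
interface ToolCallBlock {
  name: string;
  rawArguments: string;
}

interface TruncationPlan {
  executeTools: boolean;
  message?: string; // text to inject back to the LLM, if any
}

function isCompleteJson(raw: string): boolean {
  try {
    JSON.parse(raw);
    return true;
  } catch {
    return false;
  }
}

function planTruncationRecovery(
  providerStopReason: string,
  toolCalls: ToolCallBlock[],
  outputTokens: number,
): TruncationPlan {
  const truncated =
    providerStopReason === "length" || providerStopReason === "max_tokens";
  if (!truncated) {
    return { executeTools: true };
  }
  if (toolCalls.length === 0) {
    // Pure text was cut off: ask the model to continue.
    return {
      executeTools: false,
      message:
        "⚠️ Your previous response was truncated at the output token limit. " +
        "Please continue from where you left off.",
    };
  }
  // Block execution if any tool call's arguments fail to parse.
  const allComplete = toolCalls.every((call) => isCompleteJson(call.rawArguments));
  return {
    executeTools: allComplete,
    message:
      `⚠️ Your previous response was truncated (stop_reason=${providerStopReason}, ` +
      `output_tokens=${outputTokens}). Your tool call parameters may be incomplete. ` +
      "Please break your output into smaller parts and retry.",
  };
}
```

A successful parse does not prove the content is complete. If an upstream layer repairs partial JSON before this check runs, a truncated string value can still pass, so the warning is kept in that branch.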

Phase 2: Tool execution failure feedback loop

When a tool execution fails (regardless of truncation):

  1. Return the error to the LLM as a tool_result with is_error: true (this already exists)
  2. Add a retry budget (configurable, default: 3) — track consecutive failures for the same tool
  3. After exhausting retries, inject a system message asking the LLM to reflect on the failure pattern and try a different approach
  4. Circuit breaker: if a tool fails > N times across the session, temporarily block it and inform the LLM
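
A retry budget and circuit breaker along these lines might be tracked as sketched below. `ToolFailureTracker` is a hypothetical class; the default budget of 3 comes from the proposal, while the circuit-breaker limit of 10 is an assumed placeholder for N. Resetting the consecutive count after a "reflect" outcome is also a design choice of this sketch, not something the proposal specifies.

```typescript
type FailureOutcome = "retry" | "reflect" | "blocked";

// Hypothetical tracker for per-tool failures within one session.
class ToolFailureTracker {
  private readonly retryBudget: number;
  private readonly circuitBreakerLimit: number;
  private readonly consecutive = new Map<string, number>();
  private readonly total = new Map<string, number>();

  constructor(retryBudget: number = 3, circuitBreakerLimit: number = 10) {
    this.retryBudget = retryBudget;
    this.circuitBreakerLimit = circuitBreakerLimit; // assumed value for "N"
  }

  recordSuccess(tool: string): void {
    this.consecutive.set(tool, 0);
  }

  recordFailure(tool: string): FailureOutcome {
    const consecutive = (this.consecutive.get(tool) ?? 0) + 1;
    const total = (this.total.get(tool) ?? 0) + 1;
    this.consecutive.set(tool, consecutive);
    this.total.set(tool, total);

    // Circuit breaker: more than N failures across the session.
    if (total > this.circuitBreakerLimit) {
      return "blocked";
    }
    // Retry budget exhausted: ask the LLM to reflect, then start a new budget.
    if (consecutive >= this.retryBudget) {
      this.consecutive.set(tool, 0);
      return "reflect";
    }
    return "retry";
  }

  isBlocked(tool: string): boolean {
    return (this.total.get(tool) ?? 0) > this.circuitBreakerLimit;
  }
}
```

The caller would map "reflect" to the system message from step 3 and "blocked" to the notice from step 4.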

Configuration

{
  "agents": {
    "defaults": {
      "runtime": {
        "onOutputTruncation": "recover",
        "maxTruncationRecoveries": 3,
        "toolRetryBudget": 3
      }
    }
  }
}

Values for onOutputTruncation:

  • "ignore" — current behavior (default for backward compat)
  • "warn" — inject warning message to LLM, still execute tools
  • "recover" — validate tool params, block execution if malformed, inject recovery prompt
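
For illustration, the three keys could be typed and defaulted as in the sketch below. The `resolveRuntimeConfig` helper is hypothetical. The "ignore" default and the retry budget of 3 follow the proposal; the default of 3 for maxTruncationRecoveries is an assumption taken from the example value.

```typescript
type OnOutputTruncation = "ignore" | "warn" | "recover";

interface RuntimeTruncationConfig {
  onOutputTruncation: OnOutputTruncation;
  maxTruncationRecoveries: number;
  toolRetryBudget: number;
}

const DEFAULT_RUNTIME_CONFIG: RuntimeTruncationConfig = {
  onOutputTruncation: "ignore", // current behavior, for backward compatibility
  maxTruncationRecoveries: 3, // assumed default (the proposal only shows 3 as an example)
  toolRetryBudget: 3,
};

function isTruncationMode(value: unknown): value is OnOutputTruncation {
  return value === "ignore" || value === "warn" || value === "recover";
}

function nonNegativeInt(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0
    ? value
    : fallback;
}

// Hypothetical resolver: unknown or invalid values fall back to the defaults.
function resolveRuntimeConfig(
  raw: Record<string, unknown> = {},
): RuntimeTruncationConfig {
  return {
    onOutputTruncation: isTruncationMode(raw.onOutputTruncation)
      ? raw.onOutputTruncation
      : DEFAULT_RUNTIME_CONFIG.onOutputTruncation,
    maxTruncationRecoveries: nonNegativeInt(
      raw.maxTruncationRecoveries,
      DEFAULT_RUNTIME_CONFIG.maxTruncationRecoveries,
    ),
    toolRetryBudget: nonNegativeInt(
      raw.toolRetryBudget,
      DEFAULT_RUNTIME_CONFIG.toolRetryBudget,
    ),
  };
}
```

Falling back to "ignore" on an unrecognized value keeps existing deployments on their current behavior if the key is misspelled.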

Relationship to #63188

#63188 covers mode:"run" subagent sessions with onLengthTruncation: "continue"/"fail"/"terminate".

This issue covers the main interactive session and focuses on:

  • Preserving provider stop_reason through the content-type override
  • Tool parameter JSON validation before execution
  • LLM-facing recovery prompts (the LLM can self-correct, unlike a terminated subagent)
  • Tool execution failure feedback loops (applicable beyond truncation)

These are complementary — #63188's "fail" mode gives the orchestrator a signal; this issue gives the LLM itself a signal to self-correct.

Environment

  • OpenClaw: 2026.4.1
  • OS: Ubuntu 24.04 / Linux 6.17.0
  • Model: Claude Opus 4.6 via Amazon Bedrock
  • Source files examined:
    • dist/pi-embedded-bukGSgEe.js (lines 24808, 24972, 30066-30117, 37397-37400)
    • node_modules/@mariozechner/pi-ai/dist/providers/amazon-bedrock.js (lines 548-563)
    • Anthropic SDK stream handler Dr() function

extent analysis

TL;DR

Preserve the original provider stop reason and validate tool use parameters to prevent silent data corruption when the model hits its max output tokens limit.

Guidance

  • Check the provider stop reason and preserve it as metadata to detect truncation.
  • Validate JSON completeness of each tool use parameter when the provider stop reason is "max_tokens" or "length".
  • Inject a system message to the LLM if truncation is detected, warning about potential incomplete tool call parameters.
  • Consider implementing a retry budget and circuit breaker for tool execution failures to prevent repeated errors.

Example

{
  "agents": {
    "defaults": {
      "runtime": {
        "onOutputTruncation": "recover",
        "maxTruncationRecoveries": 3,
        "toolRetryBudget": 3
      }
    }
  }
}

This configuration example shows how to enable recovery from output truncation and set up a retry budget for tool execution failures.

Notes

The proposed solution focuses on detection and recovery from output truncation, and it is complementary to the solution in #63188, which covers mode:"run" subagent sessions.

Recommendation

Preserve the original provider stop reason and validate tool use parameters before execution to prevent silent data corruption. This gives the LLM a direct signal that its output was truncated, so it can self-correct, reflect on the failure pattern, and try a different approach.
