openclaw - ✅(Solved) Fix Compaction fails with 1M context: max_tokens 240000 > 128000 for Anthropic models [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#54383 · Fetched 2026-04-08 01:28:19
Comments: 0 · Participants: 1 · Timeline: 10 · Reactions: 1
Timeline (top): referenced ×9, cross-referenced ×1

Compaction summarization fails with a max_tokens: 240000 > 128000 error when using Anthropic Claude models (Sonnet 4.6 / Opus 4.6) with 1M context windows enabled via context1m: true.

Error Message

Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error",
   "message":"max_tokens: 240000 > 128000, which is the maximum allowed number of
   output tokens for claude-sonnet-4-6"}}

Root Cause

The compaction summarizer calculates an output token budget (max_tokens) that exceeds the Anthropic API per-request output cap (128K for both Sonnet 4.6 and Opus 4.6).

Key observations:

  • The built-in model catalog in provider-catalog-*.js correctly registers maxTokens: 128e3 for Anthropic Vertex models
  • The resolveNormalizedProviderModelMaxTokens() function in io-*.js does Math.min(rawMaxTokens, contextWindow) which should cap correctly
  • However, the compaction code in pi-embedded-*.js appears to calculate its own output budget independently, requesting 240K tokens which exceeds the model cap
  • The 240K value does not change when adjusting keepRecentTokens (tested 500K → 200K, same 240K error)
  • Both claude-sonnet-4-6 and claude-opus-4-6 have the same 128K per-request output limit, so switching compaction.model between them does not help
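
The gap between the two code paths can be illustrated with a small sketch. The function names here (`registryMaxTokens`, `compactionOutputBudget`) are hypothetical stand-ins, not the actual OpenClaw or pi-coding-agent functions. The `Math.min` cap mirrors the behavior described for resolveNormalizedProviderModelMaxTokens(), and the 0.8 factor is the one quoted in the PR notes for this issue.

```typescript
// Hypothetical stand-in for the registry-side cap: min(rawMaxTokens, contextWindow).
function registryMaxTokens(rawMaxTokens: number, contextWindow: number): number {
  return Math.min(rawMaxTokens, contextWindow);
}

// Hypothetical stand-in for the summarizer's own budget, which never consults the registry.
function compactionOutputBudget(reserveTokens: number): number {
  return Math.floor(0.8 * reserveTokens);
}

const cap = registryMaxTokens(128_000, 1_000_000); // registry value is correct
const requested = compactionOutputBudget(300_000); // computed independently
console.log({ cap, requested, rejected: requested > cap });
```

With the values from the report, the registry cap stays at 128000 while the independently computed budget is 240000, which is why the request is rejected even though the catalog entry is right.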

Fix Action

Workaround

None is fully effective. /reset starts a fresh session but loses session continuity. Adjusting keepRecentTokens does not change the 240K output request.

PR fix notes

PR #54392: fix: clamp compaction max_tokens to model output limit

Description (problem / solution / changelog)

Summary

Fixes #54383 — Compaction fails with max_tokens: 240000 > 128000 when using Anthropic models with 1M context windows.

Root Cause

In @mariozechner/pi-coding-agent, generateSummary() calculates:

const maxTokens = Math.floor(0.8 * reserveTokens);

With reserveTokensFloor: 300000 (appropriate for 1M context), this produces max_tokens = 240000 — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.

Fix

Clamp reserveTokens in src/agents/compaction.ts before passing to generateSummary():

const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));

This ensures the downstream max_tokens calculation (0.8 * reserveTokens) never exceeds the model's actual output limit. The fix uses model.maxTokens from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.
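
The clamp and the downstream calculation can be checked together in isolation. The following is a minimal sketch rather than the code from the PR: `ModelEntry`, `clampReserveTokens`, and `summaryMaxTokens` are names introduced for illustration, and only the two formulas come from the description above.

```typescript
// Minimal model shape for this sketch; the real registry entry has more fields.
interface ModelEntry {
  maxTokens?: number;
}

const SUMMARY_SHARE = 0.8; // factor applied by generateSummary(), per the PR description
const FALLBACK_MAX_TOKENS = 128_000; // same fallback as the PR snippet

// Clamp reserveTokens so that 0.8 * reserveTokens cannot exceed the model's output cap.
function clampReserveTokens(reserveTokens: number, model: ModelEntry): number {
  const modelMaxTokens = model.maxTokens ?? FALLBACK_MAX_TOKENS;
  return Math.min(reserveTokens, Math.floor(modelMaxTokens / SUMMARY_SHARE));
}

// Mirrors the downstream max_tokens calculation quoted in the root cause.
function summaryMaxTokens(reserveTokens: number): number {
  return Math.floor(SUMMARY_SHARE * reserveTokens);
}

const model: ModelEntry = { maxTokens: 128_000 };
const clamped = clampReserveTokens(300_000, model);
const maxTokens = summaryMaxTokens(clamped);
console.log({ clamped, maxTokens });
```

Values already below the threshold pass through unchanged, so configurations with a small reserveTokens should behave as before.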

Impact

  • Before: Compaction broken for all users with Anthropic + 1M context (any reserveTokensFloor > 160K)
  • After: Compaction works correctly, respecting model output limits while preserving the existing summarization quality

Testing

The fix is in the OpenClaw wrapper layer (src/agents/compaction.ts), not in the upstream pi-coding-agent package. This is the minimal, safest change — the upstream package could also benefit from the same clamp in generateSummary() itself.

Verified that:

  • model.maxTokens is populated from the provider catalog (128K for Anthropic Vertex models)
  • Math.floor(128000 / 0.8) = 160000, so clampedReserveTokens = min(300000, 160000) = 160000
  • generateSummary then calculates Math.floor(0.8 * 160000) = 128000 ✅ (within model limit)

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/compaction.reserve-tokens-clamping.test.ts (added, +143/-0)
  • src/agents/compaction.ts (modified, +10/-1)

Code Example

{
  "agents": {
    "defaults": {
      "contextTokens": 1000000,
      "compaction": {
        "model": "anthropic/claude-sonnet-4-6",
        "keepRecentTokens": 500000,
        "reserveTokensFloor": 300000,
        "maxHistoryShare": 0.75,
        "recentTurnsPreserve": 12
      }
    }
  }
}

---

Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error",
   "message":"max_tokens: 240000 > 128000, which is the maximum allowed number of 
   output tokens for claude-sonnet-4-6"}}

---

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);

Bug Report

Summary

Compaction summarization fails with a max_tokens: 240000 > 128000 error when using Anthropic Claude models (Sonnet 4.6 / Opus 4.6) with 1M context windows enabled via context1m: true.

Environment

  • OpenClaw: 2026.3.23-2 (7ffe7e4)
  • OS: macOS 15.3 (arm64)
  • Model: anthropic/claude-sonnet-4-6 (also affects claude-opus-4-6)
  • Context config: contextTokens: 1000000 with context1m: true API header

Steps to Reproduce

  1. Configure an agent with 1M context:
    {
      "agents": {
        "defaults": {
          "contextTokens": 1000000,
          "compaction": {
            "model": "anthropic/claude-sonnet-4-6",
            "keepRecentTokens": 500000,
            "reserveTokensFloor": 300000,
            "maxHistoryShare": 0.75,
            "recentTurnsPreserve": 12
          }
        }
      }
    }
  2. Use the agent until context reaches ~200K+ tokens
  3. Trigger compaction via /compact
  4. Compaction fails with:
    Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error",
    "message":"max_tokens: 240000 > 128000, which is the maximum allowed number of 
    output tokens for claude-sonnet-4-6"}}
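
Whether a given configuration will hit this error can be estimated before running /compact. The sketch below assumes the floor(0.8 * reserveTokens) formula quoted in the PR notes for this issue, and further assumes that reserveTokens equals the configured reserveTokensFloor. `CompactionConfig`, `predictedMaxTokens`, and `exceedsOutputCap` are hypothetical names, not part of OpenClaw.

```typescript
// Only the field this check needs; the real compaction config has more keys.
interface CompactionConfig {
  reserveTokensFloor: number;
}

// Predicted max_tokens, assuming reserveTokens equals the configured floor.
function predictedMaxTokens(config: CompactionConfig): number {
  return Math.floor(0.8 * config.reserveTokensFloor);
}

// True when the predicted request would exceed the model's per-request output cap.
function exceedsOutputCap(config: CompactionConfig, modelOutputCap: number): boolean {
  return predictedMaxTokens(config) > modelOutputCap;
}

const reported: CompactionConfig = { reserveTokensFloor: 300_000 };
console.log(predictedMaxTokens(reported), exceedsOutputCap(reported, 128_000));
```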

Root Cause Analysis

The compaction summarizer calculates an output token budget (max_tokens) that exceeds the Anthropic API per-request output cap (128K for both Sonnet 4.6 and Opus 4.6).

Key observations:

  • The built-in model catalog in provider-catalog-*.js correctly registers maxTokens: 128e3 for Anthropic Vertex models
  • The resolveNormalizedProviderModelMaxTokens() function in io-*.js does Math.min(rawMaxTokens, contextWindow) which should cap correctly
  • However, the compaction code in pi-embedded-*.js appears to calculate its own output budget independently, requesting 240K tokens which exceeds the model cap
  • The 240K value does not change when adjusting keepRecentTokens (tested 500K → 200K, same 240K error)
  • Both claude-sonnet-4-6 and claude-opus-4-6 have the same 128K per-request output limit, so switching compaction.model between them does not help

Expected Behavior

The compaction summarizer should cap max_tokens to the model's actual output limit (128K for Anthropic). If the summary would exceed this, it should do one of the following:

  1. Clamp max_tokens to the model's output ceiling, or
  2. Chunk the summarization into multiple passes that each fit within the output limit, or
  3. Use the model registry's maxTokens value when building the summarization API request

Proposed Fix

In the compaction summarization path (pi-embedded-*.js), add a clamp:

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);

This is a one-line fix that prevents the API rejection while preserving the existing summarization logic for models with higher output limits.

Impact

This blocks compaction for all users running Anthropic Claude models with 1M context windows — a configuration that was introduced in OpenClaw 2026.3.22. Workaround is to use /reset instead of /compact, but this loses session continuity.

Workaround

None fully effective. /reset starts a fresh session. Adjusting keepRecentTokens does not change the 240K output request.

Labels

bug, compaction, anthropic, context-window

extent analysis

Fix Plan

To resolve the compaction summarization failure:

  • Clamp the max_tokens value in the compaction summarization path. The issue report points at pi-embedded-*.js; PR #54392 applies the clamp in src/agents/compaction.ts instead.
  • Cap calculatedOutputBudget at modelEntry.maxTokens.

Example code:

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);

// Use the effectiveMaxTokens value in the API request
const summarizationRequest = {
  // ... other request properties ...
  max_tokens: effectiveMaxTokens,
};

Alternatively, consider a chunking approach for summaries that would exceed the model's output limit. Splitting max_tokens alone does not divide the work: each pass also needs its own slice of the history to summarize.

// Split the output budget into per-request budgets within the model's limit
const chunkSize = modelEntry.maxTokens ?? 128_000;
const chunks = [];
for (let i = 0; i < calculatedOutputBudget; i += chunkSize) {
  const chunk = {
    // ... chunk properties, including this pass's slice of the history ...
    max_tokens: Math.min(chunkSize, calculatedOutputBudget - i),
  };
  chunks.push(chunk);
}

// Send each chunk as a separate API request, one pass at a time
for (const chunk of chunks) {
  // Send the request for this chunk
}
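
A multi-pass approach presumably also has to split the history being summarized, so that each pass has less to condense. The following is a minimal sketch of that splitting step only, assuming each message carries a precomputed token count. The `CountedMessage` shape and `chunkByTokenBudget` are hypothetical and do not correspond to any OpenClaw or pi-coding-agent API.

```typescript
// Hypothetical message shape: `tokens` is a precomputed token count.
interface CountedMessage {
  text: string;
  tokens: number;
}

// Greedily group consecutive messages so each group stays within the per-pass budget.
// A single message larger than the budget ends up in a group of its own.
function chunkByTokenBudget(messages: CountedMessage[], budget: number): CountedMessage[][] {
  const chunks: CountedMessage[][] = [];
  let current: CountedMessage[] = [];
  let used = 0;
  for (const message of messages) {
    // Close the current group when adding this message would exceed the budget.
    if (current.length > 0 && used + message.tokens > budget) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(message);
    used += message.tokens;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

const history: CountedMessage[] = [
  { text: "a", tokens: 60 },
  { text: "b", tokens: 50 },
  { text: "c", tokens: 30 },
  { text: "d", tokens: 200 },
];
console.log(chunkByTokenBudget(history, 100).map((c) => c.map((m) => m.text)));
```

Message order is preserved, which matters if later passes are meant to build on earlier summaries. Oversized single messages are not split here and would need separate handling.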

Verification

To verify the fix, test the compaction summarization with the updated code and ensure that:

  • The max_tokens value is capped at the model's output limit (128K for Anthropic).
  • The summarization request is successful and returns a valid response.
  • The chunking approach (if implemented) handles summaries that exceed the model's output limit correctly.

Extra Tips

  • Consider adding logging or monitoring to track the number of times the max_tokens value is capped or chunking is used.
  • Review the model registry to ensure that the maxTokens value is correctly set for each model.
  • Test the fix with different models and context window sizes to ensure that the issue is fully resolved.
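
The logging tip can be combined with the clamp itself. This is a minimal sketch, assuming a plain callback logger. `clampMaxTokens` and the message format are illustrative and not an existing OpenClaw API.

```typescript
type Logger = (message: string) => void;

// Returns the effective max_tokens and reports through `log` whenever a clamp occurs.
function clampMaxTokens(requested: number, modelCap: number, log: Logger = console.warn): number {
  if (requested <= modelCap) return requested;
  log(`compaction max_tokens clamped: requested=${requested} cap=${modelCap}`);
  return modelCap;
}

// Collect log lines in memory here; a real setup would route them to its own logger.
const logged: string[] = [];
const effective = clampMaxTokens(240_000, 128_000, (m) => { logged.push(m); });
console.log(effective, logged);
```

Passing the logger as a parameter keeps the clamp easy to test and lets callers count clamp events without changing the function.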
