openclaw - ✅(Solved) Fix Compaction fails with 1M context: max_tokens 240000 > 128000 for Anthropic models [1 pull requests, 1 participants]

adzendo · 2026-03-25T09:15:58Z



openclaw/openclaw#54383 • Fetched 2026-04-08 01:28:19

Author: adzendo
Participants: adzendo
Timeline: referenced ×9, cross-referenced ×1

Compaction summarization fails with max_tokens: 240000 > 128000 error when using Anthropic Claude models (Sonnet 4.6 / Opus 4.6) with 1M context windows enabled via context1m: true.

Error Message

Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error","message":"max_tokens: 240000 > 128000, which is the maximum allowed number of output tokens for claude-sonnet-4-6"}}


Root Cause

The compaction summarizer calculates an output token budget (max_tokens) that exceeds the Anthropic API per-request output cap (128K for both Sonnet 4.6 and Opus 4.6).

Key observations:

The built-in model catalog in provider-catalog-*.js correctly registers maxTokens: 128e3 for Anthropic Vertex models
The resolveNormalizedProviderModelMaxTokens() function in io-*.js does Math.min(rawMaxTokens, contextWindow) which should cap correctly
However, the compaction code in pi-embedded-*.js appears to calculate its own output budget independently, requesting 240K tokens which exceeds the model cap
The 240K value does not change when adjusting keepRecentTokens (tested 500K → 200K, same 240K error)
Both claude-sonnet-4-6 and claude-opus-4-6 have the same 128K per-request output limit, so switching compaction.model between them does not help

Fix Action

Workaround

None fully effective. /reset starts a fresh session. Adjusting keepRecentTokens does not change the 240K output request.

PR fix notes

PR #54392: fix: clamp compaction max_tokens to model output limit

Repository: openclaw/openclaw
Author: adzendo
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/54392

Description (problem / solution / changelog)

Summary

Fixes #54383 — Compaction fails with max_tokens: 240000 > 128000 when using Anthropic models with 1M context windows.

Root Cause

In @mariozechner/pi-coding-agent, generateSummary() calculates:

const maxTokens = Math.floor(0.8 * reserveTokens);

With reserveTokensFloor: 300000 (appropriate for 1M context), this produces max_tokens = 240000 — exceeding Anthropic's per-request output cap of 128K for both Sonnet 4.6 and Opus 4.6.
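The arithmetic can be reproduced in isolation. This sketch mirrors only the one budget line quoted from generateSummary(); the function name `summaryMaxTokens` and the constant `ANTHROPIC_OUTPUT_CAP` are hypothetical, and nothing else in pi-coding-agent is modeled.

```typescript
// Hypothetical stand-in for the budget line quoted from generateSummary() in
// @mariozechner/pi-coding-agent: the summary's output budget is 80% of reserveTokens.
function summaryMaxTokens(reserveTokens: number): number {
  return Math.floor(0.8 * reserveTokens);
}

// Per-request output cap reported in the Anthropic error for claude-sonnet-4-6.
const ANTHROPIC_OUTPUT_CAP = 128_000;

// reserveTokensFloor from the reproduction config
const requested = summaryMaxTokens(300_000);
console.log(requested, requested > ANTHROPIC_OUTPUT_CAP); // 240000 true
```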

Fix

Clamp reserveTokens in src/agents/compaction.ts before passing to generateSummary():

const modelMaxTokens = params.model.maxTokens ?? 128_000;
const clampedReserveTokens = Math.min(params.reserveTokens, Math.floor(modelMaxTokens / 0.8));

This ensures the downstream max_tokens calculation (0.8 * reserveTokens) never exceeds the model's actual output limit. The fix uses model.maxTokens from the provider registry, so it's forward-compatible — if future models raise their output cap, no code change is needed.
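A minimal standalone sketch of the clamp, assuming the same 0.8 factor as generateSummary(). `clampReserveTokens` and the `ModelInfo` shape are illustrative and are not the actual code in src/agents/compaction.ts; only the `maxTokens ?? 128_000` fallback and the division by 0.8 come from the PR.

```typescript
// Illustrative subset of a provider-registry model entry (assumed shape).
interface ModelInfo {
  id: string;
  maxTokens?: number; // per-request output cap, when the registry knows it
}

// Ratio used downstream by generateSummary(): max_tokens = floor(0.8 * reserveTokens).
const SUMMARY_OUTPUT_RATIO = 0.8;
const DEFAULT_OUTPUT_CAP = 128_000;

// Shrink reserveTokens so that floor(0.8 * reserveTokens) cannot exceed the model cap.
function clampReserveTokens(reserveTokens: number, model: ModelInfo): number {
  const modelMaxTokens = model.maxTokens ?? DEFAULT_OUTPUT_CAP;
  return Math.min(reserveTokens, Math.floor(modelMaxTokens / SUMMARY_OUTPUT_RATIO));
}

const sonnet: ModelInfo = { id: "anthropic/claude-sonnet-4-6", maxTokens: 128_000 };
const reserve = clampReserveTokens(300_000, sonnet);
console.log(reserve, Math.floor(SUMMARY_OUTPUT_RATIO * reserve)); // 160000 128000
```

Smaller reserve values pass through unchanged (clampReserveTokens(100_000, sonnet) returns 100000), so in this sketch the clamp only affects configurations like the 1M-context one in the report.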

Impact

Before: Compaction broken for all users with Anthropic + 1M context (any reserveTokensFloor > 160K)
After: Compaction works correctly, respecting model output limits while preserving the existing summarization quality

Testing

The fix is in the OpenClaw wrapper layer (src/agents/compaction.ts), not in the upstream pi-coding-agent package. This is the minimal, safest change — the upstream package could also benefit from the same clamp in generateSummary() itself.

Verified that:

model.maxTokens is populated from the provider catalog (128K for Anthropic Vertex models)
Math.floor(128000 / 0.8) = 160000, so clampedReserveTokens = min(300000, 160000) = 160000
generateSummary then calculates Math.floor(0.8 * 160000) = 128000 ✅ (within model limit)
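The single worked example can be extended into a quick property check: for a range of output caps and reserve values, the clamped reserve should never let floor(0.8 * reserve) exceed the cap, even with floating-point rounding of the 0.8 factor. This is a hypothetical standalone check, not the test file added by the PR; the cap and reserve values are chosen only for illustration.

```typescript
const RATIO = 0.8;

function clampReserveTokens(reserveTokens: number, modelMaxTokens: number): number {
  return Math.min(reserveTokens, Math.floor(modelMaxTokens / RATIO));
}

// Returns the first (cap, reserve) pair whose derived max_tokens exceeds the cap, or null.
function findViolation(caps: number[], reserves: number[]): [number, number] | null {
  for (const cap of caps) {
    for (const reserve of reserves) {
      const maxTokens = Math.floor(RATIO * clampReserveTokens(reserve, cap));
      if (maxTokens > cap) return [cap, reserve];
    }
  }
  return null;
}

// Odd values are included so that non-exact divisions by 0.8 are exercised.
const caps = [4_096, 8_192, 32_000, 64_000, 100_001, 128_000];
const reserves = [1, 20_000, 160_000, 160_001, 300_000, 999_999];
console.log(findViolation(caps, reserves)); // null
```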

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/compaction.reserve-tokens-clamping.test.ts (added, +143/-0)
src/agents/compaction.ts (modified, +10/-1)

Code Example

{
  "agents": {
    "defaults": {
      "contextTokens": 1000000,
      "compaction": {
        "model": "anthropic/claude-sonnet-4-6",
        "keepRecentTokens": 500000,
        "reserveTokensFloor": 300000,
        "maxHistoryShare": 0.75,
        "recentTurnsPreserve": 12
      }
    }
  }
}

---

Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"max_tokens: 240000 > 128000, which is the maximum allowed number of
output tokens for claude-sonnet-4-6"}}

---

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);


Bug Report

Summary

Compaction summarization fails with max_tokens: 240000 > 128000 error when using Anthropic Claude models (Sonnet 4.6 / Opus 4.6) with 1M context windows enabled via context1m: true.

Environment

OpenClaw: 2026.3.23-2 (7ffe7e4)
OS: macOS 15.3 (arm64)
Model: anthropic/claude-sonnet-4-6 (also affects claude-opus-4-6)
Context config: contextTokens: 1000000 with context1m: true API header

Steps to Reproduce

1. Configure an agent with 1M context:

{
  "agents": {
    "defaults": {
      "contextTokens": 1000000,
      "compaction": {
        "model": "anthropic/claude-sonnet-4-6",
        "keepRecentTokens": 500000,
        "reserveTokensFloor": 300000,
        "maxHistoryShare": 0.75,
        "recentTurnsPreserve": 12
      }
    }
  }
}

2. Use the agent until context reaches ~200K+ tokens
3. Trigger compaction via /compact
4. Compaction fails with:

Summarization failed: 400 {"type":"error","error":{"type":"invalid_request_error",
"message":"max_tokens: 240000 > 128000, which is the maximum allowed number of 
output tokens for claude-sonnet-4-6"}}

Root Cause Analysis

The compaction summarizer calculates an output token budget (max_tokens) that exceeds the Anthropic API per-request output cap (128K for both Sonnet 4.6 and Opus 4.6).

Key observations:

The built-in model catalog in provider-catalog-*.js correctly registers maxTokens: 128e3 for Anthropic Vertex models
The resolveNormalizedProviderModelMaxTokens() function in io-*.js does Math.min(rawMaxTokens, contextWindow) which should cap correctly
However, the compaction code in pi-embedded-*.js appears to calculate its own output budget independently, requesting 240K tokens which exceeds the model cap
The 240K value does not change when adjusting keepRecentTokens (tested 500K → 200K, same 240K error)
Both claude-sonnet-4-6 and claude-opus-4-6 have the same 128K per-request output limit, so switching compaction.model between them does not help

Expected Behavior

The compaction summarizer should cap max_tokens to the model's actual output limit (128K for Sonnet 4.6 and Opus 4.6). If the summary would exceed this limit, it should do one of the following:

1. Clamp max_tokens to the model's output ceiling
2. Chunk the summarization into multiple passes that each fit within the output limit
3. Use the model registry's maxTokens value when building the summarization API request

Proposed Fix

In the compaction summarization path (pi-embedded-*.js), add a clamp:

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);

This is a one-line fix that prevents the API rejection while preserving the existing summarization logic for models with higher output limits.
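For comparison with the PR, which shrinks reserveTokens before generateSummary() runs, this proposal clamps the final request value instead. A runnable sketch under assumed names: `calculatedOutputBudget` and `modelEntry` follow the snippet above, while `buildSummarizationRequest` and the request shape are hypothetical.

```typescript
// Assumed subset of a model registry entry; maxTokens may be missing.
interface ModelEntry {
  id: string;
  maxTokens?: number;
}

// Hypothetical request shape; only model and max_tokens matter for this sketch.
interface SummarizationRequest {
  model: string;
  max_tokens: number;
}

function buildSummarizationRequest(
  calculatedOutputBudget: number,
  modelEntry: ModelEntry,
): SummarizationRequest {
  // Cap to the model's output limit, falling back to 128K when the registry has no value.
  const effectiveMaxTokens = Math.min(calculatedOutputBudget, modelEntry.maxTokens ?? 128_000);
  return { model: modelEntry.id, max_tokens: effectiveMaxTokens };
}

console.log(buildSummarizationRequest(240_000, { id: "claude-sonnet-4-6", maxTokens: 128_000 }));
// { model: 'claude-sonnet-4-6', max_tokens: 128000 }
```

The trade-off: a request-level clamp would cover any code path that builds the summarization request, while the PR's reserveTokens clamp keeps the upstream 0.8 ratio intact and leaves pi-coding-agent untouched.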

Impact

This blocks compaction for all users running Anthropic Claude models with 1M context windows, a configuration introduced in OpenClaw 2026.3.22. The workaround is to use /reset instead of /compact, but this loses session continuity.

Workaround

None fully effective. /reset starts a fresh session. Adjusting keepRecentTokens does not change the 240K output request.

Labels

bug, compaction, anthropic, context-window

extent analysis

Fix Plan

To resolve the compaction summarization issue, follow these steps:

Update the pi-embedded-*.js file to include a clamp for the max_tokens value.
Use the modelEntry.maxTokens value to cap the calculatedOutputBudget.

Example code:

// Before sending the summarization request, cap to model's output limit
const effectiveMaxTokens = Math.min(
  calculatedOutputBudget,
  modelEntry.maxTokens ?? 128_000  // fallback to safe default
);

// Use the effectiveMaxTokens value in the API request
const summarizationRequest = {
  // ... other request properties ...
  max_tokens: effectiveMaxTokens,
};

Alternatively, consider implementing a chunking approach to handle summaries that exceed the model's output limit:

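// Chunk the summarization into multiple passes. Each pass must summarize a
// different slice of the history; the chunk properties and
// sendSummarizationRequest are placeholders, not existing OpenClaw APIs.
const chunkSize = modelEntry.maxTokens ?? 128_000;
const chunks = [];
for (let i = 0; i < calculatedOutputBudget; i += chunkSize) {
  const chunk = {
    // ... chunk properties (e.g. the slice of history this pass covers) ...
    max_tokens: Math.min(chunkSize, calculatedOutputBudget - i),
  };
  chunks.push(chunk);
}

// Send each chunk as a separate API request, awaiting each one in turn
// (inside an async function; Array.prototype.forEach does not wait for async callbacks)
for (const chunk of chunks) {
  await sendSummarizationRequest(chunk);
}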
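The loop above only splits the output budget; for multi-pass summarization each pass also needs its own slice of the history to summarize. A minimal sketch of that planning step, assuming the history is an array of segments and that the partial summaries are merged afterwards; `planSummaryPasses` and `SummaryPass` are hypothetical names, and the issue does not specify how OpenClaw would merge partial summaries.

```typescript
// One summarization pass: a contiguous slice of history plus its output budget.
interface SummaryPass<T> {
  segments: T[];
  maxTokens: number;
}

// Split the total output budget into per-pass budgets no larger than outputCap and
// assign each pass a contiguous, roughly equal slice of the history segments.
function planSummaryPasses<T>(history: T[], totalBudget: number, outputCap: number): SummaryPass<T>[] {
  if (outputCap <= 0) throw new Error("outputCap must be positive");
  const passCount = Math.max(1, Math.ceil(totalBudget / outputCap));
  const sliceSize = Math.ceil(history.length / passCount);
  const passes: SummaryPass<T>[] = [];
  let remaining = totalBudget;
  for (let i = 0; i < passCount; i++) {
    const maxTokens = Math.min(outputCap, remaining);
    remaining -= maxTokens;
    passes.push({ segments: history.slice(i * sliceSize, (i + 1) * sliceSize), maxTokens });
  }
  return passes;
}

const passes = planSummaryPasses(["turn1", "turn2", "turn3", "turn4"], 240_000, 128_000);
console.log(passes.map((p) => [p.segments.length, p.maxTokens])); // [ [ 2, 128000 ], [ 2, 112000 ] ]
```

With very short histories some passes can receive an empty slice; a real implementation would likely split by token count rather than by segment count.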

Verification

To verify the fix, test the compaction summarization with the updated code and ensure that:

The max_tokens value is capped at the model's output limit (128K for Claude Sonnet 4.6 and Opus 4.6).
The summarization request is successful and returns a valid response.
The chunking approach (if implemented) handles summaries that exceed the model's output limit correctly.

Extra Tips

Consider adding logging or monitoring to track the number of times the max_tokens value is capped or chunking is used.
Review the model registry to ensure that the maxTokens value is correctly set for each model.
Test the fix with different models and context window sizes to ensure that the issue is fully resolved.
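The logging tip can be sketched as a clamp that reports when it fires. The counter, `clampOutputBudget`, and the `onClamp` callback are illustrative assumptions, not an existing OpenClaw hook; a real build would route this to its own logger or metrics.

```typescript
interface ClampResult {
  maxTokens: number;
  clamped: boolean;
}

// Clamp the requested output budget and notify the caller whenever the cap applies.
function clampOutputBudget(
  requested: number,
  outputCap: number,
  onClamp?: (requested: number, outputCap: number) => void,
): ClampResult {
  if (requested <= outputCap) return { maxTokens: requested, clamped: false };
  onClamp?.(requested, outputCap);
  return { maxTokens: outputCap, clamped: true };
}

let clampEvents = 0;
const logClamp = (requested: number, cap: number): void => {
  clampEvents++;
  console.warn(`compaction: max_tokens ${requested} clamped to ${cap}`);
};

clampOutputBudget(240_000, 128_000, logClamp);
clampOutputBudget(90_000, 128_000, logClamp);
console.log(clampEvents); // 1
```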
