openclaw - ✅(Solved) Fix Embedded Pi agent enters compaction loop on repeated 400 errors with no response body (openai-completions API) [2 pull requests, 1 participants]

openclaw, 2026-04-14 09:36:49


openclaw/openclaw#66462 • Fetched 2026-04-15 06:26:08


Author

HongzhuLiu

Participants

HongzhuLiu

Timeline (top)

referenced ×2, closed ×1, cross-referenced ×1, renamed ×1

Error Message

error="LLM request failed: provider rejected the request schema or tool payload. rawError=400 status code (no body)"

OpenClaw classifies the body-less 400 as a format error, which triggers compaction. The retry hits the same 400, and the cycle repeats until the compaction safeguard intervenes.

Root Cause

The 400 has no response body, so the failover system cannot classify the actual error and defaults to the "format" classification. The format error triggers compaction, which retries with a modified conversation state and hits the same 400 again. Why the proxy returns the 400 in the first place is not confirmed; candidate causes are listed under Possible Causes.

Fix Action

When the error message is empty or too short to classify, return null (unknown) instead of "format", so the failover decision can surface the error rather than entering a compaction loop. If compaction is triggered but the underlying error persists (same status code, same provider), the safeguard should activate sooner rather than looping multiple times.

Workaround

Disable active-memory (enabled: false) or route the affected channel to a non-Claude model.

PR fix notes

PR #66473: fix: don't classify 400/422 with no body as format error

Repository: openclaw/openclaw
Author: HongzhuLiu
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/66473

Description (problem / solution / changelog)

Problem

When a provider behind a proxy returns 400 or 422 with no response body, the failover system defaults to "format" classification. This triggers a compaction loop:

400 no-body → classified as "format" → compaction → retry → 400 again → compaction → loop

See issue #66462 for the full error log pattern.

Fix

In src/agents/pi-embedded-helpers/errors.ts:

400/422 with no body → return null (unknown), don't default to "format"
400/422 with unclassifiable body → still return "format" (preserves existing behavior for actual schema errors)

This prevents the compaction loop while keeping the format error classification for cases where the provider actually returns a meaningful error message.
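The two rules above can be sketched roughly as follows. This is an illustration, not the patch itself: `FailoverReason`, `classifyMessage`, and `classifyHttpFailure` are hypothetical names, and the message patterns are placeholders for OpenClaw's real classifier, which is not shown on this page.

```typescript
// Hypothetical reason type; the real OpenClaw union is not shown in the issue.
type FailoverReason = "format" | "timeout" | "rate_limit";

// Stand-in for the message-based classifier; the patterns are illustrative only.
function classifyMessage(message: string): FailoverReason | null {
  if (/rate limit/i.test(message)) return "rate_limit";
  if (/timed? ?out/i.test(message)) return "timeout";
  return null;
}

// Returns null (unknown) for a body-less 400/422 and keeps "format"
// for a 400/422 whose body is present but unclassifiable.
function classifyHttpFailure(
  httpStatus: number,
  message: string | undefined,
): FailoverReason | null {
  const text = (message ?? "").trim();
  const fromMessage = text ? classifyMessage(text) : null;
  if (httpStatus === 400 || httpStatus === 422) {
    if (fromMessage) return fromMessage;
    if (!text) return null; // no body: nothing to classify, so do not guess "format"
    return "format"; // body present but unrecognized: existing behavior
  }
  return fromMessage;
}

console.log(classifyHttpFailure(400, undefined)); // null
console.log(classifyHttpFailure(422, "invalid tool schema")); // format
```

The empty-body check sits after the message classification and before the "format" fallback, so providers that return a meaningful body are handled exactly as before.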

Changes

| File | Change |
| --- | --- |
| `src/agents/pi-embedded-helpers/errors.ts` | Add empty-body check before defaulting to "format" |
| `src/agents/failover-error.test.ts` | Update test expectations for no-body 400/422 |

Tests

51 passed (failover-error.test.ts)
12 passed (failover-matches.test.ts)


PR #67024: fix: don't classify 400/422 with no body as format error

Repository: openclaw/openclaw
Author: altaywtf
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/67024

Description (problem / solution / changelog)

Summary:

stop body-less HTTP 400/422 proxy failures from defaulting to "format"
keep the fix scoped to failover classification and tests
handle explicit wrapper-only shapes like 400 status code (no body)

Credit:

original fix and report path came from @HongzhuLiu in #66473; this PR is the cleaned, rebased maintainer replacement

Changes:

return null for empty or explicit no-body 400/422 wrappers in failover classification
update failover classifier regressions for raw and structured no-body shapes
keep the changelog note in Unreleased > Fixes
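One way to recognize the "explicit wrapper-only shapes" mentioned above is a strict pattern match on the raw error string. The sketch below is an assumption modeled on the log line `400 status code (no body)`; the regex and the function name `isBodylessBadRequest` are hypothetical, and the structured shapes the PR also handles are not covered here.

```typescript
// Matches wrapper-only strings such as "400 status code (no body)".
// The pattern is an assumption based on the log line quoted in the issue.
const NO_BODY_WRAPPER = /^(400|422)\s+status\s+code\s*\(no\s+body\)$/i;

// True when a 400/422 carries no usable error text: either nothing at all,
// or only the wrapper string that restates the status code.
function isBodylessBadRequest(
  httpStatus: number,
  rawError: string | undefined,
): boolean {
  if (httpStatus !== 400 && httpStatus !== 422) return false;
  const trimmed = (rawError ?? "").trim();
  return trimmed === "" || NO_BODY_WRAPPER.test(trimmed);
}
```

Anchoring the pattern at both ends keeps a real provider message that merely contains the phrase "status code" from being treated as body-less.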

Validation:

pnpm test src/agents/failover-error.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/pi-embedded-runner/run/failover-policy.test.ts

Linked Issues:

supersedes #66473
fixes #66462

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/failover-error.test.ts (modified, +97/-2)
src/agents/failover-error.ts (modified, +134/-31)
src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +20/-2)
src/agents/pi-embedded-helpers/errors.ts (modified, +35/-0)


Issue Description

When using a custom provider with api: "openai-completions" that proxies to Anthropic Claude (or other reasoning-capable models), the embedded Pi agent (used by active-memory, compaction, and other sub-agent flows) can enter a compaction loop when the provider returns a 400 status code with no response body.

This is a general issue affecting any openai-completions provider that sits behind a proxy or gateway, where:

The initial request times out or hits an idle timeout
The retry attempt receives a 400 with no body
OpenClaw classifies this as a format error → triggers compaction → retries → hits the same 400 again → loop

Reproduction Steps

Configure a provider with api: "openai-completions" pointing to a proxy/gateway that supports Claude or other reasoning-capable models
Enable active-memory plugin (default timeout 8000ms)
Send messages that trigger active-memory recall
The embedded Pi agent times out on the initial call, retries the same model, and receives 400 status code (no body) on retry
Framework classifies as format error → triggers compaction → retries → loops

Observed Behavior

Error Log Pattern (repeated):

[agent/embedded] embedded run failover decision: runId=active-memory-mnxtzi9n-5b61fdfd 
  stage=assistant decision=surface_error reason=timeout 
  provider=custom-provider/claude-opus-4-6 profile=-

[agent/embedded] embedded run agent end: runId=13e4ecb9-bc1b-44d3-bb92-375feddf2bb5 
  isError=true model=claude-opus-4-6 provider=custom-provider 
  error="LLM request failed: provider rejected the request schema or tool payload. rawError=400 status code (no body)"

[agent/embedded] embedded run failover decision: runId=13e4ecb9-bc1b-44d3-bb92-375feddf2bb5 
  stage=assistant decision=surface_error reason=format provider=custom-provider/claude-opus-4-6 profile=-

[llm-idle-timeout] custom-provider/claude-opus-4-6 produced no reply before the idle watchdog; retrying same model

[compaction-safeguard] Compaction safeguard: no real conversation messages to summarize; 
  writing compaction boundary to suppress re-trigger loop.

Key Observations:

1. Simple curl requests succeed (200 OK):

curl -X POST "https://<proxy>/v1/chat/completions" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{"model":"claude-opus-4-6","messages":[{"role":"user","content":"hi"}],"max_tokens":100}'

2. Complex Pi agent payload fails on retry (400 with no body): The embedded Pi agent sends:
- Long system prompt (full agent bootstrap context, bootstrapContextMode: "lightweight")
- Tools array (memory_search, memory_get)
- thinking parameter (mapped from thinkLevel: "adaptive")
- stream: true, max_tokens: 32000

3. The 400 has no response body, so the failover system cannot classify the actual error; it defaults to format classification via failover-policy.ts:

if (status === 400 || status === 422) {
  if (messageClassification) return messageClassification;
  return toReasonClassification("format");
}

4. Compaction loop: The format error triggers compaction, which retries with a modified conversation state, hits the same 400, and loops until the compaction safeguard intervenes.
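To make the loop concrete, here is a toy simulation of a provider that always answers 400 with no body. It is not OpenClaw code: `decideNext`, `countCompactions`, and the `safeguardAfter` cap are invented for illustration, and the rule that "format" leads to compaction is taken from the description above.

```typescript
type Reason = "format" | null;
type Decision = "compact_and_retry" | "surface_error";

// Assumed decision rule: "format" triggers compaction, unknown is surfaced.
function decideNext(reason: Reason): Decision {
  return reason === "format" ? "compact_and_retry" : "surface_error";
}

// Counts compaction passes against a provider that fails identically every
// time. `safeguardAfter` is an invented cap standing in for the safeguard.
function countCompactions(classify: () => Reason, safeguardAfter: number): number {
  let compactions = 0;
  while (compactions < safeguardAfter) {
    if (decideNext(classify()) === "surface_error") break;
    compactions += 1; // compaction ran; the retry will hit the same 400
  }
  return compactions;
}

console.log(countCompactions(() => "format", 5)); // 5
console.log(countCompactions(() => null, 5)); // 0
```

With the "format" default, every pass is wasted until the cap is reached; with an unknown (null) classification the error is surfaced before any compaction runs.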

Possible Causes

The 400 error from the proxy may be caused by:

tools + thinking combination: When openai-completions format is used with a Claude model, the proxy may not correctly translate the combination of tools array + thinking parameter to the underlying Anthropic Messages API format.
Idle timeout retry payload corruption: The retry after idle timeout may replay a malformed conversation state — for example, thinkingSignature: "reasoning_text" blocks that Claude rejects on replay (noted in attempt.ts line 1137: "Anthropic Claude endpoints can reject replayed thinking blocks on any follow-up provider").
Large system prompt + tools limit: The embedded Pi agent lightweight bootstrap context includes a full system prompt. Combined with tools, this may exceed some proxy limit.
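One way to narrow down which of these causes applies is to replay the failing payload with one optional feature removed at a time. The helper below only builds the variants; sending them is left out. The `ChatPayload` shape follows the OpenAI-style body described in this issue, the `thinking` field is typed loosely because its exact shape is not given, and all names are hypothetical.

```typescript
// Assumed request shape; only the fields mentioned in the issue are listed.
interface ChatPayload {
  model: string;
  messages: { role: string; content: string }[];
  max_tokens: number;
  stream?: boolean;
  tools?: unknown[];
  thinking?: unknown;
}

type Feature = "tools" | "thinking" | "stream";

// Returns a shallow copy of the payload with one optional feature removed.
function withoutFeature(full: ChatPayload, feature: Feature): ChatPayload {
  const copy: ChatPayload = { ...full };
  if (feature === "tools") delete copy.tools;
  else if (feature === "thinking") delete copy.thinking;
  else delete copy.stream;
  return copy;
}

// Builds one variant per feature that is actually present in the payload.
function payloadVariants(
  full: ChatPayload,
): { removed: Feature; payload: ChatPayload }[] {
  const features: Feature[] = ["tools", "thinking", "stream"];
  return features
    .filter((feature) => full[feature] !== undefined)
    .map((feature) => ({ removed: feature, payload: withoutFeature(full, feature) }));
}
```

If the variant without `tools` or without `thinking` returns 200 while the full payload returns 400, that points at the tools + thinking translation in the proxy rather than at payload size.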

Environment

OpenClaw version: 2026.4.14-beta.1 (6823a6f)
OS: Darwin 25.3.0 (arm64)
Provider: Custom proxy with api: "openai-completions" → Claude Opus 4.6
Config: active-memory enabled, timeoutMs: 8000

Suggested Improvements

1. Better error classification for 400 with no body

When the provider returns 400 with no body, the failover system should NOT default to format classification, which triggers compaction. A 400 with no body could be a transient proxy issue, auth problem, or payload size limit — compaction is unlikely to help.

Current code (errors.ts ~line 610):

if (status === 400 || status === 422) {
  if (messageClassification) return messageClassification;
  return toReasonClassification("format");  // ← defaults to format even with no body
}

Suggestion: When message is empty or too short to classify, return null (unknown) instead of "format", so the failover decision can surface the error rather than entering a compaction loop.

2. Limit compaction retries on repeated identical errors

If compaction is triggered but the underlying error persists (same status code, same provider), the safeguard should activate sooner rather than looping multiple times.
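A minimal sketch of such a guard, assuming failures are keyed by provider and status code. The class name `RepeatedErrorGuard`, the key format, and the default limit of 2 are all assumptions; OpenClaw's actual compaction safeguard works differently and is not shown here.

```typescript
// Tracks consecutive failures that share a provider and status code and
// reports when compaction should stop being retried.
class RepeatedErrorGuard {
  private lastKey: string | null = null;
  private streak = 0;
  private readonly maxIdentical: number;

  constructor(maxIdentical: number = 2) {
    this.maxIdentical = maxIdentical;
  }

  // Records a failure; returns true once the same error has repeated
  // `maxIdentical` times in a row.
  recordFailure(providerId: string, httpStatus: number): boolean {
    const key = `${providerId}:${httpStatus}`;
    this.streak = key === this.lastKey ? this.streak + 1 : 1;
    this.lastKey = key;
    return this.streak >= this.maxIdentical;
  }

  // Call after a successful response to clear the streak.
  reset(): void {
    this.lastKey = null;
    this.streak = 0;
  }
}
```

The caller would check the return value before scheduling another compaction pass and surface the error once it is true. A different status code or provider restarts the streak, so unrelated failures do not trip the guard.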

3. Idle timeout retry should validate conversation state

Before retrying after idle timeout, validate that the conversation state is replayable — specifically check for incompatible thinkingSignature blocks that downstream providers may reject.
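Such a check might look like the sketch below. The `ContentBlock` and `TranscriptMessage` shapes are hypothetical (OpenClaw's real transcript types are not shown in this issue), and which signatures a provider rejects is passed in by the caller rather than assumed.

```typescript
// Hypothetical transcript shapes, for illustration only.
interface ContentBlock {
  type: "text" | "thinking" | "tool_use";
  text?: string;
  thinkingSignature?: string;
}

interface TranscriptMessage {
  role: "system" | "user" | "assistant";
  content: ContentBlock[];
}

// True for a thinking block whose signature is on the caller's reject list.
function isRejectedThinking(block: ContentBlock, rejected: ReadonlySet<string>): boolean {
  return (
    block.type === "thinking" &&
    block.thinkingSignature !== undefined &&
    rejected.has(block.thinkingSignature)
  );
}

// Returns the indexes of messages that contain a rejected thinking block.
function findUnreplayableMessages(
  messages: TranscriptMessage[],
  rejected: ReadonlySet<string>,
): number[] {
  const flagged: number[] = [];
  messages.forEach((message, index) => {
    if (message.content.some((block) => isRejectedThinking(block, rejected))) {
      flagged.push(index);
    }
  });
  return flagged;
}

// Returns a copy of the transcript with the rejected thinking blocks removed.
function stripRejectedThinking(
  messages: TranscriptMessage[],
  rejected: ReadonlySet<string>,
): TranscriptMessage[] {
  return messages.map((message) => ({
    ...message,
    content: message.content.filter((block) => !isRejectedThinking(block, rejected)),
  }));
}
```

A retry path could call `findUnreplayableMessages` with a set such as `new Set(["reasoning_text"])` and either strip the flagged blocks or surface the error instead of replaying a payload the provider is likely to reject.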

4. Documentation for `openai-completions` + Claude

Clarify which payload features are supported when using openai-completions API with Anthropic Claude models behind a proxy (tools, thinking, system prompt length limits, etc.).

Workaround

Disable active-memory (enabled: false) or route the affected channel to a non-Claude model.

extent analysis

TL;DR

Modify the error classification for 400 status codes with no response body to prevent defaulting to "format" errors, which trigger compaction loops.

Guidance

Update error classification: Change the errors.ts file to return null (unknown) instead of "format" when the provider returns a 400 status code with no response body.
Implement compaction retry limits: Cap the number of compaction retries for repeated identical errors (same status code, same provider) so the loop ends sooner instead of running until the compaction safeguard intervenes.
Validate conversation state on retry: Before retrying after an idle timeout, validate the conversation state to ensure it's replayable and compatible with the downstream provider.
Review and adjust payload features: Verify which payload features are supported when using the openai-completions API with Anthropic Claude models behind a proxy.

Example

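// Sketch of the updated error classification in errors.ts.
// `responseBody` is a placeholder for the raw error text available to the
// classifier; it does not appear in the snippet quoted in the issue.
if (status === 400 || status === 422) {
  if (messageClassification) return messageClassification;
  if (!responseBody) return null; // no body: unknown, do not default to "format"
  return toReasonClassification("format");
}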

Notes

The provided suggestions focus on addressing the immediate issue of compaction loops caused by 400 status codes with no response body. Further investigation into the root cause of the 400 errors (e.g., payload size limits, tool combinations, or thinking parameter issues) may be necessary to fully resolve the problem.

Recommendation

Apply the workaround by disabling active-memory or routing the affected channel to a non-Claude model until the suggested changes can be implemented and tested.


#api #ssr #agent setup #task chaining #parallel task
