openclaw - ✅(Solved) Fix Session repair leaves JSONL ending on assistant turn → triggers Anthropic 400 prefill loop [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#75271
Fetched 2026-05-01 05:35:57
Comments
1
Participants
2
Timeline
7
Reactions
1
Timeline (top)
referenced ×5, commented ×1, cross-referenced ×1

The agent/embedded session-repair logic leaves session JSONL files ending on a role=assistant message after "repair". The transcript is then resubmitted to Anthropic, which treats the trailing assistant message as a prefill and rejects the request with HTTP 400 (This model does not support assistant message prefill). When the agent's failover chain has only one candidate, this surfaces as a user-visible error and can wedge the embedded agent in a silent loop.

Error Message

HTTP 400 from Anthropic: "This model does not support assistant message prefill". It is returned when the request submitted after session repair ends with assistant content.

Root Cause

Repro / Observed Behaviour

  1. Session JSONL ends with role=assistant (e.g. previous run interrupted before the next user turn was appended).
  2. agent/embedded attempts repair on session resume.
  3. Repair rewrites the assistant message but file still ends on role=assistant.
  4. Request submitted to Anthropic ends with assistant content → HTTP 400 format error.
  5. Failover decision = surface_error because primary is the only candidate.
  6. Loop continues until process is SIGTERM'd.
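
The end state described in steps 1 and 3 can be checked mechanically. The following is a minimal TypeScript sketch, not OpenClaw's actual code: it assumes each JSONL line is an object with `type: "message"` and a nested `message.role` field, and the names `TranscriptLine`, `lastMessageRole`, and `endsOnAssistantTurn` are hypothetical.

```typescript
// Assumed entry shape; the real OpenClaw session schema may differ.
interface TranscriptLine {
  type?: string;
  message?: { role?: string };
}

// Returns the role of the last parseable message entry, or undefined if
// the transcript contains no message entries at all.
function lastMessageRole(jsonl: string): string | undefined {
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  for (const line of lines.reverse()) {
    let parsed: TranscriptLine | null;
    try {
      parsed = JSON.parse(line) as TranscriptLine | null;
    } catch {
      continue; // Skip malformed lines; they are a separate repair case.
    }
    const role = parsed?.type === "message" ? parsed.message?.role : undefined;
    if (typeof role === "string") return role;
  }
  return undefined;
}

// True when resubmitting this transcript would end on assistant content.
function endsOnAssistantTurn(jsonl: string): boolean {
  return lastMessageRole(jsonl) === "assistant";
}
```

A file for which `endsOnAssistantTurn` returns true after repair is one that would hit the 400 in step 4.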

Fix Action

Fix / Workaround

Workaround (current)

  • Quarantined 250 corrupt sessions to ~/.openclaw/agents/<id>/sessions/_quarantine_YYYY-MM-DD/ with a MANIFEST.
  • Added fallbacks array to agents.defaults.model so single-candidate 4xx no longer surfaces directly.

PR fix notes

PR #75284: fix(agents): trim trailing assistant turns during session file repair (#75271)

Description (problem / solution / changelog)

Summary

Fixes #75271.

repairSessionFileIfNeeded did not handle sessions whose JSONL ends on a role=assistant entry. Anthropic (and Anthropic-compatible) APIs reject such transcripts with HTTP 400 "This model does not support assistant message prefill", producing a silent loop with no forward progress.

Root cause

Session files can end on an assistant turn when a previous run was interrupted after the assistant response was written but before the next user turn was appended. The existing repair logic handled empty-content assistant entries, blank user messages, and malformed JSON lines — but not the trailing-assistant constraint.

Fix

After parsing all JSONL lines into entries, pop trailing role=assistant message entries until the last entry is not an assistant message. The trimmed entries are counted as droppedTrailingAssistantMessages and included in the repair warn log.

Existing repair behaviors are unaffected: mid-transcript assistant entries (followed by at least one later user turn) are not touched.
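
The trimming step described above might look roughly like the sketch below. This is an illustration under stated assumptions, not the code from PR #75284: the `SessionEntry` shape and the function names are hypothetical, and only the counter name `droppedTrailingAssistantMessages` is taken from the PR description.

```typescript
// Assumed entry shape; the real OpenClaw session schema may differ.
interface SessionEntry {
  type: string;
  message?: { role: string; content?: unknown };
}

interface TrimResult {
  entries: SessionEntry[];
  droppedTrailingAssistantMessages: number;
}

function isAssistantMessage(entry: SessionEntry | undefined): boolean {
  return entry?.type === "message" && entry.message?.role === "assistant";
}

// Pops assistant message entries off the end of the transcript until the
// last entry is something else. Mid-transcript assistant entries are kept
// because the loop stops at the first non-assistant entry from the end.
function trimTrailingAssistantMessages(
  entries: readonly SessionEntry[],
): TrimResult {
  const kept = [...entries]; // Copy so the caller's array is not mutated.
  let dropped = 0;
  while (kept.length > 0 && isAssistantMessage(kept[kept.length - 1])) {
    kept.pop();
    dropped++;
  }
  return { entries: kept, droppedTrailingAssistantMessages: dropped };
}
```

Running the function a second time on its own output drops nothing, which matches the "no-op on a session that was already repaired" behaviour the tests check for.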

Pre-implement audit

  1. Existing-helper check: No existing trimmer for trailing assistant entries — fix added directly to session-file-repair.ts. ✓
  2. Shared-helper caller check: repairSessionFileIfNeeded has two call sites (compact.ts:805, attempt.ts:1305) — both use the same {sessionFile, warn} params, no contract change. ✓
  3. Rival scan: No open PRs targeting session-file-repair.ts or the trailing-assistant issue. ✓

Tests

  • Updated "rewrites persisted assistant messages with empty content arrays" → now a mid-transcript test (poisoned assistant followed by user turn), since a session that ends on assistant should be trimmed.
  • Updated "is a no-op on a session that was already repaired" → healed assistant followed by user turn, confirming no second-pass changes.
  • Updated "does not rewrite silent-reply turns" → mid-transcript variant (followed by user), confirming mid-transcript entries are not dropped.
  • Added "drops trailing assistant turns to prevent Anthropic 400 prefill rejection (#75271)" — single trailing assistant entry dropped, warn message verified.
  • Added "drops multiple consecutive trailing assistant turns" — two consecutive trailing entries both dropped.

All 24 tests pass.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/reference/transcript-hygiene.md (modified, +6/-1)
  • src/agents/session-file-repair.test.ts (modified, +111/-20)
  • src/agents/session-file-repair.ts (modified, +48/-1)

Code Example

06:51:36  agent/embedded repair → session e9256c82-...
06:51:38  Anthropic 400: "This model does not support assistant message prefill"
06:51:38  failover decision: surface_error (1 candidate)
06:51:53→06:53:21  90s silence, embedded lane stuck
06:53:21  SIGTERM (manual)

Environment

  • OpenClaw: latest (running via LaunchAgent on macOS)
  • Model: anthropic/claude-opus-4-7
  • Platform: Darwin 25.3.0 arm64, Node v25.8.1

Real-world Impact

Out of 877 session files on this gateway, 250 (~28%) had this corruption. Yesterday alone there were 300 prefill rejections and 49 repair attempts. Today, in the ~7h window before a manual restart, there were 84 prefill rejections and 14 repair attempts. The problem is continuous, not one-off.

Suggested Fix

Either:

  • Drop trailing assistant entries during repair until the file ends on a role=user turn, OR
  • Append a synthetic user "(continue)" turn before resubmission.
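
The second option could be sketched as follows. This is a hypothetical illustration, not what the merged PR does (it took the first option): the `ChatEntry` shape, the function name `withContinueTurn`, and the use of a plain string for `content` are all assumptions.

```typescript
// Assumed entry shape; the real OpenClaw session schema may differ.
interface ChatEntry {
  type: "message";
  message: { role: "user" | "assistant"; content: string };
}

// Placeholder text taken from the "(continue)" suggestion in the issue.
const CONTINUE_TEXT = "(continue)";

// Returns a transcript that no longer ends on an assistant turn: if the
// last entry is an assistant message, a synthetic user turn is appended.
// Otherwise an unchanged copy is returned.
function withContinueTurn(entries: readonly ChatEntry[]): ChatEntry[] {
  const last = entries[entries.length - 1];
  if (last === undefined || last.message.role !== "assistant") {
    return [...entries];
  }
  return [
    ...entries,
    { type: "message", message: { role: "user", content: CONTINUE_TEXT } },
  ];
}
```

Compared with dropping entries, this keeps the assistant output that was already written, at the cost of adding a user turn the user never typed.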

Also worth considering: a session-janitor cron that quarantines any JSONL whose last message entry isn't role=user, run weekly or on startup.
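
The selection step of such a janitor could be planned as in the sketch below. Everything here is hypothetical: the `SessionSummary` input, the `planQuarantine` name, and the one-file-per-line MANIFEST layout are assumptions; only the `_quarantine_YYYY-MM-DD` directory pattern comes from the workaround described in this issue. Reading the files and moving them is left out so the logic stays easy to test.

```typescript
// Hypothetical summary of one session file: its name and the role of its
// last message entry (undefined when the file has no message entries).
interface SessionSummary {
  file: string;
  lastRole: string | undefined;
}

interface QuarantinePlan {
  dir: string; // Directory name, e.g. "_quarantine_2026-05-01".
  files: string[]; // Session files to move into dir.
  manifest: string; // Assumed MANIFEST format: one file name per line.
}

function planQuarantine(
  sessions: readonly SessionSummary[],
  today: Date,
): QuarantinePlan {
  // toISOString() is always UTC, so the stamp is the UTC calendar day.
  const stamp = today.toISOString().slice(0, 10);
  // Quarantine files whose last message entry exists and is not role=user.
  // Files without any message entry are left alone in this sketch.
  const files = sessions
    .filter((s) => s.lastRole !== undefined && s.lastRole !== "user")
    .map((s) => s.file);
  return {
    dir: `_quarantine_${stamp}`,
    files,
    manifest: files.map((f) => `${f}\n`).join(""),
  };
}
```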

Happy to provide the quarantine MANIFEST or full logs offline if useful.

extent analysis

TL;DR

Drop trailing assistant entries during repair or append a synthetic user turn before resubmission to prevent Anthropic HTTP 400 errors.

Guidance

  • Identify and modify the agent/embedded session-repair logic to either drop trailing role=assistant messages or append a synthetic role=user turn.
  • Verify the fix by checking the session JSONL files for correct termination on a role=user message after repair.
  • Consider implementing a session-janitor cron to quarantine corrupted sessions and prevent future issues.
  • Review the fallbacks array in agents.defaults.model to ensure single-candidate 4xx errors are handled correctly.
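
To make the last point concrete, here is a hypothetical sketch of how a single-candidate chain ends up at surface_error while a chain with fallbacks does not. It is not OpenClaw's failover code: the `FailoverDecision` type, the `try_fallback` kind, and the `decideFailover` name are invented for illustration, and only `surface_error` appears in the logs above.

```typescript
// Hypothetical decision type; only "surface_error" is attested in the logs.
type FailoverDecision =
  | { kind: "surface_error" }
  | { kind: "try_fallback"; model: string };

// candidates is the primary model followed by any configured fallbacks.
// failedIndex is the position of the candidate that just failed.
function decideFailover(
  candidates: readonly string[],
  failedIndex: number,
): FailoverDecision {
  const next = candidates[failedIndex + 1];
  if (next === undefined) {
    // No candidate left: the error reaches the user.
    return { kind: "surface_error" };
  }
  return { kind: "try_fallback", model: next };
}
```

With a one-element chain the first failure already returns `surface_error`. Note that a fallback only hides the 400 if the fallback provider accepts a transcript ending on an assistant turn, so the repair fix is still needed.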

Example

No explicit code example is provided, as the issue implies modifications to the existing agent/embedded logic.

Notes

The suggested fix assumes that modifying the session-repair logic will resolve the issue. However, additional testing and verification may be necessary to ensure the fix works as expected.

Recommendation

Apply the suggested fix to drop trailing assistant entries or append a synthetic user turn, as this directly addresses the root cause of the Anthropic HTTP 400 errors.
