openclaw: Session ending with assistant message causes infinite prefill error loop with Opus 4.6

openclaw/openclaw#58567 (fetched 2026-04-08 02:00:59)

Bug Description

When a session transcript ends with an assistant role message (e.g., due to an OAuth prompt, a tool timeout, or an aborted response), subsequent attempts to resume that session fail with an Anthropic API error because Claude Opus 4.6 does not support assistant message prefill.

This creates an infinite error loop — every trigger (mention, DM, heartbeat) hits the same error, and the session never recovers.

Reproduction

  1. Configure an agent with anthropic/claude-opus-4-6 as primary model
  2. Trigger a session where the last stored message is role: assistant (e.g., OAuth authorization prompt that gets no user follow-up)
  3. Send a new message to that session

Expected: Agent responds normally
Actual: FailoverError; the API rejects the request because the messages end with an assistant role

Error Log

lane task error: lane=session:agent:xxx:telegram:group:yyy 
error="FailoverError: The AI service is temporarily overloaded..."

The error cascades through the entire fallback chain (anthropic → codex → minimax → newapi), because the session transcript itself is malformed.

Root Cause

The session store (.jsonl) persists the assistant message as the last entry. On the next load, OpenClaw (OC) sends this transcript to the model API unchanged. Anthropic Opus 4.6 requires the last message to be role: user because it does not support assistant prefill. Other models (Sonnet, GPT) handle this fine.

Proposed Fix

In the message preparation pipeline (before sending to model API), add a guard:

// If last message is assistant and model doesn't support prefill,
// inject a synthetic user message to allow continuation
if (messages.length > 0 && messages[messages.length - 1].role === "assistant") {
  if (!modelCapabilities.supportsPrefill) {
    messages.push({
      role: "user",
      content: [{ type: "text", text: "(session resumed)" }]
    });
  }
}

Alternatively, the session store could ensure transcripts never end with an assistant message by appending a sentinel on write.
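
As a minimal sketch of the first option (the send-time guard), the check can be wrapped in a pure function so it is easy to unit test and does not mutate the stored transcript. The function name `ensureEndsWithUser` is hypothetical, and `supportsPrefill` is assumed to be a boolean capability flag as in the snippet above; neither is confirmed to exist in OpenClaw.

```javascript
// Hypothetical helper, not part of OpenClaw. `supportsPrefill` mirrors the
// capability flag named in the proposed fix and is an assumption.
function ensureEndsWithUser(messages, modelCapabilities) {
  const last = messages[messages.length - 1];
  const needsPatch =
    last !== undefined &&
    last.role === "assistant" &&
    !modelCapabilities.supportsPrefill;
  if (!needsPatch) return messages;
  // Return a new array so the persisted transcript is left untouched.
  return [
    ...messages,
    { role: "user", content: [{ type: "text", text: "(session resumed)" }] },
  ];
}

// Example: a transcript that stalled on an authorization prompt.
const transcript = [
  { role: "user", content: [{ type: "text", text: "hi" }] },
  { role: "assistant", content: [{ type: "text", text: "Please authorize." }] },
];
const patched = ensureEndsWithUser(transcript, { supportsPrefill: false });
console.log(patched[patched.length - 1].role); // "user"
console.log(transcript.length); // 2, original array unchanged
```

Because the synthetic message is added only to the outgoing copy, the stored .jsonl keeps the real conversation, and models that accept prefill still receive the transcript as recorded.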

Impact

  • Affects all agents using Opus 4.6 as primary model
  • In multi-agent setups (14 agents), this can cascade across all agents simultaneously
  • Recovery requires manual session file deletion
  • We found 28 affected sessions across 16 agents in a single Docker instance

Environment

  • OpenClaw: 2026.3.28
  • Model: anthropic/claude-opus-4-6
  • Setup: Multi-agent Docker deployment with Telegram + Feishu channels

Workaround

We wrote a cron script that periodically scans session files and deletes any that end with role: assistant. This prevents the error loop but loses session context.
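
The cron script itself is not included in the issue. A rough sketch of what such a scan might look like is below, assuming each non-empty line of a session .jsonl file is a JSON object with a top-level `role` field; the real OpenClaw schema may nest the role differently, and the function names are hypothetical.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumption: every non-empty line is a JSON object with a top-level `role`.
// Returns the role of the last entry, or null if it cannot be determined.
function lastRole(filePath) {
  const lines = fs
    .readFileSync(filePath, "utf8")
    .split("\n")
    .filter((line) => line.trim() !== "");
  if (lines.length === 0) return null;
  try {
    const entry = JSON.parse(lines[lines.length - 1]);
    return typeof entry.role === "string" ? entry.role : null;
  } catch {
    return null; // Unparseable last line: leave the file alone.
  }
}

// Lists the .jsonl files directly inside `dir` whose last entry is an
// assistant message.
function findSessionsEndingWithAssistant(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".jsonl"))
    .map((name) => path.join(dir, name))
    .filter((file) => lastRole(file) === "assistant");
}

// Destructive step matching the described workaround: deletes the stuck
// sessions and returns the paths that were removed.
function deleteStuckSessions(dir) {
  const stuck = findSessionsEndingWithAssistant(dir);
  for (const file of stuck) fs.unlinkSync(file);
  return stuck;
}
```

Running `findSessionsEndingWithAssistant` on its own first gives a dry-run list of affected sessions before anything is deleted.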

extent analysis

TL;DR

Implement a guard in the message preparation pipeline to inject a synthetic user message when the last message is an assistant message and the model does not support prefill.

Guidance

  • Identify agents using Opus 4.6 as the primary model and prioritize updates for these agents.
  • Consider implementing the proposed fix in the message preparation pipeline to handle assistant messages.
  • Review the session store to ensure transcripts do not end with assistant messages, potentially by appending a sentinel on write.
  • As a temporary workaround, a cron script like the one the reporter describes can scan for and delete affected session files, but this loses session context.

Notes

The proposed fix assumes that injecting a synthetic user message will allow the session to continue without issues. However, this may not be suitable for all use cases, and additional testing is recommended.

Recommendation

Apply the proposed fix in the message preparation pipeline to handle assistant messages, as this approach is more targeted and less likely to result in lost session context compared to the cron script workaround.
