openclaw: Cascading failure: invalidated OAuth on primary provider produces empty placeholder reply; provider switch causes duplicate tool execution; cold-cache bootstrap on conversation rollover loses recent context

Summary

Three distinct failure modes compounded into a user-visible cascade in OpenClaw 2026.5.7 (gitSha b8fe34a) tonight. Filing them together because they happened in sequence and the resolution path likely overlaps.

  1. Auth invalidation on the primary provider produces a minimal placeholder reply instead of cascading to the configured fallback chain. The user saw three consecutive replies of "No extra notes from me." on Telegram while the trajectory recorded [assistant turn failed before producing content] and stop=error.
  2. After a manual provider switch, an abort/restart race causes duplicate tool execution: identical tool results appear within a millisecond of each other in the trajectory and surface to the user as repeated bot output.
  3. Conversation rollover bootstraps cold-cache-catchup without preloading any prior-conversation summaries on the same session_key, so the agent appears to forget recent prior work on the same channel.

Environment

  • OpenClaw: 2026.5.7 (gitSha: b8fe34a)
  • Node: v22.22.2
  • Platform: linux 6.6.114.1-microsoft-standard-WSL2 (x64) (WSL on Windows 10)
  • Primary configured model: openai-codex/gpt-5.5
  • Configured fallback chain (order):
    1. claude-cli/claude-opus-4-7
    2. openrouter/deepseek/deepseek-v4-pro
    3. openrouter/moonshotai/kimi-k2.6
    4. openrouter/x-ai/grok-4.3
    5. openrouter/auto
    6. openrouter/free
  • Channels affected: webchat (agent:main:main) and Telegram direct (agent:main:telegram:direct:<id>).

Failure 1 — auth invalidation produces empty reply, no automatic fallback

A user-initiated turn on the primary provider failed at before_message_stream_start with:

provider_transport_failure: Received network error or non-101 status code

Underlying provider message extracted from trajectory diagnostics:

"Your authentication token has been invalidated. Please try signing in again."

The trajectory recorded [assistant turn failed before producing content] with stop=error. The configured fallback chain did not activate automatically. The user had to manually switch to openrouter/deepseek/deepseek-v4-pro.

During the dead-auth period, three consecutive turns produced minimal placeholder replies on the user's Telegram channel. They appear to surface to the channel as a short fallback string (in this case "No extra notes from me.") rather than the full diagnostic text [assistant turn failed before producing content]. This is hostile UX because the user gets no signal that anything is wrong.
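
A minimal sketch of the missing decision: classify the failure, then pick the next model in the chain. Every name here (ProviderFailure, classifyFailure, nextModel) is hypothetical and not taken from the OpenClaw codebase. It also assumes that auth invalidation can be recognized from the provider message text, and that 408 and 429 are the only retryable 4xx codes.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw APIs.
interface ProviderFailure {
  stage: string;       // e.g. "before_message_stream_start"
  message: string;     // provider-supplied diagnostic text
  httpStatus?: number; // absent when the transport exposed no status
}

type FailureAction = "retry_same_provider" | "advance_fallback";

// Assumption: the provider wording quoted in this report identifies dead auth.
const AUTH_PATTERNS: RegExp[] = [
  /authentication token has been invalidated/i,
  /sign(ing)? in again/i,
];

function classifyFailure(f: ProviderFailure): FailureAction {
  const s = f.httpStatus;
  // Assumption: 408 and 429 are worth retrying; every other 4xx is not.
  const nonRetryable4xx =
    s !== undefined && s >= 400 && s < 500 && s !== 408 && s !== 429;
  const authInvalidated = AUTH_PATTERNS.some((p) => p.test(f.message));
  return nonRetryable4xx || authInvalidated
    ? "advance_fallback"
    : "retry_same_provider";
}

// `chain` is the primary model followed by the fallbacks, in order.
// Returns undefined when the chain is exhausted.
function nextModel(
  chain: readonly string[],
  current: string,
  f: ProviderFailure,
): string | undefined {
  if (classifyFailure(f) !== "advance_fallback") return current;
  // If `current` is not in the chain, indexOf gives -1 and chain[0] is returned.
  return chain[chain.indexOf(current) + 1];
}
```

With the chain from the Environment section, a failure carrying the quoted auth message on openai-codex/gpt-5.5 would select claude-cli/claude-opus-4-7 instead of producing a placeholder reply.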

Failure 2 — duplicate tool execution after manual provider switch

Immediately after the manual switch to DeepSeek, OpenClaw emitted:

openclaw:prompt-error: "This operation was aborted | This operation was aborted"

(Note the literal | in the error string: it appears to be two abort signals concatenated into one message.)

The next assistant message was: ⚠️ Previous run is still shutting down. Please try again in a moment.

Subsequent tool executions were duplicated. Concrete examples from the trajectory (timestamps in UTC, all from the same session):

  • 00:30:13.686Z — two identical { "tool": "read", "error": "Aborted" } records.
  • 00:31:29.274Z/.275Z — two identical visual-format-spec read results.
  • 00:32:30.078Z — two identical git log outputs.
  • 00:50:41.040Z — three duplicate sets of nine identical tool results in one cluster.
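
Clusters like the ones listed above could be found mechanically in a trajectory. The sketch below assumes a simplified record shape (ToolRecord with ts, tool, and payload fields); the real JSONL schema is not shown in this report, so the field names are hypothetical. It groups records with identical tool and payload, then reports runs whose neighbouring timestamps fall within a small window.

```typescript
// Hypothetical record shape; the actual trajectory schema may differ.
interface ToolRecord {
  ts: string;      // ISO-8601 timestamp
  tool: string;    // tool name, e.g. "read"
  payload: string; // serialized result or error
}

// Returns groups of two or more identical records whose consecutive
// timestamps are at most `windowMs` apart.
function findDuplicateClusters(
  records: readonly ToolRecord[],
  windowMs: number = 5,
): ToolRecord[][] {
  // Bucket records by (tool, payload).
  const groups: Record<string, ToolRecord[]> = {};
  for (const r of records) {
    const key = r.tool + "\u0000" + r.payload;
    (groups[key] = groups[key] || []).push(r);
  }

  const clusters: ToolRecord[][] = [];
  for (const key of Object.keys(groups)) {
    const list = groups[key];
    if (list === undefined) continue;
    const sorted = list
      .slice()
      .sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));

    let cluster: ToolRecord[] = [];
    for (const r of sorted) {
      const last = cluster[cluster.length - 1];
      if (last !== undefined && Date.parse(r.ts) - Date.parse(last.ts) <= windowMs) {
        cluster.push(r); // close enough to the previous identical record
      } else {
        if (cluster.length > 1) clusters.push(cluster);
        cluster = [r];
      }
    }
    if (cluster.length > 1) clusters.push(cluster);
  }
  return clusters;
}
```

A window of a few milliseconds is used rather than exact equality because one of the pairs above straddles .274Z and .275Z.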

The mechanism appears to be a race between (a) cleanup of the failed Codex run, which still had pending tool calls during shutdown, and (b) the new DeepSeek run firing the same tool calls against the same execution slots. Results from both paths land in the trajectory.
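
One way to close a race of this kind is to scope every tool call to the run that issued it and reject anything from a run that is no longer active. The class below is a sketch under that assumption. RunScopedToolGate, activate, and claim are hypothetical names, and nothing here reflects how OpenClaw actually tracks runs.

```typescript
// Hypothetical gate; illustrates run-scoped tool-call IDs, not OpenClaw code.
class RunScopedToolGate {
  private activeRunId: string | undefined = undefined;
  private claimed: Record<string, true> = {};

  // Called when a new run takes over. From this point on, calls and
  // results tagged with any older run ID are rejected.
  activate(runId: string): void {
    this.activeRunId = runId;
  }

  // Returns true exactly once per (runId, callId), and only for the
  // active run. Callers check this before executing a tool and before
  // appending its result to the trajectory.
  claim(runId: string, callId: string): boolean {
    if (runId !== this.activeRunId) return false; // stale run still shutting down
    const key = runId + ":" + callId;
    if (this.claimed[key] === true) return false; // already executed once
    this.claimed[key] = true;
    return true;
  }
}
```

With a gate like this, a pending call from the failed run would be dropped after the switch, while the new run could still issue the same tool call under its own run ID exactly once.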

Failure 3 — cold-cache bootstrap on rollover loses recent context

During the incident both channels rolled over to fresh conversation_ids:

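session_key                     | prior conv | archived             | new conv | created
--------------------------------|------------|----------------------|----------|---------------------
agent:main:main                 | 780        | 2026-05-09 23:27 UTC | 822      | 2026-05-09 23:28 UTC
agent:main:telegram:direct:<id> | 674        | 2026-05-09 23:57 UTC | 825      | 2026-05-09 23:58 UTC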

conversation_compaction_maintenance records the new conversations bootstrapping with kind = cold-cache-catchup. The lossless-claw memory plugin can recall via lcm_grep / lcm_expand_query on demand, but the model has to call those tools, and on the very first turn failure 1 prevented any tool calls from running. The user perceived this as the assistant having forgotten substantive earlier-day work on the same Telegram channel (the prior conversation 674 had real content from ~11h earlier).
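
A warm bootstrap could preload a few summaries from the most recently archived conversation on the same session_key, so that recall does not depend on the model calling a tool first. The sketch below assumes a simple in-memory archive. ArchivedConversation and warmBootstrapSummaries are hypothetical names, and how lossless-claw actually stores summaries is not described in this report.

```typescript
// Hypothetical archive entry; the real storage layout is not known here.
interface ArchivedConversation {
  sessionKey: string;
  conversationId: number;
  archivedAt: string;  // ISO-8601 UTC timestamp
  summaries: string[]; // message summaries, oldest first
}

// Returns the last `n` summaries of the most recently archived
// conversation for `sessionKey`, or an empty list when none exists.
function warmBootstrapSummaries(
  archive: readonly ArchivedConversation[],
  sessionKey: string,
  n: number,
): string[] {
  let latest: ArchivedConversation | undefined = undefined;
  for (const conv of archive) {
    if (conv.sessionKey !== sessionKey) continue;
    if (
      latest === undefined ||
      Date.parse(conv.archivedAt) > Date.parse(latest.archivedAt)
    ) {
      latest = conv;
    }
  }
  if (latest === undefined || n <= 0) return [];
  return latest.summaries.slice(-n);
}
```

For the Telegram session_key in the table above, this would pick conversation 674 and hand its most recent summaries to the new conversation 825 before the first turn.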

Expected behavior

  1. Auth invalidation should cascade. When the primary provider returns a before_message_stream_start failure carrying an auth-invalidation signature (or any non-retryable 4xx), advance to the next entry in the configured fallback chain rather than surfacing a minimal placeholder reply.
  2. Tool-call dedup across provider switch. When a user manually switches providers (or the auto-fallback advances) while a prior run is still in shutdown, the new run should not re-execute pending tool calls from the prior run. Either drain pending calls before switching, or scope tool-call IDs to the run that issued them.
  3. Warm-bootstrap new conversations. When a new conversation_id is created on an existing session_key, preload at least the last N message summaries from the most recently archived conversation on the same session_key. Cold-cache amnesia on a familiar channel is a much worse default than a slightly slower bootstrap.
  4. Visible failure mode. A before_message_stream_start failure should at minimum surface to the channel with a short user-facing notice (e.g. "Provider auth failed; cascading to fallback…") rather than the silent placeholder.
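
The visible-failure point in item 4 could be as small as mapping the failure kind to a one-line channel notice. This is only a sketch: FailureKind and channelNotice are hypothetical names, and every string other than the "Provider auth failed; cascading to fallback…" example quoted above is invented for illustration.

```typescript
// Hypothetical helper; the names and most of the wording are illustrative.
type FailureKind = "auth_invalidated" | "transport" | "unknown";

// Builds a short user-facing notice for a turn that failed before
// producing content. `fallbackModel` is the model about to be tried,
// if the fallback chain still has an entry left.
function channelNotice(kind: FailureKind, fallbackModel?: string): string {
  const reason =
    kind === "auth_invalidated"
      ? "Provider auth failed"
      : kind === "transport"
        ? "Provider connection failed"
        : "Provider request failed";
  const next =
    fallbackModel !== undefined
      ? `cascading to fallback ${fallbackModel}…`
      : "no fallback left, please sign in again or switch providers.";
  return `${reason}; ${next}`;
}
```

Sending a notice like this in place of the placeholder string would give the user the missing signal that the turn failed and what happens next.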

Notes

  • I have full per-session JSONL trajectories with the exact timestamps, message IDs, and error envelopes captured. Happy to share via private channel if maintainers want them.
  • This was filed jointly with a related issue against NousResearch/hermes-agent covering the upstream OAuth-token-burning behavior that triggered failure 1 in this report. The auth-cascade and dedup behavior described here is a bug in its own right, regardless of the auth root cause.
