
openclaw - 💡(How to fix) Fix Cascading failure: invalidated OAuth on primary provider produces empty placeholder reply; provider switch causes duplicate tool execution; cold-cache bootstrap on conversation rollover loses recent context

StepCodex · 2026-05-10T01:08:54Z


Three distinct failure modes compounded into a user-visible cascade in OpenClaw 2026.5.7 (gitSha b8fe34a) tonight. Filing them together because they happened in sequence and the resolution path likely overlaps.


Summary

1. Auth invalidation on the primary provider produces a minimal placeholder reply instead of cascading to the configured fallback chain. The user saw three consecutive replies of "No extra notes from me." on Telegram while the trajectory recorded [assistant turn failed before producing content] and stop=error.
2. After a manual provider switch, an abort/restart race causes duplicate tool execution: identical tool results appear at the same millisecond timestamp in the trajectory, and surface to the user as repeated/duplicated bot output.
3. Conversation rollover bootstraps cold-cache-catchup without preloading any prior-conversation summaries on the same session_key, so the agent appears to forget recent prior work on the same channel.

Environment

OpenClaw: 2026.5.7 (gitSha: b8fe34a)
Node: v22.22.2
Platform: linux 6.6.114.1-microsoft-standard-WSL2 (x64) (WSL on Windows 10)
Primary configured model: openai-codex/gpt-5.5
Configured fallback chain (order):
1. claude-cli/claude-opus-4-7
2. openrouter/deepseek/deepseek-v4-pro
3. openrouter/moonshotai/kimi-k2.6
4. openrouter/x-ai/grok-4.3
5. openrouter/auto
6. openrouter/free
Channels affected: webchat (agent:main:main) and Telegram direct (agent:main:telegram:direct:<id>).

Failure 1 — auth invalidation produces empty reply, no automatic fallback

A user-initiated turn on the primary provider failed at before_message_stream_start with:

provider_transport_failure: Received network error or non-101 status code

Underlying provider message extracted from trajectory diagnostics:

"Your authentication token has been invalidated. Please try signing in again."

The trajectory recorded [assistant turn failed before producing content] with stop=error. The configured fallback chain did not activate automatically. The user had to manually switch to openrouter/deepseek/deepseek-v4-pro.

During the dead-auth period, three consecutive turns produced minimal placeholder replies on the user's Telegram channel. The channel appears to receive a short fallback string (in this case "No extra notes from me.") rather than the diagnostic text [assistant turn failed before producing content]. This is hostile UX because the user gets no signal that anything is wrong.
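To make the missing decision concrete, here is a minimal sketch of how a failure like the one above could be classified and mapped to a fallback step. The `TurnFailure` shape, the `decide` function, and the choice to treat 408 and 429 as retryable are all assumptions for illustration; none of this is OpenClaw's actual internal API.

```typescript
// Hypothetical shapes: field names echo strings in this report, not OpenClaw's real types.
interface TurnFailure {
  phase: string;            // e.g. "before_message_stream_start"
  providerMessage?: string; // underlying provider text from trajectory diagnostics
  httpStatus?: number;      // HTTP status, when the transport exposes one
}

type Decision =
  | { kind: "retry-same" }
  | { kind: "cascade"; next: string }
  | { kind: "exhausted" };

// Signature taken from the provider message quoted in this report.
const AUTH_INVALIDATED = /authentication token has been invalidated/i;

function isNonRetryable(f: TurnFailure): boolean {
  const authDead =
    f.providerMessage !== undefined && AUTH_INVALIDATED.test(f.providerMessage);
  const s = f.httpStatus;
  // Assumption: 408 and 429 are worth retrying; every other 4xx is not.
  const hard4xx = s !== undefined && s >= 400 && s < 500 && s !== 408 && s !== 429;
  return authDead || hard4xx;
}

// `order` is the primary model followed by the configured fallback chain.
function decide(order: readonly string[], current: string, f: TurnFailure): Decision {
  // Only cascade when nothing has streamed yet, so the turn can be replayed safely.
  if (f.phase !== "before_message_stream_start" || !isNonRetryable(f)) {
    return { kind: "retry-same" };
  }
  // If `current` is not in `order`, indexOf returns -1 and the search restarts at order[0].
  const rest = order.slice(order.indexOf(current) + 1);
  const next = rest.length > 0 ? rest[0] : undefined;
  return next === undefined ? { kind: "exhausted" } : { kind: "cascade", next };
}
```

With the chain from the Environment section, an auth-invalidation failure on openai-codex/gpt-5.5 would yield a cascade to claude-cli/claude-opus-4-7, and the same failure on the last entry would yield "exhausted", which is the point where a user-visible error is unavoidable.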

Failure 2 — duplicate tool execution after manual provider switch

Immediately after the manual switch to DeepSeek, OpenClaw emitted:

openclaw:prompt-error: "This operation was aborted | This operation was aborted"

(Note the literal | in the error string — appears to be two abort signals concatenated into one message.)

The next assistant message: ⚠️ Previous run is still shutting down. Please try again in a moment.

Subsequent tool executions were duplicated. Concrete examples from the trajectory (timestamps in UTC, all from the same session):

00:30:13.686Z — two identical { "tool": "read", "error": "Aborted" } records.
00:31:29.274Z/.275Z — two identical visual-format-spec read results.
00:32:30.078Z — two identical git log outputs.
00:50:41.040Z — three duplicate sets of nine identical tool results in one cluster.
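Clusters like the ones listed above can be found mechanically in a per-session trajectory. The sketch below groups records by tool and payload and reports groups whose timestamps fall within a small window of each other. The `TrajectoryRecord` shape is a hypothetical simplification; the real JSONL schema is not shown in this report.

```typescript
// Hypothetical, simplified record shape; the real trajectory schema may differ.
interface TrajectoryRecord {
  ts: string;      // ISO 8601 timestamp, e.g. "2026-05-10T00:31:29.274Z"
  tool: string;    // tool name, e.g. "read"
  payload: string; // serialized tool result or error
}

interface DuplicateCluster {
  tool: string;
  firstTs: string;
  count: number;
}

function pushIfDuplicate(
  out: DuplicateCluster[],
  first: TrajectoryRecord | undefined,
  count: number,
): void {
  if (first !== undefined && count >= 2) {
    out.push({ tool: first.tool, firstTs: first.ts, count });
  }
}

// Reports runs of identical (tool, payload) records whose consecutive
// timestamps are at most `windowMs` apart.
function findDuplicateClusters(
  records: readonly TrajectoryRecord[],
  windowMs = 1,
): DuplicateCluster[] {
  const byKey = new Map<string, TrajectoryRecord[]>();
  for (const r of records) {
    const key = `${r.tool}\u0000${r.payload}`;
    const bucket = byKey.get(key);
    if (bucket !== undefined) bucket.push(r);
    else byKey.set(key, [r]);
  }

  const clusters: DuplicateCluster[] = [];
  byKey.forEach((bucket) => {
    const sorted = [...bucket].sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));
    let first: TrajectoryRecord | undefined = undefined;
    let count = 0;
    let prevMs = 0;
    for (const r of sorted) {
      const ms = Date.parse(r.ts);
      if (first !== undefined && ms - prevMs <= windowMs) {
        count += 1; // still inside the same cluster
      } else {
        pushIfDuplicate(clusters, first, count); // close the previous cluster
        first = r;
        count = 1;
      }
      prevMs = ms;
    }
    pushIfDuplicate(clusters, first, count);
  });
  return clusters;
}
```

A 1 ms window matches the 00:31:29.274Z/.275Z pair above. Legitimate repeated calls, such as reading the same file twice minutes apart, fall outside the window and are not reported.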

The mechanism appears to be a race between (a) cleanup of the failed Codex run, which still had pending tool calls in shutdown, and (b) the new DeepSeek run firing the same tool calls against the same execution slots. Results from both paths land in the trajectory.
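One way to close this kind of race is to scope every tool call to the run that issued it and to reject work from a run once it has been marked as shutting down. The sketch below illustrates that idea only; `ToolCallLedger`, its method names, and the run and call IDs are hypothetical and do not correspond to known OpenClaw internals.

```typescript
// Hypothetical ledger that scopes tool calls to runs; not an OpenClaw API.
class ToolCallLedger {
  private readonly retiredRuns = new Set<string>();
  private readonly claimed = new Set<string>();

  /** Mark a run as shutting down: its pending calls must not start or record results. */
  retire(runId: string): void {
    this.retiredRuns.add(runId);
  }

  /** Returns true only the first time a live run claims a given (runId, callId) pair. */
  claim(runId: string, callId: string): boolean {
    if (this.retiredRuns.has(runId)) return false;
    const key = `${runId}:${callId}`;
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }

  /** Late results from a retired run are dropped instead of landing in the trajectory. */
  acceptResult(runId: string): boolean {
    return !this.retiredRuns.has(runId);
  }
}

// Usage: the switch retires the old run before the new one starts issuing calls.
const ledger = new ToolCallLedger();
ledger.claim("run-codex", "call-1");         // true: first claim by a live run
ledger.retire("run-codex");                  // provider switch begins
ledger.claim("run-codex", "call-2");         // false: run is shutting down
ledger.acceptResult("run-codex");            // false: late result is dropped
ledger.claim("run-deepseek", "call-1");      // true: same call ID, different run
```

Because the key includes the run ID, a new run can reuse a call ID without colliding with the old run, and the old run's in-flight results never reach the trajectory after the switch. Draining pending calls before the switch is the alternative design; it avoids dropped work at the cost of a slower handover.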

Failure 3 — cold-cache bootstrap on rollover loses recent context

During the incident both channels rolled over to fresh conversation_ids:

| session_key | prior conv | archived | new conv | created |
| --- | --- | --- | --- | --- |
| `agent:main:main` | 780 | `2026-05-09 23:27 UTC` | 822 | `2026-05-09 23:28 UTC` |
| `agent:main:telegram:direct:<id>` | 674 | `2026-05-09 23:57 UTC` | 825 | `2026-05-09 23:58 UTC` |

conversation_compaction_maintenance records show the new conversations bootstrapping with kind = cold-cache-catchup. The lossless-claw memory plugin can recall prior content via lcm_grep / lcm_expand_query on demand, but only if the model calls those tools, and on the very first turn failure 1 prevented any tool calls from running. The user perceived this as the assistant having forgotten substantive earlier-day work on the same Telegram channel (prior conversation 674 had real content from about 11 hours earlier).
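A warm bootstrap does not need tool calls at all: it only needs to pick the most recently archived conversation on the same session_key and carry over its last few summaries. The sketch below shows that selection under assumed data shapes; `ArchivedConversation` and `warmBootstrapContext` are hypothetical names, and how summaries are stored is not described in this report.

```typescript
// Hypothetical archive entry; the real storage format is not shown in this report.
interface ArchivedConversation {
  conversationId: number;
  sessionKey: string;
  archivedAt: string;  // ISO 8601 timestamp
  summaries: string[]; // oldest first
}

// Returns up to `n` of the newest summaries from the most recently archived
// conversation on `sessionKey`, or an empty list if there is none.
function warmBootstrapContext(
  archive: readonly ArchivedConversation[],
  sessionKey: string,
  n: number,
): string[] {
  let latest: ArchivedConversation | undefined = undefined;
  for (const c of archive) {
    if (c.sessionKey !== sessionKey) continue;
    if (latest === undefined || Date.parse(c.archivedAt) > Date.parse(latest.archivedAt)) {
      latest = c;
    }
  }
  // Guard n <= 0 explicitly: slice(-0) would return the whole array.
  if (latest === undefined || n <= 0) return [];
  return latest.summaries.slice(-n);
}
```

Applied to the rollover above, a new conversation 825 on the Telegram session_key would start with the tail of conversation 674's summaries instead of an empty context, so the first turn has recent history even when no tool call can run.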

Expected behavior

1. Auth invalidation should cascade. When the primary provider returns a before_message_stream_start failure carrying an auth-invalidation signature (or any non-retryable 4xx), advance to the next entry in the configured fallback chain rather than surfacing a minimal placeholder reply.
2. Tool-call dedup across provider switch. When a user manually switches providers (or the auto-fallback advances) while a prior run is still in shutdown, the new run should not re-execute pending tool calls from the prior run. Either drain pending calls before switching, or scope tool-call IDs to the run that issued them.
3. Warm-bootstrap new conversations. When a new conversation_id is created on an existing session_key, preload at least the last N message summaries from the most recently archived conversation on the same session_key. Cold-cache amnesia on a familiar channel is a much worse default than slightly slower bootstrap.
4. Visible failure mode. A before_message_stream_start failure should at minimum surface to the channel with a short user-facing notice (e.g. "Provider auth failed; cascading to fallback…") rather than the silent placeholder.
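The visible-failure requirement can be sketched as a single function that turns a turn outcome into channel text and never falls back to a generic placeholder. The `TurnOutcome` type, the reason categories, and all wording other than the "Provider auth failed; cascading to fallback…" example are assumptions made for illustration.

```typescript
// Hypothetical outcome type; the reason categories are illustrative only.
type TurnOutcome =
  | { ok: true; content: string }
  | { ok: false; reason: "auth" | "transport" | "unknown"; nextProvider?: string };

// Builds the text sent to the channel. A failed turn always yields an explicit notice.
function channelReply(o: TurnOutcome): string {
  if (o.ok) return o.content;
  const what =
    o.reason === "auth"
      ? "Provider auth failed"
      : o.reason === "transport"
        ? "Provider connection failed"
        : "The assistant turn failed";
  const then =
    o.nextProvider !== undefined
      ? `cascading to fallback (${o.nextProvider})…`
      : "no fallback is available. Please re-authenticate or switch providers.";
  return `${what}; ${then}`;
}
```

Naming the fallback provider in the notice is a design choice rather than a requirement from this report; it lets the user see which model is about to answer after the switch.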

Notes

I have full per-session JSONL trajectories with the exact timestamps, message IDs, and error envelopes captured. Happy to share via private channel if maintainers want them.
This was filed jointly with a related issue against NousResearch/hermes-agent covering the upstream OAuth-token-burning behavior that triggered failure 1 in this report. The auth-cascade and dedup behavior described here is a bug in its own right, regardless of the auth root cause.
