openclaw: [Bug] "embedded run done" marks profile good after failure, wiping just-set cooldown





Summary

The embedded run "done" branch in pi-embedded-DWASRjxE.js unconditionally calls markAuthProfileGood and markAuthProfileUsed for the most recent auth profile, even when the last assistant message ended with stopReason: "error" and maybeMarkAuthProfileFailure had just recorded a cooldown for that profile. Because resetUsageStats (in order-CoOjbg-h.js) clears cooldownUntil, cooldownReason, cooldownModel, and failureCounts and resets errorCount to 0, the freshly set cooldown is destroyed a few hundred milliseconds after it was created. The profile keeps its lastFailureAt value (preserved by the object spread in resetUsageStats), but rotation never advances away from it because there is no longer an active cooldown to detect.

For setups that rely on multiple auth profiles in order to rotate across rate-limited accounts, this silently disables the rotation feature for any failure path that reaches the "done" branch.

Reproduction

openclaw 2026.4.5 (also seen on 2026.4.2). Provider openai-codex, model gpt-5.3-codex. Upstream returns an assistant message of the form:

{
  "role": "assistant",
  "content": [],
  "stopReason": "error",
  "errorMessage": "You have hit your ChatGPT usage limit (free plan). Try again in ~N min.",
  "usage": { "input": 0, "output": 0, "totalTokens": 0 }
}

The phrase "usage limit" is already in ERROR_PATTERNS.rateLimit inside errors-By_fjUFz.js, and a quick check confirms classifyFailoverReason(raw, { provider: "openai-codex" }) returns "rate_limit" for the exact error string above. So maybeMarkAuthProfileFailure correctly writes:

errorCount: 1
cooldownUntil: <T0 + backoff>
cooldownReason: "rate_limit"
cooldownModel: "gpt-5.3-codex"
failureCounts: { rate_limit: 1 }
lastFailureAt: T0
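
For illustration, the classification step described above can be sketched as a simple pattern lookup. This is a hypothetical reduction: PATTERNS_SKETCH and classifySketch are invented names, the second pattern is an assumption, and only the "usage limit" phrase and the "rate_limit" result come from this report. The real ERROR_PATTERNS and classifyFailoverReason may work differently.

```javascript
// Hypothetical pattern table; the real ERROR_PATTERNS is not reproduced here.
// Only "usage limit" is confirmed by the report; "rate limit" is an assumed extra.
const PATTERNS_SKETCH = {
  rate_limit: [/usage limit/i, /rate limit/i],
};

// Returns the first reason whose patterns match the raw error text, or null.
function classifySketch(raw) {
  for (const [reason, patterns] of Object.entries(PATTERNS_SKETCH)) {
    if (patterns.some((pattern) => pattern.test(raw))) return reason;
  }
  return null;
}

const rawError =
  "You have hit your ChatGPT usage limit (free plan). Try again in ~N min.";
console.log(classifySketch(rawError)); // prints "rate_limit"
```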

Roughly 600 to 700 ms later, the same run reaches the "embedded run done" path and calls markAuthProfileGood followed by markAuthProfileUsed. Both route through resetUsageStats({ ...existing, errorCount: 0, cooldownUntil: void 0, cooldownReason: void 0, cooldownModel: void 0, failureCounts: void 0, ..., lastUsed: Date.now() }). The final stored entry becomes:

errorCount: 0
lastUsed: T0 + ~665ms
lastFailureAt: T0
(no cooldownUntil, no cooldownReason, no cooldownModel, no failureCounts)

Because lastUsed > lastFailureAt and no cooldown is active, the same broken profile is picked again on the next request.
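
The failure-then-reset sequence can be reproduced in isolation. The following is a minimal sketch, not the actual openclaw code: recordFailureSketch and resetUsageStatsSketch are hypothetical stand-ins for maybeMarkAuthProfileFailure and resetUsageStats, and the one-hour backoff is an assumed value.

```javascript
// Hypothetical stand-in for the failure path: records a cooldown on the profile.
function recordFailureSketch(stats, now, model) {
  return {
    ...stats,
    errorCount: (stats.errorCount ?? 0) + 1,
    cooldownUntil: now + 60 * 60 * 1000, // assumed backoff of one hour
    cooldownReason: "rate_limit",
    cooldownModel: model,
    failureCounts: { rate_limit: 1 },
    lastFailureAt: now,
  };
}

// Hypothetical stand-in for resetUsageStats: the spread keeps lastFailureAt,
// while the explicit undefined values erase every cooldown field.
function resetUsageStatsSketch(existing, now) {
  return {
    ...existing,
    errorCount: 0,
    cooldownUntil: undefined,
    cooldownReason: undefined,
    cooldownModel: undefined,
    failureCounts: undefined,
    lastUsed: now,
  };
}

const T0 = 1700000000000;
const afterFailure = recordFailureSketch({}, T0, "gpt-5.3-codex");
const afterDone = resetUsageStatsSketch(afterFailure, T0 + 665);

// JSON.stringify omits undefined-valued properties, so the serialized entry
// has no cooldown fields at all.
console.log(JSON.stringify(afterDone));
// prints {"errorCount":0,"lastFailureAt":1700000000000,"lastUsed":1700000000665}
```

The serialized result has the same shape as the stored entry shown above: lastFailureAt survives, and every trace of the cooldown is gone.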

A profile in the same order list that legitimately failed in an earlier session (where the run did NOT reach the "done" branch) ended up with the expected shape (errorCount: 1, cooldownReason: "rate_limit", and a cooldownUntil value), which confirms that markAuthProfileFailure itself works. The bug is that success-marking runs after a failed run.

Root cause

dist/pi-embedded-DWASRjxE.js around line 36472:

log$16.debug(`embedded run done: runId=${params.runId} sessionId=${params.sessionId} durationMs=${...} aborted=${aborted}`);
if (lastProfileId) {
    await markAuthProfileGood({ store: authStore, provider, profileId: lastProfileId, agentDir: params.agentDir });
    await markAuthProfileUsed({ store: authStore, profileId: lastProfileId, agentDir: params.agentDir });
}

There is no check that lastAssistant?.stopReason !== "error" or that the run produced any payload before declaring the profile "good".

The earlier "incomplete turn" branch (around line 36446) does call maybeMarkAuthProfileFailure and returns early, but only when incompleteTurnText is truthy. An assistant message with empty content and stopReason: "error" does not necessarily produce incompleteTurnText, so control falls through to the "done" branch.
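
The fall-through can be shown with a reduced model of the end-of-run control flow. This is a hypothetical sketch: finishRunSketch is an invented name, the real function does far more, and the example assumes that an empty-content error message yields an empty incompleteTurnText.

```javascript
// Hypothetical reduction of the end-of-run control flow. It returns the names
// of the auth-profile calls that would be made, in order.
function finishRunSketch({ lastAssistant, incompleteTurnText, lastProfileId }) {
  const calls = [];
  if (incompleteTurnText) {
    // "Incomplete turn" branch: the failure is recorded and the run returns early.
    calls.push("maybeMarkAuthProfileFailure");
    return calls;
  }
  // "Done" branch as described in the report: lastAssistant is never consulted.
  if (lastProfileId) {
    calls.push("markAuthProfileGood", "markAuthProfileUsed");
  }
  return calls;
}

// Empty content with stopReason "error"; incompleteTurnText is assumed empty.
const doneCalls = finishRunSketch({
  lastAssistant: { content: [], stopReason: "error" },
  incompleteTurnText: "",
  lastProfileId: "profile-a",
});
console.log(doneCalls.join(", ")); // prints "markAuthProfileGood, markAuthProfileUsed"
```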

Proposed fix

-    if (lastProfileId) {
+    if (lastProfileId && lastAssistant?.stopReason !== "error" && !aborted) {
         await markAuthProfileGood({ ... });
         await markAuthProfileUsed({ ... });
     }

Both lastAssistant and aborted are already in scope: the immediately following return { meta: { ..., aborted, stopReason: ... : lastAssistant?.stopReason } } block already reads them. The patch only skips the success-marking. The failure-marking still runs upstream as before, so legitimate successes are unaffected and legitimate failures now retain their cooldown.

Tested locally against the openai-codex free-plan-limit case: with the cooldown preserved, rotation advances to the next profile in order on the very next request.
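
To make the proposed condition easier to reason about, it can be written as a pure predicate and checked against a few cases. shouldMarkProfileGood is a hypothetical helper, not a function in the bundle, and the "stop" value used for a normal completion is an assumption.

```javascript
// Hypothetical helper expressing the proposed guard as a pure function.
function shouldMarkProfileGood({ lastProfileId, lastAssistant, aborted }) {
  return (
    Boolean(lastProfileId) &&
    lastAssistant?.stopReason !== "error" &&
    !aborted
  );
}

const guardCases = [
  // "stop" is an assumed stopReason for a normal completion.
  { name: "normal completion", lastProfileId: "p1", lastAssistant: { stopReason: "stop" }, aborted: false },
  { name: "error stop", lastProfileId: "p1", lastAssistant: { stopReason: "error" }, aborted: false },
  { name: "aborted run", lastProfileId: "p1", lastAssistant: { stopReason: "stop" }, aborted: true },
  { name: "no assistant message", lastProfileId: "p1", lastAssistant: undefined, aborted: false },
];

for (const testCase of guardCases) {
  console.log(`${testCase.name}: ${shouldMarkProfileGood(testCase)}`);
}
// prints:
// normal completion: true
// error stop: false
// aborted run: false
// no assistant message: true
```

The last case is worth a look before merging: with this expression, a run that produced no assistant message at all still counts as good, because undefined !== "error" is true.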

Why this matters

lastFailureAt is set, so the profile looks recently failed, but errorCount is 0 and no cooldown exists, so resolveAuthProfileEligibility and getSoonestCooldownExpiry treat the profile as healthy and immediately reuse it. There is no log line or visible signal that a cooldown was created and then wiped. Users report this as "fallback / rotation does not work" with no obvious diagnostic.
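
The effect on rotation can be sketched with a cooldown-only eligibility check. isEligibleSketch and pickProfileSketch are hypothetical and much simpler than resolveAuthProfileEligibility or getSoonestCooldownExpiry; they assume that an active cooldownUntil is the only thing that excludes a profile.

```javascript
// Hypothetical eligibility check: a profile is skipped only while a cooldown is active.
function isEligibleSketch(stats, now) {
  const cooldownUntil = stats?.cooldownUntil;
  return !(typeof cooldownUntil === "number" && cooldownUntil > now);
}

// Hypothetical picker: the first eligible profile in the configured order.
function pickProfileSketch(order, usageStats, now) {
  return order.find((id) => isEligibleSketch(usageStats[id], now)) ?? null;
}

const nowMs = 1700000001000;
const profileOrder = ["profile-a", "profile-b"];

// Cooldown wiped by the "done" branch: only lastFailureAt remains.
const wipedStats = {
  "profile-a": { errorCount: 0, lastFailureAt: nowMs - 1000, lastUsed: nowMs - 335 },
};
// Cooldown preserved, as with the proposed fix.
const preservedStats = {
  "profile-a": { errorCount: 1, cooldownUntil: nowMs + 3600000, lastFailureAt: nowMs - 1000 },
};

console.log(pickProfileSketch(profileOrder, wipedStats, nowMs)); // prints "profile-a"
console.log(pickProfileSketch(profileOrder, preservedStats, nowMs)); // prints "profile-b"
```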

Workaround on current release

For each affected profile, manually edit agents/<agent>/agent/auth-profiles.json:

  1. Set lastGood[<provider>] to the next profile id in order.

  2. Add a synthetic cooldown to usageStats[<broken-profile-id>]:

    {
      "errorCount": 1,
      "cooldownUntil": <now + 4h ms>,
      "cooldownReason": "rate_limit",
      "cooldownModel": "<model id>",
      "failureCounts": { "rate_limit": 1 }
    }

This restores rotation immediately. The underlying bug still recurs on the next account-level failure until the source fix lands.
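
The two manual edits can also be applied programmatically to the parsed file contents. The helper below is a hypothetical sketch: applyWorkaroundSketch and its parameter names are invented, the profile ids and model id in the usage example are placeholders, and the file layout is assumed to be exactly the lastGood and usageStats maps described above. Reading and writing the file (for example with Node's fs.readFileSync and fs.writeFileSync) is left out.

```javascript
// Hypothetical helper: returns a copy of the parsed auth-profiles.json content
// with both workaround edits applied. The input object is not mutated.
function applyWorkaroundSketch(store, options) {
  const { provider, brokenProfileId, nextProfileId, modelId, now, cooldownMs } = options;
  return {
    ...store,
    // Step 1: point lastGood for the provider at the next profile in order.
    lastGood: { ...store.lastGood, [provider]: nextProfileId },
    // Step 2: add a synthetic cooldown to the broken profile's usage stats.
    usageStats: {
      ...store.usageStats,
      [brokenProfileId]: {
        ...store.usageStats?.[brokenProfileId],
        errorCount: 1,
        cooldownUntil: now + cooldownMs,
        cooldownReason: "rate_limit",
        cooldownModel: modelId,
        failureCounts: { rate_limit: 1 },
      },
    },
  };
}

// Placeholder data; real profile ids and timestamps will differ.
const exampleStore = {
  lastGood: { "openai-codex": "broken-profile-id" },
  usageStats: { "broken-profile-id": { errorCount: 0, lastFailureAt: 1700000000000 } },
};
const patchedStore = applyWorkaroundSketch(exampleStore, {
  provider: "openai-codex",
  brokenProfileId: "broken-profile-id",
  nextProfileId: "next-profile-id",
  modelId: "gpt-5.3-codex",
  now: 1700000000000,
  cooldownMs: 4 * 60 * 60 * 1000, // 4 hours, matching the example above
});
console.log(JSON.stringify(patchedStore, null, 2));
```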
