openclaw - 💡(How to fix) Fix OpenClaw should auto-pause and auto-retry on provider 429/rate-limit errors [2 comments, 1 participants]

openclaw2026-04-07 14:15:58

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#62540•Fetched 2026-04-08 03:02:46

View on GitHub

Comments

Participants

Timeline

Reactions

Author

ncpe20012003

Participants

ncpe20012003

Timeline (top)

commented ×2

Error Message

provider error body
surface concise error including:

Root Cause

This would make OpenClaw feel much more resilient during real work. Right now, a brief TPM spike can make a session appear to “go stale” until the user nudges it again, even when the provider explicitly tells us when to retry.

Fix Action

Fix / Workaround

Failure behavior

After max retries:

mark run
surface concise error including:
- provider/model
- limit type if known
- retry attempts
- suggested mitigations:
  - reduce context
  - reduce concurrency
  - use smaller model
  - request higher limits

Optional follow-ons

pre-retry context shrinking
model fallback policy
per-provider concurrency throttling
projected TPM budgeting before dispatch
queueing large runs instead of firing immediately into known TPM exhaustion

RAW_BUFFERClick to expand / collapse

When a model call hits a provider rate limit, OpenClaw currently behaves like the run has effectively stopped until the user prompts again. This is frustrating during long-running work because the provider often includes an explicit retry delay, but OpenClaw does not automatically resume the run.

OpenClaw should treat 429/rate-limit failures as transient resumable pauses, not terminal run failures.

Problem

Current behavior:

model call hits 429 / TPM limit
provider returns retry hint like
run stops progressing
user must manually prompt again to continue

This creates a poor user experience, especially for:

long multi-step runs
coding/debugging sessions
background work
high-context GPT-5.x turns that temporarily exceed TPM

Desired behavior

On provider 429/rate-limit:

detect the rate-limit condition
parse retry delay from:
- provider error body
- structured message text
mark the run/session as paused for rate limit
schedule an automatic retry
resume the pending step automatically without needing a new user prompt

Important note: This does not require resuming partial provider generation state. It only requires resuming at the OpenClaw orchestration step by retrying the pending model call.

Proposed run states

Required implementation

Catch provider 429 errors centrally
Parse retry delay if available
Persist retry metadata:
- run/session id
- pending step
- provider
- model
- retry count
- retry_at
Schedule wake/retry using existing scheduling/wake primitives
Retry automatically at with small jitter
Continue run normally if retry succeeds

Retry policy

default max retries:
use provider retry hint first
otherwise use fallback backoff
repeated 429s should use exponential backoff with cap
apply jitter to avoid synchronized retries

UX

retry delay < 30s:
- retry silently by default
retry delay >= 30s:
- surface a short status update like:
  - “Rate-limited, retrying automatically at 09:20:03”

Failure behavior

After max retries:

mark run
surface concise error including:
- provider/model
- limit type if known
- retry attempts
- suggested mitigations:
  - reduce context
  - reduce concurrency
  - use smaller model
  - request higher limits

Optional follow-ons

pre-retry context shrinking
model fallback policy
per-provider concurrency throttling
projected TPM budgeting before dispatch
queueing large runs instead of firing immediately into known TPM exhaustion

Acceptance criteria

A provider 429 with a retry delay causes the run to pause and auto-resume without user prompting
Session/status shows paused-for-rate-limit state and retry time
Successful retry continues the original task
Repeated 429s back off and eventually fail cleanly after max retries
No duplicate final replies are emitted across retries

Why this matters

extent analysis

TL;DR

Implement a retry mechanism that catches provider 429 errors, parses retry delays, and schedules automatic retries to resume paused runs without user prompting.

Guidance

Catch provider 429 errors centrally and parse retry delays from error bodies or structured message text to determine the retry timing.
Implement a retry policy with a default max retries, using provider retry hints, fallback backoff, and exponential backoff with a cap to avoid overwhelming the provider.
Surface status updates to the user when retry delays are 30 seconds or longer, and provide concise error messages with suggested mitigations after max retries.
Consider implementing additional features like pre-retry context shrinking, model fallback policy, and per-provider concurrency throttling to further improve resilience.

Example

# Pseudo-code example of retry mechanism
def handle_provider_error(error):
    if error.status_code == 429:
        retry_delay = parse_retry_delay(error.body)
        schedule_retry(retry_delay)
        # Mark run as paused for rate limit and persist retry metadata

Notes

The implementation should focus on catching 429 errors, parsing retry delays, and scheduling automatic retries. The retry policy should be designed to avoid overwhelming the provider and provide a good user experience. Additional features like pre-retry context shrinking and model fallback policy can be considered to further improve resilience.

Recommendation

Apply a workaround by implementing a retry mechanism that catches provider 429 errors and schedules automatic retries, as this will significantly improve the user experience during long-running work and high-context GPT-5.x turns.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#conversation history #tool integration #LLM response #prompt template #agent execution

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix OpenClaw should auto-pause and auto-retry on provider 429/rate-limit errors [2 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Failure behavior

Optional follow-ons

Problem

Desired behavior

parse retry delay from:

Proposed run states

Required implementation

Retry policy

UX

Failure behavior

Optional follow-ons

Acceptance criteria

Why this matters

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix OpenClaw should auto-pause and auto-retry on provider 429/rate-limit errors [2 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Failure behavior

Optional follow-ons

Problem

Desired behavior

parse retry delay from:

Proposed run states

Required implementation

Retry policy

UX

Failure behavior

Optional follow-ons

Acceptance criteria

Why this matters

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING