claude-code - 💡(How to fix) Fix CLI surfaces transient 429 "rate_limit_error" to user with no backoff/retry; session sits idle until user types `continue`

claude-code · 2026-05-30 18:04:09


When the Anthropic API returns a transient 429 rate_limit_error ("Server is temporarily limiting requests (not your usage limit) · Rate limited"), Claude Code v2.1.x writes a synthetic assistant message tagged isApiErrorMessage:true to the session's JSONL transcript and stops. There is no automatic backoff or retry: the session sits idle until the user manually types continue. The throttle typically clears within a few seconds, so a small jittered exponential backoff on the CLI side would resolve nearly all of these cases without user intervention.

Error Message

{ "type": "assistant", "isApiErrorMessage": true, "apiErrorStatus": 429, "error": "rate_limit", "message": { "model": "<synthetic>", "stop_reason": "stop_sequence", "content": [{ "type": "text", "text": "API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited" }] }, "requestId": "req_011Cb...", "userType": "external", "entrypoint": "cli", "version": "2.1.145" }


Summary

Environment

Claude Code CLI v2.1.145 (also confirmed on v2.1.149)
Linux (Debian 12 cloud VM)
Direct Anthropic API (Max plan OAuth, not Bedrock/Vertex/Foundry)

What we observed

While running ~25 concurrent Claude Code sessions against one API key for a fan-out workflow, we saw two cluster-wide 429 spikes today:

15:17 UTC: 25 sessions hit a 429 inside the same minute
17:35 UTC: 26 sessions hit a 429 inside the same minute

In total we saw 55 429 responses across 50 sessions over 3 hours. Every one had the synthetic-message-then-stop shape, and every one would have cleared with a few seconds of backoff: we recovered each by typing continue 5–10 s later, and the next request succeeded.

Reproduction

Easiest repro: run enough concurrent Claude Code sessions on one API key to saturate the per-key shared rate limit. The "Server is temporarily limiting requests (not your usage limit)" wording identifies the server-side throttle, which is distinct from the account-side "Usage credits required" 429s reported in other issues (#22876, #60438, #62314).

Evidence — the synthetic error message

From an affected session's JSONL transcript (~/.claude/projects/<workspace-slug>/<session-id>.jsonl), the assistant entry following a successful tool_result:

{
  "type": "assistant",
  "isApiErrorMessage": true,
  "apiErrorStatus": 429,
  "error": "rate_limit",
  "message": {
    "model": "<synthetic>",
    "stop_reason": "stop_sequence",
    "content": [{
      "type": "text",
      "text": "API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited"
    }]
  },
  "requestId": "req_011Cb...",
  "userType": "external",
  "entrypoint": "cli",
  "version": "2.1.145"
}

In our case the session then sat idle for 6 min 18 s until the user typed continue, after which inference resumed and the next request succeeded immediately. That entire 6+ minute window was avoidable.
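To find these entries across many transcripts, a small scanner along the following lines may help. This is a sketch only: the field names come from the entry shown above, while the helper names (`is_transient_rate_limit`, `find_transient_429s`) are hypothetical, and treating the "not your usage limit" wording as the marker of a server-side throttle is our assumption rather than a documented contract.

```python
import json

# Marker text copied from the error entry above. Using it to recognize the
# server-side throttle is an assumption, not a documented contract.
TRANSIENT_MARKER = "not your usage limit"


def is_transient_rate_limit(entry: dict) -> bool:
    """Return True if a transcript entry looks like the synthetic transient 429."""
    if not entry.get("isApiErrorMessage") or entry.get("apiErrorStatus") != 429:
        return False
    blocks = entry.get("message", {}).get("content", [])
    text = " ".join(b.get("text", "") for b in blocks if b.get("type") == "text")
    return TRANSIENT_MARKER in text


def find_transient_429s(jsonl_lines):
    """Yield (line_number, entry) for each transient 429 in an iterable of JSONL lines."""
    for number, line in enumerate(jsonl_lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if isinstance(entry, dict) and is_transient_rate_limit(entry):
            yield number, entry
```

Passing an open transcript file to `find_transient_429s` yields the line numbers of affected entries, which is enough to count spikes per session.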

What we'd hope for

Full-jitter exponential backoff (per the AWS 2015 "Exponential Backoff And Jitter" formula) on the CLI's HTTP client when the response has error.type === "rate_limit_error" or "overloaded_error":

sleep = uniform(0, min(cap, base * 2**attempt))

Suggested defaults: base=1.0s, cap=30.0s, max_retries=6 (with attempts numbered from 0, at most 1 + 2 + 4 + 8 + 16 + 30 = 61 s of total backoff, roughly 30 s on average). Honor retry-after if Anthropic ships it as a response header. After exhausting retries, fall through to the current synthetic-message-then-stop behavior.
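The proposal above can be sketched in a few lines of Python. This is not the CLI's code, just an illustration of the formula and defaults: `RetryableError`, `full_jitter_delay`, and `call_with_backoff` are hypothetical names, and `retry_after` stands in for whatever hint the server might send.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for rate_limit_error / overloaded_error."""

    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds, if the server supplied a hint


def full_jitter_delay(attempt, base=1.0, cap=30.0, rng=random.uniform):
    """sleep = uniform(0, min(cap, base * 2**attempt)), attempt counted from 0."""
    return rng(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send, max_retries=6, base=1.0, cap=30.0, sleep=time.sleep):
    """Call send(); on RetryableError, back off and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RetryableError as err:
            if attempt == max_retries:
                raise  # budget exhausted: caller falls through to current behavior
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's hint when present
            else:
                delay = full_jitter_delay(attempt, base, cap)
            sleep(delay)
```

With these defaults the jitter windows are 1, 2, 4, 8, 16, and 30 s, so six retries sleep for at most 61 s in total. Using the server hint verbatim is one possible policy; taking the larger of the hint and the jittered delay would be an equally reasonable choice.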

This is the same shape the official Anthropic Python SDK uses internally via its max_retries constructor parameter, and matches what we've had to ship as a Stop-hook wrapper on our side to bridge the gap. We'd much rather it lived in the CLI.

Local mitigation

For anyone hitting this in the meantime: a Claude Code Stop hook can read the transcript tail, detect isApiErrorMessage:true, sleep with exponential backoff, and exit 2 to make the harness retry. The retry attempt count is recoverable from the JSONL itself (the count of consecutive trailing isApiErrorMessage entries), so multi-retry behavior survives stop_hook_active=true. We shipped one as retry-on-api-error.py and are happy to share the source if it would help.
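A minimal sketch of such a hook follows. It is not the retry-on-api-error.py mentioned above, only an illustration of the same idea. It assumes the hook receives a JSON object on stdin containing a `transcript_path` field; `trailing_api_errors` and `run_hook` are hypothetical names.

```python
import json
import random
import sys
import time


def trailing_api_errors(jsonl_lines):
    """Count consecutive isApiErrorMessage entries at the end of the transcript."""
    count = 0
    for line in reversed(jsonl_lines):
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            break
        if not (isinstance(entry, dict) and entry.get("isApiErrorMessage")):
            break
        count += 1
    return count


def run_hook(hook_input, max_retries=6, base=1.0, cap=30.0, sleep=time.sleep):
    """Return the exit code for the hook: 2 to request a retry, 0 to let it stop."""
    # Assumption: the hook input carries the transcript location in this field.
    with open(hook_input["transcript_path"], encoding="utf-8") as handle:
        errors = trailing_api_errors(handle.readlines())
    if errors == 0 or errors > max_retries:
        return 0  # nothing to retry, or retry budget exhausted
    # The first error maps to attempt 0 of the full-jitter formula.
    sleep(random.uniform(0.0, min(cap, base * 2 ** (errors - 1))))
    print("Transient API error; retrying.", file=sys.stderr)
    return 2  # per the description above, exit 2 makes the harness retry


# Entry point when installed as a hook script (left as a comment here):
#     sys.exit(run_hook(json.load(sys.stdin)))
```

Because the attempt number is derived from the transcript rather than kept in memory, each hook invocation is stateless and the backoff window still grows across consecutive failures.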

Why this isn't the same as the other 429 issues

#22876 / #40128 / #61808 / #62314: account-side 429s on accounts with available quota — those are a billing/quota classification bug, distinct shape.
#60438: a persistent 429 from auto-mode classification — not transient.
#60967 (closed): UX request to suggest starting a new conversation. Related but doesn't fix the wait.
#47931: Windows-specific silent termination after parallel tool_results, no error written.

This issue is specifically: the CLI sees a clearly-retryable transient 429 rate_limit_error response, has all the information it needs to back off and retry, and instead just stops.

Test plan

Confirm the CLI does in fact have no retry path for rate_limit_error (search HTTP client retry logic).
Add jittered exp backoff with the defaults above (or whatever Anthropic prefers).
Honor retry-after if set.
Fall through to the current synthetic-message-then-stop only after the budget is exhausted.
Test under a forced 429 (e.g. against a local mock returning 429 N times before 200) that the session recovers without user intervention.
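The last step of the plan could be exercised with a harness like the one below: a sketch using only the Python standard library, in which a local server answers 429 a fixed number of times and then 200. The names `make_mock_server` and `post_with_retry` are hypothetical, the error body only loosely imitates the API's shape, and the client is a stand-in for the CLI's HTTP layer rather than the real thing.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_mock_server(failures):
    """Start a local server that answers 429 `failures` times, then 200."""
    state = {"remaining": failures, "requests": 0}

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            state["requests"] += 1
            if state["remaining"] > 0:
                state["remaining"] -= 1
                status, body = 429, {"type": "error", "error": {"type": "rate_limit_error"}}
            else:
                status, body = 200, {"ok": True}
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep test output quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, state


def post_with_retry(url, max_retries=6, sleep=lambda seconds: None):
    """POST to url, retrying on HTTP 429; sleep is injectable so tests run fast."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
    for attempt in range(max_retries + 1):
        request = urllib.request.Request(url, data=b"{}", method="POST")
        try:
            with opener.open(request, timeout=5) as response:
                return response.status
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise
            # Upper edge of the jitter window; real code would randomize it.
            sleep(min(30.0, 2.0 ** attempt))
```

A recovery test then starts the server with, say, `failures=3`, calls `post_with_retry`, and checks that it returns 200 after exactly four requests. Setting `failures` above `max_retries` covers the fall-through path.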

🤖 Filed by Claude Code on behalf of a multi-session operator.
