claude-code - CLI surfaces transient 429 "rate_limit_error" to user with no backoff/retry; session sits idle until user types `continue`

Summary

When the Anthropic API returns a transient 429 rate_limit_error ("Server is temporarily limiting requests (not your usage limit) · Rate limited"), Claude Code v2.1.x writes a synthetic assistant message tagged isApiErrorMessage:true to the JSONL and stops. There is no automatic backoff or retry — the session sits idle until the user manually types continue. The same transient typically clears within a few seconds; a small jittered exponential backoff on the CLI side would resolve nearly all of these without user intervention.

Environment

  • Claude Code CLI v2.1.145 (also confirmed on v2.1.149)
  • Linux (Debian 12 cloud VM)
  • Direct Anthropic API (Max plan OAuth, not Bedrock/Vertex/Foundry)

What we observed

While running ~25 concurrent Claude Code sessions against one API key for a fan-out workflow, we saw two cluster-wide 429 spikes today:

  • 17:35 UTC — 26 sessions hit a 429 inside the same minute
  • 15:17 UTC — 25 sessions hit a 429 inside the same minute

Total of 55 × 429s across 50 sessions over 3 hours. Every one of them was the synthetic-message-then-stop shape, and every one would have cleared with a few seconds of backoff (we recovered each by typing continue 5–10 s later and the next request succeeded).

Reproduction

Easiest repro: run enough concurrent Claude Code sessions on one API key to saturate the per-key shared rate limit. The "Server is temporarily limiting requests (not your usage limit)" wording is the server-side throttle, distinct from the account-side "Usage credits required" 429s reported in other issues (#22876, #60438, #62314).

Evidence — the synthetic error message

From an affected session's JSONL transcript (~/.claude/projects/<workspace-slug>/<session-id>.jsonl), the assistant entry following a successful tool_result:

{
  "type": "assistant",
  "isApiErrorMessage": true,
  "apiErrorStatus": 429,
  "error": "rate_limit",
  "message": {
    "model": "<synthetic>",
    "stop_reason": "stop_sequence",
    "content": [{
      "type": "text",
      "text": "API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited"
    }]
  },
  "requestId": "req_011Cb...",
  "userType": "external",
  "entrypoint": "cli",
  "version": "2.1.145"
}

In our case the session then sat idle for 6 min 18 s until the user typed continue, after which inference resumed and the next request succeeded immediately. That entire 6+ minute window was avoidable.
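To tell these entries apart from the account-side 429s when scanning transcripts, a filter along the following lines may help. It is a minimal sketch that assumes only the field names visible in the entry above (`isApiErrorMessage`, `apiErrorStatus`, `message.content[].text`); the function names and the choice of "not your usage limit" as the distinguishing substring are illustrative, not part of Claude Code.

```python
import json
from typing import Iterable

# Substring taken from the error text shown above; treating it as the
# distinguishing marker for the server-side throttle is an assumption.
TRANSIENT_MARKER = "not your usage limit"


def is_transient_429(entry: dict) -> bool:
    """Return True for a synthetic API-error entry that looks like the transient throttle."""
    if not entry.get("isApiErrorMessage") or entry.get("apiErrorStatus") != 429:
        return False
    message = entry.get("message")
    content = message.get("content", []) if isinstance(message, dict) else []
    if not isinstance(content, list):
        return False
    return any(
        isinstance(block, dict) and TRANSIENT_MARKER in str(block.get("text", ""))
        for block in content
    )


def count_transient_429s(lines: Iterable[str]) -> int:
    """Count matching entries in the lines of one JSONL transcript."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not complete JSON
        if isinstance(entry, dict) and is_transient_429(entry):
            count += 1
    return count
```

Passing an open transcript file to `count_transient_429s` gives a per-session tally of the kind quoted under "What we observed".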

What we'd hope for

Full-jitter exponential backoff (per the AWS 2015 "Exponential Backoff And Jitter" formula) on the CLI's HTTP client when the response is error.type === "rate_limit_error" or "overloaded_error":

sleep = uniform(0, min(cap, base * 2**attempt))

Suggested defaults: base=1.0s, cap=30.0s, max_retries=6 (≈61 s upper bound on total backoff budget when attempts are counted from 0: 1 + 2 + 4 + 8 + 16 + 30, with the cap trimming the last delay from 32 s). Honor retry-after if Anthropic ships it as a response header. After exhausting retries, fall through to the current synthetic-message-then-stop behavior.
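The formula and defaults can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLI's code: the function names are hypothetical and attempts are assumed to be numbered from 0. Under that numbering the worst-case total for the suggested defaults comes to 61 s, because the cap trims the last ceiling from 32 s to 30 s.

```python
import random


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def max_backoff_budget(max_retries: int = 6, base: float = 1.0, cap: float = 30.0) -> float:
    """Worst-case total sleep if every retry drew its ceiling (attempts 0..max_retries-1)."""
    return sum(min(cap, base * 2 ** attempt) for attempt in range(max_retries))
```

Because each delay is drawn uniformly below its ceiling, the expected total is about half of the worst case.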

This is the same shape the official Anthropic Python SDK uses internally via its max_retries constructor parameter, and matches what we've had to ship as a Stop-hook wrapper on our side to bridge the gap. We'd much rather it lived in the CLI.

Local mitigation

For anyone hitting this in the meantime: a Claude Code Stop hook can read the transcript tail, detect isApiErrorMessage:true, sleep with exponential backoff, and exit 2 to make the harness retry. The retry attempt count is recoverable from the JSONL itself (count of consecutive trailing isApiErrorMessage entries), so multi-retry behavior survives stop_hook_active=true. We shipped one as retry-on-api-error.py; happy to share the source if it'd help.
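A rough sketch of what such a hook could look like follows. It is not the retry-on-api-error.py mentioned above. It assumes the hook receives a JSON payload on stdin that includes a `transcript_path` field, and that an exit status of 2 keeps the session going, as described in the paragraph above; the function names and defaults are illustrative.

```python
import json
import random
import sys
import time


def trailing_api_errors(lines):
    """Count consecutive isApiErrorMessage entries at the end of a transcript."""
    count = 0
    for line in reversed(lines):
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            break  # unreadable tail: stop counting rather than guess
        if not (isinstance(entry, dict) and entry.get("isApiErrorMessage")):
            break
        count += 1
    return count


def main(stdin=sys.stdin, sleep=time.sleep, max_retries=6, base=1.0, cap=30.0):
    """Return the exit status the hook should use: 0 to allow the stop, 2 to retry."""
    payload = json.load(stdin)  # assumed to carry "transcript_path"
    with open(payload["transcript_path"], encoding="utf-8") as fh:
        errors = trailing_api_errors(fh.readlines())
    if errors == 0 or errors > max_retries:
        return 0  # no trailing API error, or retry budget exhausted
    # The first trailing error is attempt 0, the second is attempt 1, and so on.
    sleep(random.uniform(0.0, min(cap, base * 2 ** (errors - 1))))
    print("Transient API error detected; retrying after backoff.", file=sys.stderr)
    return 2

# When installed as a hook script, finish the file with: sys.exit(main())
```

Reading the whole file is fine for a sketch; a long-running session would probably want to read only the last few kilobytes of the transcript.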

Why this isn't the same as the other 429 issues

  • #22876 / #40128 / #61808 / #62314: account-side 429s on accounts with available quota — those are a billing/quota classification bug, distinct shape.
  • #60438: a persistent 429 from auto-mode classification — not transient.
  • #60967 (closed): UX request to suggest starting a new conversation. Related but doesn't fix the wait.
  • #47931: Windows-specific silent termination after parallel tool_results, no error written.

This issue is specifically: the CLI sees a clearly-retryable transient 429 rate_limit_error response, has all the information it needs to back off and retry, and instead just stops.

Test plan

  • Confirm the CLI does in fact have no retry path for rate_limit_error (search HTTP client retry logic).
  • Add jittered exponential backoff with the defaults above (or whatever Anthropic prefers).
  • Honor retry-after if set.
  • Fall through to the current synthetic-message-then-stop only after the budget is exhausted.
  • Test under a forced 429 (e.g. against a local mock returning 429 N times before 200) that the session recovers without user intervention.
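The forced-429 check in the last item could be prototyped outside the CLI with a small local mock. The sketch below uses only the Python standard library; `make_mock_server` and `post_with_backoff` are hypothetical helpers standing in for the mock endpoint and for the retry path the CLI would need, and the error body only imitates the `rate_limit_error` shape.

```python
import http.server
import random
import threading
import time
import urllib.error
import urllib.request


def make_mock_server(failures: int) -> http.server.HTTPServer:
    """Answer the first `failures` POSTs with 429, then 200. Binds an ephemeral port."""
    state = {"remaining": failures}
    lock = threading.Lock()

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            with lock:
                fail = state["remaining"] > 0
                if fail:
                    state["remaining"] -= 1
            body = (b'{"type":"error","error":{"type":"rate_limit_error"}}'
                    if fail else b'{"ok":true}')
            self.send_response(429 if fail else 200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_with_backoff(url, max_retries=6, base=1.0, cap=30.0, sleep=time.sleep):
    """POST with full-jitter backoff on 429. Returns (final_status, retries_used)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # ignore proxies
    for attempt in range(max_retries + 1):
        request = urllib.request.Request(url, data=b"{}", method="POST")
        try:
            with opener.open(request) as response:
                return response.status, attempt
        except urllib.error.HTTPError as err:
            err.close()
            if err.code != 429 or attempt == max_retries:
                return err.code, attempt  # not retryable, or budget exhausted
            sleep(random.uniform(0.0, min(cap, base * 2 ** attempt)))
```

Passing a recording function as `sleep` (for example `delays.append`) keeps the test fast and lets it assert on the delays that would have been used.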

🤖 Filed by Claude Code on behalf of a multi-session operator.
