claude-code: [Feature] OnApiRetry hook event for observability and control over internal retry behavior

GitHub stats
anthropics/claude-code#46959 (fetched 2026-04-12 13:28:38)
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline: labeled ×3

Summary

Claude Code retries API calls internally on 429, 529, and transient network errors. These retries are invisible to the user and to hooks. We propose a new OnApiRetry hook event that fires on each retry attempt, giving users and tooling a way to observe, log, and optionally intercept the retry loop.

Problem

Internal retries are a significant part of what the CLI does, but the current observability gap hurts users in several documented ways:

  1. Session unrecoverability from silent retry accumulation. #40316 documents sessions becoming permanently unrecoverable because retries silently accumulate. A hook that observes each retry would let users detect the runaway and bail out before the session is lost.

  2. Retries eating rate limit budget invisibly. #44850 ("Telemetry events competing with API requests for rate limit budget") shows that users already suspect their budget is being consumed by internal traffic they cannot see. Retry visibility would settle this.

  3. Connection errors dropped with zero recovery. #37077 reports that ECONNRESET, EPIPE, and ETIMEDOUT errors fail immediately with no retry or visibility. A hook would let users implement their own policy (custom backoff, notification, or switching to a different provider).

  4. 529 retry behavior inconsistent across modes. #35801 requests auto-retry on 529 in interactive mode. An OnApiRetry hook would let users implement this themselves while Anthropic decides on the default.

Proposed design

A new hook event, OnApiRetry, fires just before each retry attempt. Input schema:

{
  "attempt": 2,
  "max_attempts": 5,
  "error_code": "429",
  "error_message": "rate_limit_exceeded",
  "retry_after_ms": 13000,
  "request_type": "messages",
  "session_id": "..."
}
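A minimal Python sketch shows how a hook script might load this payload into a typed object. It assumes the event arrives as JSON on stdin, as it does for existing Claude Code hook events. The ApiRetryEvent class and the parse_event helper are hypothetical. Treating retry_after_ms as optional, for errors that carry no retry hint, is also an assumption. error_code stays a string because the request also covers non-HTTP codes such as ECONNRESET.

```python
import json
from dataclasses import dataclass
from typing import Optional


# Hypothetical typed view of the proposed OnApiRetry input payload.
# Field names mirror the proposal; the Python types are assumptions.
@dataclass(frozen=True)
class ApiRetryEvent:
    attempt: int
    max_attempts: int
    error_code: str            # "429", "529", or e.g. "ECONNRESET"
    error_message: str
    retry_after_ms: Optional[int]
    request_type: str
    session_id: str

    @property
    def retries_left(self) -> int:
        # Attempts still allowed after this one (never negative).
        return max(self.max_attempts - self.attempt, 0)


def parse_event(raw: str) -> ApiRetryEvent:
    """Parse the JSON text a hook would receive on stdin (assumed transport)."""
    data = json.loads(raw)
    retry_after = data.get("retry_after_ms")
    return ApiRetryEvent(
        attempt=int(data["attempt"]),
        max_attempts=int(data["max_attempts"]),
        error_code=str(data["error_code"]),
        error_message=str(data.get("error_message", "")),
        retry_after_ms=None if retry_after is None else int(retry_after),
        request_type=str(data.get("request_type", "")),
        session_id=str(data.get("session_id", "")),
    )
```

In a real hook, parse_event(sys.stdin.read()) would be the first line of the script. For the sample payload above, retries_left evaluates to 3.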

Output contract mirrors the PermissionDenied pattern established in #37769. The action field is one of "continue", "abort", or "wait_longer":

{
  "action": "wait_longer",
  "systemMessage": "optional message to inject into conversation",
  "wait_ms": 30000
}
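The sketch below covers the hook side of this contract. It assumes the hook prints its decision as JSON on stdout. make_response and its validation rules are illustrative assumptions, not part of the proposal. Those rules reject unknown actions and negative wait_ms values.

```python
import json
from typing import Optional

# Allowed values for "action", taken from the proposed output contract.
VALID_ACTIONS = {"continue", "abort", "wait_longer"}


def make_response(action: str, wait_ms: Optional[int] = None,
                  system_message: Optional[str] = None) -> str:
    """Serialize a hook decision under the proposed OnApiRetry contract."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if wait_ms is not None and wait_ms < 0:
        raise ValueError("wait_ms must be non-negative")
    body: dict = {"action": action}
    # Optional fields are only emitted when the caller supplies them.
    if system_message:
        body["systemMessage"] = system_message
    if wait_ms is not None:
        body["wait_ms"] = wait_ms
    return json.dumps(body)
```

For example, print(make_response("wait_longer", wait_ms=30000)) prints {"action": "wait_longer", "wait_ms": 30000}.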

Users register it in settings.json like any other hook event.
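The proposal does not spell out the registration entry. The helper below assumes OnApiRetry follows the same nesting settings.json already uses for events that take no matcher. In that nesting, "hooks" maps each event name to a list of groups, and each group holds a list of {"type": "command", ...} entries. The helper adds one such entry. The function name and the script path in the usage line are hypothetical.

```python
import copy


def add_on_api_retry_hook(settings: dict, command: str) -> dict:
    """Return a copy of settings with a command hook registered for OnApiRetry.

    Mirrors the existing settings.json hook layout:
    hooks -> <EventName> -> [ {"hooks": [ {"type": "command", ...} ]} ].
    "OnApiRetry" is the proposed, not yet existing, event name.
    """
    updated = copy.deepcopy(settings)  # leave the caller's dict untouched
    groups = updated.setdefault("hooks", {}).setdefault("OnApiRetry", [])
    # No "matcher" key: retry events are not tied to a tool name.
    groups.append({"hooks": [{"type": "command", "command": command}]})
    return updated
```

json.dumps(add_on_api_retry_hook({}, "python3 ~/.claude/hooks/on_api_retry.py"), indent=2) renders the fragment that would be merged into settings.json.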

Why existing issues do not cover this

Verified across 8 keyword searches (OnRetry hook, retry event, retry visibility, retry hook, retry observability, api retry callback, silent retry 429, 429 retry hook). Zero hits on a hook event fired on API-level retries.

Adjacent but distinct:

  • #45309 Hook Retry Limit. About retrying hooks themselves, not API calls. Opposite direction.
  • #37769 PermissionDenied hook event. Different lifecycle event (permission denial). Proves the appetite for new hook events and establishes the systemMessage output contract this proposal reuses.
  • #21531 BeforeModel and AfterModel hooks. Broader model hooks proposal. Our ask is narrower, scoped specifically to the retry path inside the request cycle.
  • #40872 Auto-retry last message after usage quota window resets. About retry behavior, not observability.

Use cases this unlocks

  • Statusline retry indicator. Show [retry 2/5 after 429] in the statusline.
  • Retry log file. Log every retry to ~/.claude/logs/retries.jsonl for cost attribution and post-mortem.
  • Opt out of silent retries. Return {"action": "abort"} to fail fast on the first 429.
  • Custom backoff. Override wait_ms to implement exponential backoff tuned to the user's workload.
  • Provider failover. After N retries, write a signal file; a wrapper script watching for it switches providers.
  • Telemetry budget protection. Pause non-essential retries during high-usage windows, addressing #44850.
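The statusline indicator, the retry log, and the fail-fast opt-out each fit in a few lines of a hook script. This sketch assumes the event arrives as a dict shaped like the input schema. statusline_text, log_retry, and decide are hypothetical helpers. The log goes to whatever path is passed in, for example ~/.claude/logs/retries.jsonl.

```python
import json
import time
from pathlib import Path


def statusline_text(event: dict) -> str:
    # Produces the "[retry 2/5 after 429]" style indicator.
    return f"[retry {event['attempt']}/{event['max_attempts']} after {event['error_code']}]"


def log_retry(event: dict, log_path: Path) -> None:
    """Append one JSON line per retry for cost attribution and post-mortems."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": time.time(), **event}  # wall-clock timestamp plus the payload
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def decide(event: dict, fail_fast_on_429: bool = False) -> dict:
    """Abort on 429 when fail-fast is enabled; otherwise let the retry proceed."""
    if fail_fast_on_429 and event.get("error_code") == "429":
        return {"action": "abort",
                "systemMessage": statusline_text(event) + " aborted by hook"}
    return {"action": "continue"}
```

Because log_retry opens the file in append mode, repeated retries within one session accumulate as separate JSONL lines.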

Request

  1. Implement OnApiRetry as a new hook event.
  2. Define input and output schemas in the hooks documentation.
  3. Fire the event on 429, 529, ECONNRESET, EPIPE, ETIMEDOUT, and other retryable error classes.
  4. Allow the hook to abort, continue, or modify the retry delay.

Suggested labels: feature request, area:hooks, area:api

Extent analysis

TL;DR

Implement the proposed OnApiRetry hook event to provide visibility and control over API retries, addressing issues with session unrecoverability, rate limit budget consumption, and connection errors.

Guidance

  • Review the proposed input and output schemas for the OnApiRetry hook event to understand the available data and possible actions.
  • Consider implementing custom retry policies, such as exponential backoff or provider failover, using the OnApiRetry hook.
  • Evaluate the use cases unlocked by the OnApiRetry hook, such as statusline retry indicators, retry log files, and telemetry budget protection.
  • Assess the potential impact of the OnApiRetry hook on existing issues, including session unrecoverability and rate limit budget consumption.
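For the custom-backoff and provider-failover policies mentioned in the guidance, a hook could compute its own delay. This sketch swaps in "full jitter" exponential backoff, a uniform random delay up to min(cap, base * 2^(attempt-1)). It never waits less than retry_after_ms. Once attempts exceed a threshold, it writes a signal file so an external wrapper can switch providers. All of the following are hypothetical:

- backoff_ms and policy
- the 2 s base and the 60 s cap
- the signal-file convention

```python
import random
from pathlib import Path
from typing import Optional


def backoff_ms(attempt: int, base_ms: int = 2000, cap_ms: int = 60000,
               rng: Optional[random.Random] = None) -> int:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^(attempt-1))]."""
    rng = rng or random.Random()
    ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
    return rng.randint(0, ceiling)  # randint is inclusive on both ends


def policy(event: dict, failover_after: int = 3,
           signal_file: Optional[Path] = None,
           rng: Optional[random.Random] = None) -> dict:
    """Return a hypothetical OnApiRetry decision for one retry event."""
    attempt = int(event["attempt"])
    if signal_file is not None and attempt > failover_after:
        # Leave a marker for a wrapper script that switches providers.
        signal_file.write_text(str(event.get("session_id", "")), encoding="utf-8")
        return {"action": "abort",
                "systemMessage": f"Giving up after {attempt - 1} failed attempts; failover requested."}
    wait = backoff_ms(attempt, rng=rng)
    # Never undercut the server's own retry hint.
    wait = max(wait, int(event.get("retry_after_ms") or 0))
    return {"action": "wait_longer", "wait_ms": wait}
```

Full jitter spreads concurrent clients across the whole window instead of having them retry in lockstep. That matters most when many sessions hit the same 429.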

Example

Example input payload for the OnApiRetry hook event:

{
  "attempt": 2,
  "max_attempts": 5,
  "error_code": "429",
  "error_message": "rate_limit_exceeded",
  "retry_after_ms": 13000,
  "request_type": "messages",
  "session_id": "..."
}

Example output for the OnApiRetry hook event:

{
  "action": "continue",
  "systemMessage": "optional message to inject into conversation",
  "wait_ms": 30000
}

Notes

Implementing the OnApiRetry hook event requires pinning down the input and output schemas (for example, how wait_ms interacts with each action) and checking the effect on the existing issues and use cases listed above.

Recommendation

Adopt the proposed OnApiRetry hook event to provide visibility and control over API retries, addressing session unrecoverability, rate limit budget consumption, and dropped connection errors. It would also unlock use cases such as custom retry policies and telemetry budget protection.
