claude-code - 💡(How to fix) Fix [FEATURE] No hook event for upstream tool/infra failures (5xx, network errors) [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#48650Fetched 2026-04-16 06:54:42
View on GitHub
Comments
0
Participants
1
Timeline
1
Reactions
0
Participants
Timeline (top)
labeled ×1

Error Message

Today these failures are only visible as error strings inside a tool result that Claude happens to read and relay. That makes them: A hook event (e.g. ToolCallError or HarnessError) that fires on non-2xx responses from the tool infrastructure, network failures, and timeouts — with the tool name, error class, status code, and retry count in the payload. Same shape as existing hooks so it composes with current configs. "error": { "message": "Internal Server Error", Append to a local JSONL for session-level error telemetry This is mostly a joke, but I would likely not hesitate to set said hook up to either change the behavior around an error to like I mentioned above, play Killing in the Name Of at the 4m12s timestamp, such that Claude is telling me very clearly "No It won't do what I tell it" so that it gives me a good chuckle.

Root Cause

Invisible to automation. Users can't wire telemetry, paging, auto-retry-with-backoff, or desktop notifications to infra failures, because there's no event to hook. Inconsistently surfaced. Whether a user learns their session hit a 500 depends on whether Claude decides to mention it in the response, which is non-deterministic. Non-actionable for teams. Power users running Claude Code in CI, long-running agents, or /loop-style workflows have no way to distinguish "Claude failed the task" from "the harness failed Claude" without scraping transcripts. What would solve it:

Code Example

{
  "hook_event_name": "ToolCallError",
  "tool_name": "Bash",
  "tool_input": { ... },
  "error": {
    "kind": "http_5xx" | "network" | "timeout" | "rate_limit" | "other",
    "status_code": 500,
    "message": "Internal Server Error",
    "retry_attempt": 2,
    "will_retry": true
  },
  "session_id": "...",
  "transcript_path": "..."
}
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

Claude Code's hook system exposes events tied to Claude's actions — PreToolUse, PostToolUse, Stop, UserPromptSubmit, etc. — but nothing fires when the harness itself fails: upstream 5xx responses, network errors, tool timeouts, or API rate limits surfaced mid-session.

Today these failures are only visible as error strings inside a tool result that Claude happens to read and relay. That makes them:

Invisible to automation. Users can't wire telemetry, paging, auto-retry-with-backoff, or desktop notifications to infra failures, because there's no event to hook. Inconsistently surfaced. Whether a user learns their session hit a 500 depends on whether Claude decides to mention it in the response, which is non-deterministic. Non-actionable for teams. Power users running Claude Code in CI, long-running agents, or /loop-style workflows have no way to distinguish "Claude failed the task" from "the harness failed Claude" without scraping transcripts. What would solve it:

A hook event (e.g. ToolCallError or HarnessError) that fires on non-2xx responses from the tool infrastructure, network failures, and timeouts — with the tool name, error class, status code, and retry count in the payload. Same shape as existing hooks so it composes with current configs.

Why now:

As Claude Code gets used for longer, more autonomous workflows (scheduled agents, /loop, CI runs), silent infra failures become a correctness and observability problem, not just a UX papercut.

Proposed Solution

A new hook event, potentially ToolCallError. Fires whenever the harness fails to get a successful response from a tool. Can be prior to or instead of surfacing the failure to Claude.

Hypothetical payload:

{
  "hook_event_name": "ToolCallError",
  "tool_name": "Bash",
  "tool_input": { ... },
  "error": {
    "kind": "http_5xx" | "network" | "timeout" | "rate_limit" | "other",
    "status_code": 500,
    "message": "Internal Server Error",
    "retry_attempt": 2,
    "will_retry": true
  },
  "session_id": "...",
  "transcript_path": "..."
}

Matcher semantics: I'd believe similar as PreToolUse / PostToolUse — users can match by tool_name pattern, so you can scope hooks to Bash, WebFetch, MCP tools, etc. independently.

Fires: Once per failed attempt, including during retries (so observability tools see the full retry story), with a final event when the harness gives up vs. successfully retries.

Example configs this unlocks:

osascript -e 'display notification "Claude hit a 500"' — desktop ping on infra failure Append to a local JSONL for session-level error telemetry Post to Slack/Discord on repeated failures in a long-running /loop or scheduled agent Play a song such as Killing in the Name Of at the 4M12S timestamp. (The original motivating use case. Completely Unserious, but a valid signal that the event surface is missing.)

Alternative Solutions

I had looked at PostToolUse, but it's unclear if it would be consistent on failed tool calls. Additionally, you'd likely have to string match which is brittle if things would change.

Priority

Medium - Would be very helpful

Feature Category

Configuration and settings

Use Case Example

This is mostly a joke, but I would likely not hesitate to set said hook up to either change the behavior around an error to like I mentioned above, play Killing in the Name Of at the 4m12s timestamp, such that Claude is telling me very clearly "No It won't do what I tell it" so that it gives me a good chuckle.

A more serious example, leveraging Claude for Triaging Incidents to pull new incidents, correlate them with deploys, check dashboards and structure a summary to an incident channel in Slack before on call is notified could result in broken or missing triage context, no message at all due to silent failures etc.

The hook for ToolCallError could reasonably unlock observability when Claude is running more autonomously, instead of producing silent failures that could lead to missing an SLA.

Additional Context

No response

extent analysis

TL;DR

Implementing a new hook event, ToolCallError, would provide a solution to the problem of silent infra failures in Claude Code by firing an event on non-2xx responses from the tool infrastructure, network failures, and timeouts.

Guidance

  • Introduce a new hook event, ToolCallError, with a payload containing the tool name, error class, status code, and retry count to provide visibility into infra failures.
  • Define the payload shape and matcher semantics for the new hook event, similar to existing hooks like PreToolUse and PostToolUse.
  • Ensure the new hook event fires once per failed attempt, including during retries, and with a final event when the harness gives up or successfully retries.
  • Consider implementing example configs to demonstrate the usage of the new hook event, such as desktop notifications or error telemetry.

Example

{
  "hook_event_name": "ToolCallError",
  "tool_name": "Bash",
  "tool_input": { ... },
  "error": {
    "kind": "http_5xx",
    "status_code": 500,
    "message": "Internal Server Error",
    "retry_attempt": 2,
    "will_retry": true
  },
  "session_id": "...",
  "transcript_path": "..."
}

Notes

The implementation of the ToolCallError hook event should be carefully considered to ensure consistency with existing hooks and to avoid introducing new issues.

Recommendation

Apply the workaround of implementing the ToolCallError hook event, as it provides a clear solution to the problem of silent infra failures in Claude Code. This will enable users to wire telemetry, paging, auto-retry-with-backoff, or desktop notifications to infra failures, making the system more observable and reliable.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [FEATURE] No hook event for upstream tool/infra failures (5xx, network errors) [1 participants]