claude-code: [BUG] Workflow harness retries indefinitely on HTTP 429 rate_limit (97 agents, 2M tokens burned in 34 seconds)

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

I'm providing the information that Claude Code was able to collect about this issue. It's possible that its explanation of the cause is incorrect, but I hope the data is sufficient for your team to investigate the underlying problem.

The main issue is that a small and relatively simple request consumed about 80% of my 5-hour usage limit on the Max (5x) plan yesterday. The same thing happened again today and consumed roughly 50% of the limit.

Thank you in advance for looking into this.

The deep-research workflow primitive treats HTTP 429 rate_limit responses from the Anthropic API as "subagent completed without calling StructuredOutput" and enters a tight retry loop with no exponential backoff and no circuit breaker.

In my run, a subagent hit my session limit mid-workflow (HTTP 429). The harness then fired 37 retries in 34.5 seconds (~1 retry/sec). Each retry was itself a billable API call counting toward the same exhausted limit, so the loop actively prevented recovery. Final stats: 97 agent invocations, 2,077,133 subagent tokens, 10.5 min wall-clock — for a research query that should have cost <50k tokens.

The user-facing error hides the root cause: "subagent completed without calling StructuredOutput (after 2 in-conversation nudges)". The real cause was only visible by grepping agent JSONL transcripts: "error":"rate_limit", "apiErrorStatus":429, "text":"You've hit your session limit".
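For anyone who wants to check their own transcripts for the same hidden cause, here is a minimal sketch of that search in Python. It assumes each JSONL line is a JSON object with top-level `apiErrorStatus`, `error`, `text`, and `timestamp` fields, as in the record quoted in this report; the real transcript layout may nest these differently. The function name `find_api_errors` is made up for illustration.

```python
import json
from pathlib import Path
from typing import Iterator


def find_api_errors(transcript_dir: Path) -> Iterator[dict]:
    """Yield one summary dict per JSONL record that carries an API error status.

    Assumption: error records expose `apiErrorStatus` at the top level.
    """
    for path in sorted(transcript_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partial or corrupt lines
                if isinstance(record, dict) and record.get("apiErrorStatus") is not None:
                    yield {
                        "file": path.name,
                        "line": line_number,
                        "status": record["apiErrorStatus"],
                        "error": record.get("error"),
                        "text": record.get("text"),
                        "timestamp": record.get("timestamp"),
                    }
```

Calling `list(find_api_errors(Path(some_workflow_dir)))` on a workflow's transcript directory returns the hits in file order, which makes it easy to see when the first 429 arrived.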

Three distinct defects stack here:

1. agent({schema}) primitive doesn't distinguish HTTP errors from genuine missing-StructuredOutput
2. No exponential backoff and no circuit breaker on retries
3. User-facing error message masks the underlying API error
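The first defect could be addressed by classifying the subagent's final record before deciding that StructuredOutput was simply skipped. The following is only a sketch of that idea, not the harness's actual code: the `Outcome` enum, the `classify` function, and the `called_structured_output` flag are all hypothetical, and the field names are taken from the JSONL record in this report. Treating every 429 as quota exhaustion is also an assumption.

```python
from enum import Enum
from typing import Any, Mapping


class Outcome(Enum):
    OK = "ok"
    QUOTA_EXHAUSTED = "quota_exhausted"
    TRANSIENT_API_ERROR = "transient_api_error"
    MISSING_STRUCTURED_OUTPUT = "missing_structured_output"


def classify(record: Mapping[str, Any], called_structured_output: bool) -> Outcome:
    """Classify a subagent's final record, checking API errors first."""
    status = record.get("apiErrorStatus")
    # Assumption: a rate_limit error or HTTP 429 means the quota is exhausted.
    if record.get("error") == "rate_limit" or status == 429:
        return Outcome.QUOTA_EXHAUSTED
    # Any other API error (for example a 5xx) is treated as possibly transient.
    if record.get("isApiErrorMessage") or (isinstance(status, int) and status >= 500):
        return Outcome.TRANSIENT_API_ERROR
    # Only now is a missing StructuredOutput call a genuine schema failure.
    if not called_structured_output:
        return Outcome.MISSING_STRUCTURED_OUTPUT
    return Outcome.OK
```

With the record from this report, `classify(record, called_structured_output=False)` yields `Outcome.QUOTA_EXHAUSTED` rather than a missing-StructuredOutput failure, so the real cause can be surfaced.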

A separate run yesterday was even worse: a relatively small request consumed nearly 4 million tokens. Unfortunately, I no longer have the logs or data from that incident.

What Should Happen?

When a subagent returns HTTP 429 / 5xx / quota error from the Anthropic API, the workflow harness should:

1. Detect error === "rate_limit" or apiErrorStatus >= 429 and abort the entire workflow immediately; do NOT classify as "missing StructuredOutput call".
2. Surface the real API error to the user: e.g. "Workflow aborted: API session limit reached, resets at [time]."
3. For genuine transient errors (different from quota), use exponential backoff (1s → 2s → 4s, jittered) and a max-3-consecutive-identical-failures circuit breaker.
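To make the requested behavior concrete, here is a rough sketch of a retry wrapper that follows the three points above. It is an illustration under assumptions, not a proposal for the harness's real structure: the exception classes, `run_with_retries`, and its parameters are hypothetical names, and the overall `max_attempts` cap is an extra safeguard that is not part of the request above.

```python
import random
import time
from typing import Callable, Optional


class QuotaExhausted(Exception):
    """Hypothetical: the API reported a rate or session limit (HTTP 429)."""


class TransientError(Exception):
    """Hypothetical: a failure that may succeed on retry (for example HTTP 5xx)."""


class WorkflowAborted(Exception):
    """Stops the whole workflow with a message meant for the user."""


def run_with_retries(
    call: Callable[[], str],
    base_delay: float = 1.0,
    max_identical_failures: int = 3,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_message: Optional[str] = None
    identical = 0
    for attempt in range(max_attempts):
        try:
            return call()
        except QuotaExhausted as exc:
            # Quota errors are never retried: abort and surface the real cause.
            raise WorkflowAborted(f"Workflow aborted: {exc}") from exc
        except TransientError as exc:
            message = str(exc)
            identical = identical + 1 if message == last_message else 1
            last_message = message
            # Circuit breaker: stop after N consecutive identical failures.
            if identical >= max_identical_failures:
                raise WorkflowAborted(
                    f"Workflow aborted after {identical} identical failures: {message}"
                ) from exc
            if attempt + 1 < max_attempts:
                # Exponential backoff (1s, 2s, 4s, ...) with up to 50% jitter.
                delay = base_delay * 2 ** attempt
                sleep(delay + random.uniform(0, delay / 2))
    raise WorkflowAborted(f"Workflow aborted after {max_attempts} attempts: {last_message}")
```

Under this sketch, the run described in this report would have stopped on the first 429 with a single failed call instead of 37, and a repeating 5xx would be cut off after three attempts.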

Error Messages/Logs

USER-FACING (what surfaced):

<status>failed</status>
<summary>Error: agent({schema}): subagent completed without calling StructuredOutput (after 2 in-conversation nudges)</summary>
<failures>
  parallel[0] failed: agent({schema}): subagent completed without calling StructuredOutput
  parallel[1] failed: ...
  [same line repeated ~15 times across retries]
</failures>
<usage>
  <agent_count>97</agent_count>
  <subagent_tokens>2077133</subagent_tokens>
  <tool_uses>282</tool_uses>
  <duration_ms>628578</duration_ms>
</usage>


ACTUAL ROOT CAUSE (extracted from agent JSONL, not surfaced to user):

{
  "timestamp": "2026-05-31T17:31:37.542Z",
  "apiErrorStatus": 429,
  "requestId": "req_011Cbb3afFu5gMMi4ZPGSnZt",
  "error": "rate_limit",
  "isApiErrorMessage": true,
  "text": "You've hit your session limit · resets 10:10pm (Europe/Kiev)",
  "stop_reason": "stop_sequence",
  "usage": { "input_tokens": 0, "output_tokens": 0 }
}


RETRY LOOP METRICS:
- 37 retries in 34.493 seconds (avg interval 0.96s, no backoff)
- First 429: 2026-05-31T17:31:25.988Z
- Last 429:  2026-05-31T17:32:00.481Z
- All 37 retries got identical 429 with identical error text
- Workflow continued retrying long past the point where the response was clearly not transient
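
The window and average interval in the metrics above can be re-derived from the two timestamps. A small sketch of that arithmetic, assuming the 37 retries are spread across the window so that there are 36 gaps between them:

```python
from datetime import datetime


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing Z as UTC."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


first_429 = parse_utc("2026-05-31T17:31:25.988Z")
last_429 = parse_utc("2026-05-31T17:32:00.481Z")
retries = 37

window_s = (last_429 - first_429).total_seconds()
avg_interval_s = window_s / (retries - 1)  # 37 events have 36 gaps between them

print(f"window: {window_s:.3f}s, average interval: {avg_interval_s:.2f}s")
# prints "window: 34.493s, average interval: 0.96s"
```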

Steps to Reproduce

1. Run any user session up to ~40-50% of the Pro 5-hour token quota.
2. Invoke the built-in deep-research skill with a query complex enough to spawn 60+ subagents (multi-aspect research with cross-source verification).
3. Workflow proceeds through Scope → Search → Fetch phases normally (legitimate token use).
4. Mid-Verify phase the session limit is crossed. Next subagent returns HTTP 429.
5. Observe: workflow does NOT abort. It retries the failing subagent every ~1 second for 34 seconds, spawning ~37 retries, each one returning the same 429 and each one counting toward the now-exhausted limit.
6. Eventually orchestrator gives up. Final: 97 agent_count, 2M subagent tokens consumed for zero usable output.

Expected: at step 5, on the FIRST 429, workflow should abort and surface the API error.

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.156

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

VS Code integrated terminal

Additional Information

Interface: Claude Code VSCode extension (transcripts show entrypoint: claude-vscode). The Terminal/Shell field doesn't have a VSCode option — consider adding one.

Platform detail: macOS Darwin 24.5.0.

Key identifiers for your log lookup:

- Workflow Run ID: wf_6a68935f-35f
- Task ID: whdk5dfwz
- Session ID: 83411bf2-c4b1-4d00-9e6e-ba0260b9df3d
- First failed Anthropic requestId: req_011Cbb3afFu5gMMi4ZPGSnZt
- First 429: 2026-05-31T17:31:25.988Z
- Last 429: 2026-05-31T17:32:00.481Z

Full transcripts (98 JSONL files, 37 contain 429 errors) are on disk at: ~/.claude/projects/<session>/subagents/workflows/wf_6a68935f-35f/

Happy to share privately if useful for repro.
