claude-code - 💡(How to fix) Fix [BUG] /deep-research — default workflow frequently hits "API Error: Server is temporarily limiting requests"

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

Root Cause

  1. SECOND BUG — silent data destruction. A verifier that fails / abstains is scored as a REFUTATION. So a run whose search + fetch succeeded (24 sources, 64 claims extracted) returns: "All 25 claims refuted by adversarial verification. Research inconclusive." The tool reports the opposite of the truth — it found plenty, then threw it all away. This is arguably worse than the rate limit, because it looks like "no evidence found."

Code Example

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

---

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

---

agent({schema}): subagent completed without calling StructuredOutput (after 2 in-conversation nudges)

---

Scope 1/1 ✔ · Search 5/5 ✔ · Fetch 24/24 ✔ · Verify 75/75 FAILED · Synthesize never ran

---

agent({schema}): subagent completed without calling StructuredOutput (after 2 in-conversation nudges)
RAW_BUFFERClick to expand / collapse

Preflight Checklist:

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report
  • I am using the latest version of Claude Code

What's Wrong?

The /deep-research default workflow frequently fails with a server-side rate limit error. Only about 3 out of 10 runs complete successfully — the other ~7 terminate partway through. The default workflow spawns multiple concurrent subagents, and the burst of parallel requests appears to trip server-side rate limiting (not a personal usage-limit issue — the error explicitly states "not your usage limit").

This is actually two bugs: the rate limit itself, and a destructive-scoring bug it triggers that silently discards good research and reports the opposite of the truth.

Error returned:

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

What Should Happen?

/deep-research should complete its subagent fan-out without tripping rate limits — by throttling/queuing concurrent subagent requests internally, backing off and retrying transparently on a 429, or surfacing a clear retry state rather than failing the whole run. Critically, a throttled verification step should degrade gracefully (keep unverified findings, flagged) rather than discarding them. Given the current ~3/10 success rate, consider gating or deprecating the default workflow until it is reliable.

Error Messages/Logs:

API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited

Per-agent failure signature (every one of the 75 verify agents, two consecutive runs):

agent({schema}): subagent completed without calling StructuredOutput (after 2 in-conversation nudges)

Steps to Reproduce:

  1. Open Claude Code in a repository.
  2. Run /deep-research with the default workflow on a query broad enough to spawn several concurrent subagents.
  3. Observe the run partway through.
  4. Error appears: API Error: Server is temporarily limiting requests (not your usage limit) · Rate limited, and the run terminates without completing (or completes but reports "research inconclusive" — see below).

Note: roughly 7 of 10 runs fail this way; only ~3 of 10 complete. Failure frequency appears tied to concurrent subagent fan-out.


Additional diagnostic detail (from the workflow run output):

This is TWO bugs — the rate limit, and a destructive-scoring bug it triggers.

  1. Failure is isolated to the VERIFY phase, and to schema-bound subagents specifically. Phase breakdown of two consecutive failed runs (identical):

    Scope 1/1 ✔ · Search 5/5 ✔ · Fetch 24/24 ✔ · Verify 75/75 FAILED · Synthesize never ran

    Search and fetch are also concurrent yet survive. The verify phase fans out 3 votes × 25 claims = 75 concurrent agents, each FORCED to emit structured output. That is the widest schema-bound burst, and it is exactly where it dies.

  2. Exact per-agent failure signature (every one of the 75):

    agent({schema}): subagent completed without calling StructuredOutput (after 2 in-conversation nudges)

    i.e. the agent is rate-limited mid-turn, never reaches its forced StructuredOutput call, and is abandoned after 2 nudges.

  3. SECOND BUG — silent data destruction. A verifier that fails / abstains is scored as a REFUTATION. So a run whose search + fetch succeeded (24 sources, 64 claims extracted) returns: "All 25 claims refuted by adversarial verification. Research inconclusive." The tool reports the opposite of the truth — it found plenty, then threw it all away. This is arguably worse than the rate limit, because it looks like "no evidence found."

Run stats per failure: 105 agents, ~1.05M subagent tokens, ~2m50s.

Suggested fixes (in priority order):

  1. Make abstain / no-output ≠ refuted. A failed verifier should mark a claim unverified (kept + flagged), so a throttled verify phase degrades gracefully instead of nuking findings.
  2. Backoff/retry on 429 inside the verify fan-out; cap verify concurrency.
  3. Resume / checkpoint a stalled run — a run that reached "Synthesize pending" loses 100% of ~3 minutes and ~1M tokens of completed work.
  4. Reduce the burst: 1–2 votes, or batched verification (1 agent per N claims).

Environment:

  • Claude Model: claude-opus-4-8
  • Is this a regression? No / Unknown
  • Last Working Version: N/A
  • Claude Code Version: 2.1.165
  • Platform: Anthropic API
  • Operating System: Linux
  • Terminal/Shell: Cursor
  • Plan: Claude Max subscription
  • Feedback ID: 382dad86-2a65-4e5e-b62f-0e6b81e137b0

Additional Information:

Errors array captured: [] (no stack trace surfaced client-side; the failure is a server-returned rate-limit message rather than a local exception).

Possibly related to concurrent subagent fan-out. See related: #63938 (Configurable concurrent subagent limit), #16157 and #38335 (usage-limit reports). A configurable concurrency cap on subagents would likely mitigate the rate limit, but fix #1 above (abstain ≠ refuted) is needed regardless, since it is what turns a transient throttle into total, silent loss of research output.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING