claude-code: Bedrock makes 3x more tool calls and API round-trips than direct API for identical tasks

Summary

Claude Code on Bedrock makes significantly more API calls and tool invocations than Claude Code on the direct Anthropic API for the same task, resulting in 3x slower completion times on complex tasks. The root cause appears to be fewer tools loaded on Bedrock (26 vs 45) and different model behavior causing more incremental, cautious execution.

Environment

  • Claude Code v2.1.81
  • Model: Sonnet 4.6
  • Bedrock: EU cross-region inference profile (eu.anthropic.claude-sonnet-4-6)
  • Anthropic: Direct subscription, Sonnet 4.6
  • OS: macOS (Apple Silicon)

Reproduction

Task: Given the same 1,820-line Python codebase with 3 bugs and duplicated code, Claude Code was asked to:

  1. Fix 3 bugs in utils.py
  2. Refactor 15 duplicated dataclasses and 8 duplicated processor classes into base classes
  3. Write and run tests

Same prompt, same codebase, same model (Sonnet 4.6), same machine, run back-to-back.

Results

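| Metric | Bedrock | Anthropic Direct | Ratio (Bedrock / Direct) |
|---|---|---|---|
| User-perceived time | 5m 36s | 1m 53s | 3.0x |
| API calls | 46 | 18 | 2.6x |
| Tool calls | 45 | 14 | 3.2x |
| Read calls | 23 | 6 | 3.8x |
| Edit calls | 8 | 3 | 2.7x |
| Bash calls | 4 | 1 | 4x |
| Grep calls | 6 | 1 | 6x |
| Max conversation messages | 210 | 84 | 2.5x |
| Final context tokens | 78,066 | 43,191 | 1.8x |
| Tools loaded | 26 | 45 | 0.58x |
| Avg TTFT per call | 2.43s | 1.85s | 1.3x |
| Avg generation time per call | 5.06s | 5.26s | ~same |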
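
As a quick sanity check, the ratios in the table can be recomputed from the raw figures. The sketch below only restates the reported numbers; the dictionary keys are illustrative names, not fields from any Claude Code log.

```python
# Figures copied from the Results table as (Bedrock, Anthropic direct).
# Key names are illustrative labels, not real log fields.
results = {
    "elapsed_seconds": (5 * 60 + 36, 1 * 60 + 53),
    "api_calls": (46, 18),
    "tool_calls": (45, 14),
    "read_calls": (23, 6),
    "edit_calls": (8, 3),
    "bash_calls": (4, 1),
    "grep_calls": (6, 1),
    "max_messages": (210, 84),
    "final_context_tokens": (78_066, 43_191),
    "tools_loaded": (26, 45),
    "avg_ttft_seconds": (2.43, 1.85),
    "avg_generation_seconds": (5.06, 5.26),
}

# Ratio is always Bedrock divided by direct, rounded to two decimals.
ratios = {
    name: round(bedrock / direct, 2)
    for name, (bedrock, direct) in results.items()
}

for name, value in ratios.items():
    print(f"{name:24s} {value:.2f}x")
```

The per-call latency rows come out close to 1x (about 1.31x for TTFT and 0.96x for generation time), while every call-count row is between roughly 2.5x and 6x.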

Key Finding

Token generation time per call is nearly identical (5.06s vs 5.26s on average), and average TTFT differs by only about 0.6s (2.43s vs 1.85s). The 3x difference in total time therefore appears to come from the number of round-trips rather than from per-call latency:

  1. 3.2x more tool calls: the model on Bedrock takes small incremental steps (23 reads, 6 greps) instead of bold moves (6 reads, 1 grep). Each tool call triggers a new API round-trip with ~2.4s TTFT overhead.

  2. Fewer tools loaded: Bedrock loads 26 deferred tools, Anthropic direct loads 45. The 19 missing tools may be causing the model to fall back to more primitive, multi-step approaches.

  3. Conversation bloat: 45 tool calls generate 210 messages (vs 84), causing context to grow to 78K tokens (vs 43K). A larger context likely makes later calls slower.
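
A rough back-of-the-envelope model supports the claim that call count, not per-call latency, dominates. The sketch below assumes every API call is sequential and costs the average TTFT plus the average generation time from the Results table; it ignores tool execution time and any overlap, so it is an approximation rather than a reconstruction of the actual runs. The function name is hypothetical.

```python
def estimated_seconds(api_calls, avg_ttft, avg_generation):
    """Estimate wall time as sequential calls times average per-call latency.

    Simplifying assumption: no parallelism and no tool execution time.
    """
    return api_calls * (avg_ttft + avg_generation)


# Reported averages and call counts from the Results table.
bedrock_est = estimated_seconds(46, 2.43, 5.06)
direct_est = estimated_seconds(18, 1.85, 5.26)

# Counterfactual: Bedrock per-call latency, but only as many calls as direct.
bedrock_if_18_calls = estimated_seconds(18, 2.43, 5.06)

print(f"Bedrock estimate:          {bedrock_est:.0f}s (observed 336s)")
print(f"Direct estimate:           {direct_est:.0f}s (observed 113s)")
print(f"Bedrock with 18 API calls: {bedrock_if_18_calls:.0f}s")
```

Under these assumptions the estimates land near the observed totals (about 345s and 128s), and Bedrock with only 18 calls would take about 135s. The per-call latency difference would then account for only a few seconds of the gap; the rest follows from the extra 28 round-trips.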

Questions

  1. Why does Bedrock load 26 tools vs 45? Which tools are excluded and why? Is this intentional?
  2. Does the tool count affect model behavior? If the model has fewer tools available, does it become more cautious/incremental?
  3. Can the tool set be made identical between Bedrock and direct API?

Debug Logs

Full debug logs available (--debug-file output) for both runs showing per-call timing, tool usage, and context growth. Happy to share if helpful.

Impact

We are deploying Claude Code via Bedrock to 150+ developers at an enterprise. This 3x performance gap on complex tasks significantly impacts developer productivity. Simple tasks (small projects, single-file edits) show no meaningful difference — the gap only appears on large codebases with complex multi-step work, which is the primary use case.

extent analysis

TL;DR

The 3x slower completion time on Bedrock compared to the direct Anthropic API may be reducible if the Bedrock tool set can be brought in line with the 45 tools loaded on the direct API. The link between tool count and the extra calls is a hypothesis, not a confirmed cause.

Guidance

  • Investigate why Bedrock loads only 26 tools compared to 45 on the direct Anthropic API and determine if this difference is intentional or a configuration issue.
  • Analyze the debug logs to understand how the model behaves with fewer tools available and if it indeed becomes more cautious or incremental, leading to more API and tool calls.
  • Explore the possibility of making the tool set identical between Bedrock and the direct API to see if this resolves the performance gap.
  • Consider the impact of conversation bloat and context growth on later calls and how optimizing tool usage could reduce this effect.
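
To answer which tools are excluded on Bedrock, one option is to collect the tool names from each run and diff the two sets. The sketch below assumes the names have already been extracted by hand into two lists; the issue does not list the actual tools, so the names beyond Read, Edit, Bash and Grep are placeholders, and `diff_tool_sets` is a hypothetical helper rather than part of Claude Code.

```python
def diff_tool_sets(direct_tools, bedrock_tools):
    """Compare two collections of tool names and report the differences."""
    direct, bedrock = set(direct_tools), set(bedrock_tools)
    return {
        "missing_on_bedrock": sorted(direct - bedrock),
        "only_on_bedrock": sorted(bedrock - direct),
        "shared": sorted(direct & bedrock),
    }


# Placeholder data: replace with the names gathered from each environment.
direct_tools = ["Read", "Edit", "Bash", "Grep",
                "HypotheticalToolA", "HypotheticalToolB"]
bedrock_tools = ["Read", "Edit", "Bash", "Grep"]

report = diff_tool_sets(direct_tools, bedrock_tools)
for category, names in report.items():
    print(f"{category}: {len(names)} -> {names}")
```

With real data, the `missing_on_bedrock` list should contain the 19 tools in question, which makes it easier to judge whether any of them would plausibly change how the model plans multi-step work.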

Example

The issue concerns configuration or model behavior rather than a code-level defect, so no fix snippet applies.

Notes

The performance gap is specifically noted on complex tasks with large codebases, suggesting that the issue may not be as pronounced in simpler use cases. The fact that token generation speed is identical between Bedrock and the direct API points towards the issue being related to the number of tool calls and conversation management rather than the generation speed itself.

Recommendation

Investigate, and if possible adjust, the tool loading configuration on Bedrock to match the direct API. The tool-count discrepancy (26 vs 45) is the most concrete difference identified between the two environments, although whether it actually causes the additional API and tool calls has not been confirmed.
