openclaw - 💡(How to fix) Fix Retry storm: same inbound message replayed 94+ times causing excessive API usage ($149 in one day) [1 participants]

GitHub stats
openclaw/openclaw#59132, fetched 2026-04-08 02:28:17
Comments: 0, Participants: 1, Timeline: 0, Reactions: 0

A single inbound Telegram message was replayed into the agent session 94 times over ~2 hours, each triggering a full Sonnet API call. This resulted in ~$149 in unexpected API charges in a single day.

Error Message

The Anthropic API returns a rate limit error (429) or an overloaded error while claude-sonnet-4-6 is generating the response.

Root Cause (suspected)

The gateway treats the failed response as an inbound processing failure and re-queues the original message, with no deduplication on message_id and no cap on redelivery attempts.

Fix Action

Workaround

Set requireMention: true for non-essential groups to reduce blast radius, and remove Opus from fallback chain to prevent cost amplification during retry storms.


Summary

A single inbound Telegram message was replayed into the agent session 94 times over ~2 hours, each triggering a full Sonnet API call. This resulted in ~$149 in unexpected API charges in a single day.

Reproduction

  1. A Telegram group message arrives in a group handled by cfd-ea agent
  2. The agent attempts to respond but the Anthropic API returns a rate limit error (429) or overloaded error
  3. Instead of discarding/acknowledging the failed delivery, the gateway re-queues the same inbound message for reprocessing
  4. This creates a retry loop where the same message_id is injected into the session repeatedly

Evidence

From session transcript analysis:

  • message_id: 15121 (Telegram) was received 94 times in session f742bae8
  • All 94 injections had identical inner timestamp: "Wed 2026-04-01 10:06 GMT+8"
  • Outer timestamps spanned 02:06:48 → 04:05:23 UTC (~2 hours)
  • Each retry triggered a new claude-sonnet-4-6 API call
  • Total session: 468 Sonnet calls from this one message
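
The kind of transcript analysis described above can be sketched as a small duplicate counter. This is an illustration only: the `TranscriptEntry` shape and the `findReplays` name are hypothetical, and the real OpenClaw session transcript format may differ.

```typescript
// Hypothetical transcript entry shape; not the actual OpenClaw format.
interface TranscriptEntry {
  messageId: number;
  receivedAt: string; // outer timestamp, ISO 8601 in UTC
}

interface ReplayInfo {
  count: number;
  first: string;
  last: string;
}

// Returns every message ID injected more than once, with its count
// and the earliest and latest outer timestamps.
function findReplays(entries: TranscriptEntry[]): Map<number, ReplayInfo> {
  const seen = new Map<number, ReplayInfo>();
  for (const e of entries) {
    const rec = seen.get(e.messageId);
    if (rec === undefined) {
      seen.set(e.messageId, { count: 1, first: e.receivedAt, last: e.receivedAt });
    } else {
      rec.count += 1;
      // ISO 8601 UTC strings in the same format compare correctly as strings.
      if (e.receivedAt < rec.first) rec.first = e.receivedAt;
      if (e.receivedAt > rec.last) rec.last = e.receivedAt;
    }
  }
  const replays = new Map<number, ReplayInfo>();
  seen.forEach((info, id) => {
    if (info.count > 1) replays.set(id, info);
  });
  return replays;
}
```

Any ID with a count well above 1 and a wide gap between `first` and `last` is a candidate replay loop.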

Root Cause (suspected)

When claude-sonnet-4-6 returns a rate limit / overloaded error during response generation:

  1. The delivery failure is treated as an inbound processing failure
  2. The original inbound message is re-queued for retry
  3. No deduplication check on message_id before re-injection
  4. No exponential backoff cap / max retry count on inbound message redelivery
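
One way to keep a model-side failure from turning into an inbound redelivery is to decide the next step from the error status alone, with no branch that re-queues the original message. The sketch below is not OpenClaw code. The function name is hypothetical, and treating 529 as the overloaded status is an assumption based on Anthropic's documented error codes (the issue itself only names 429).

```typescript
type Decision = "retry_response" | "give_up";

// Assumption: 429 is a rate limit error and 529 is an overloaded error.
const TRANSIENT_STATUSES = [429, 529];

// `attempt` is the 1-based number of response attempts already made.
// Neither outcome re-injects the inbound message into the session.
function decideAfterModelFailure(
  status: number,
  attempt: number,
  maxAttempts: number,
): Decision {
  const transient = TRANSIENT_STATUSES.indexOf(status) !== -1;
  return transient && attempt < maxAttempts ? "retry_response" : "give_up";
}
```

With `maxAttempts` set to 3, a transient error after the third attempt yields `give_up`, so the response is abandoned while the inbound message stays acknowledged.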

Impact

  • ~$149 USD in a single day (normally ~$5-10/day)
  • 4,706 Sonnet API calls in one day
  • API rate limits hit repeatedly, causing cascading failures across all other sessions
  • Gateway instability / restart

Expected Behavior

  • Inbound messages should be deduplicated by message_id before re-injection into a session
  • Retry attempts on failed responses should use exponential backoff with a hard cap (e.g., max 3 retries)
  • Failed delivery should not cause the original inbound message to be reprocessed
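
A minimal sketch of the deduplication check, assuming an in-memory store with a time-to-live. The class name and API are hypothetical. Keying on the chat ID as well as message_id is an added assumption, since Telegram message IDs are scoped to a chat.

```typescript
class InboundDeduplicator {
  // key -> expiry time in milliseconds
  private seen = new Map<string, number>();

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now(),
  ) {}

  // Returns true only the first time a (chatId, messageId) pair is seen
  // within the TTL window; later calls return false until the entry expires.
  shouldProcess(chatId: number, messageId: number): boolean {
    const t = this.now();
    // Drop expired entries so the map does not grow without bound.
    this.seen.forEach((expiry, key) => {
      if (expiry <= t) this.seen.delete(key);
    });
    const key = `${chatId}:${messageId}`;
    if (this.seen.has(key)) return false;
    this.seen.set(key, t + this.ttlMs);
    return true;
  }
}
```

A gateway would call `shouldProcess` before injecting a message into a session and skip the injection when it returns false. A persistent store would be needed for the check to survive a gateway restart.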

Environment

  • OpenClaw version: 2026.3.7 (42a1394)
  • Channel: Telegram
  • Agent: cfd-ea (model: anthropic/claude-sonnet-4-6)
  • OS: Linux 6.6.87.2-microsoft-standard-WSL2 (x64)
  • Node: v22.22.0

Workaround

Set requireMention: true for non-essential groups to reduce blast radius, and remove Opus from fallback chain to prevent cost amplification during retry storms.

extent analysis

TL;DR

Implement deduplication by message_id and exponential backoff with a retry cap to prevent replayed messages from triggering repeated API calls.

Guidance

  • Verify the suspected root cause by analyzing the session transcript for repeated message_id values and corresponding API calls.
  • Implement a deduplication check on message_id before re-injecting messages into a session, so the same inbound message cannot be replayed.
  • Introduce exponential backoff with a hard cap (e.g., max 3 retries) for retry attempts on failed responses to prevent cascading failures.
  • Consider setting requireMention: true for non-essential groups as a temporary workaround to reduce the blast radius.
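
The backoff guidance above can be sketched as a pure delay function. The name, the 1-second base, and the 30-second ceiling are illustrative assumptions. Only the cap of 3 retries comes from the issue's example.

```typescript
// Returns the wait in milliseconds before retry number `attempt` (1-based),
// or null once the retry budget is exhausted.
function backoffDelayMs(
  attempt: number,
  maxRetries: number = 3,
  baseMs: number = 1000,
  capMs: number = 30000,
): number | null {
  if (attempt > maxRetries) return null;
  // Double the delay on each retry, never exceeding the ceiling.
  return Math.min(capMs, baseMs * Math.pow(2, attempt - 1));
}
```

A caller stops retrying as soon as the function returns null. Adding random jitter to each delay would help avoid synchronized retries across sessions, at the cost of non-deterministic timing.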

Example

The issue does not include implementation details, so no snippet from the OpenClaw codebase is available.

Notes

The provided workaround may not completely resolve the issue but can help mitigate its impact. A more permanent solution would involve modifying the message processing logic to handle rate limit errors and overloaded responses more robustly.

Recommendation

Apply the workaround by setting requireMention: true for non-essential groups and removing Opus from the fallback chain to reduce cost amplification during retry storms, while working on a more permanent solution to implement deduplication and exponential backoff.
