openclaw - Fix Followup agent silent-drops on billing/quota rejection (no user-facing notice) [1 pull request]


When the followup agent fails before generating a reply because the model provider rejects the request for billing, quota, or rate-limit reasons, the inbound message is silently dropped: the originating user or peer bot sees no notice in Slack, Telegram, or any other channel, and the message looks ignored. This is the worst-case failure mode for an agent loop because nothing signals that recovery is needed.

In our team's multi-agent Slack workflow this showed up as repeated "the bot is ignoring me" reports while the Anthropic extra usage allowance stayed exhausted, and diagnosing it required inspecting gateway.err.log out of band.

Error Message

Raw provider error:

rawError=400 {"type":"error","error":{"type":"invalid_request_error","message":"Third-party apps now draw from your extra usage, not your plan limits. Add more at claude.ai/settings/usage and keep going."}}

The followup catch block only logs the failure:

defaultRuntime.error?.(`Followup agent failed before reply: ${message}`);

Root Cause

In src/auto-reply/reply/followup-runner.ts (around line 358), the followup catch block calls replyOperation.fail("run_failed", err), logs the error, and returns without sending any payload to the originating channel via sendFollowupPayloads. The surface_error decision is logged earlier in the embedded agent path, but it never reaches outbound delivery on the followup path.

Fix Action

Fixed



Expected behavior

A short surface notice should be posted to the originating channel/thread when the followup catch block fires for a quota/billing class error, similar to how the main agent path's surface_error decision reaches the user (we recall this being the behavior in earlier OpenClaw versions — open to correction if our memory is wrong here).

Example notice:

⚠️ followup reply generation failed — billing/quota block. Please retry shortly.

Actual behavior

In src/auto-reply/reply/followup-runner.ts around line 358, the catch block:

} catch (err) {
  const message = formatErrorMessage(err);
  replyOperation.fail("run_failed", err);
  defaultRuntime.error?.(`Followup agent failed before reply: ${message}`);
  return;
}

calls replyOperation.fail("run_failed", err) and returns without sending any payload to the originating channel via sendFollowupPayloads. The surface_error decision is logged earlier in the embedded agent path, but it does not reach outbound delivery on the followup path.

Reproduction

  1. Exhaust the Anthropic Pro/Max extra usage allowance so the API returns:
    400 invalid_request_error
    "Third-party apps now draw from your extra usage, not your plan limits..."
  2. Have a peer (human or bot) send an inbound mention to a bot whose followup agent path is now blocked.
  3. Observe that gateway.err.log shows Followup agent failed before reply: ... billing, while the originating Slack thread receives no message at all.

Logs (excerpt — openclaw v2026.5.4, gateway.err.log)

2026-05-12T00:12:37.948+09:00 [agent/embedded] auth profile failure state updated: ... reason=billing
2026-05-12T00:12:37.949+09:00 [agent/embedded] embedded run failover decision: decision=surface_error reason=billing
  rawError=400 {"type":"error","error":{"type":"invalid_request_error","message":"Third-party apps now draw from your extra usage, not your plan limits. Add more at claude.ai/settings/usage and keep going."}}
2026-05-12T00:12:37.952+09:00 [model-fallback/decision] decision=candidate_failed reason=billing providerErrorType=invalid_request_error next=none
Followup agent failed before reply: LLM request rejected: Third-party apps now draw from your extra usage, not your plan limits. Add more at claude.ai/settings/usage and keep going.
2026-05-12T00:12:38.052+09:00 [model-fallback/decision] decision=skip_candidate reason=billing next=none
Followup agent failed before reply: All models failed (1): anthropic/claude-opus-4-7: Provider anthropic has billing issue (skipping all models) (billing)

The pattern repeats roughly once a minute while the provider remains blocked, with zero user-facing output the whole time.

Impact

  • Silent drop is the worst-case failure mode for an agent loop. Peers cannot distinguish "the agent is busy" from "the agent never received my message."
  • In a multi-agent serial workflow (e.g. owner → bot A → bot B → bot C), a single silently-dropped completion report stalls the entire pipeline indefinitely until a human notices.
  • We had to work around it locally with synchronous thread polling on the calling side (every 10 s, capped at 60–90 s), which is fragile and only helps when the caller is healthy.
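
For readers who need the same stopgap, the caller-side polling workaround could look roughly like the sketch below. It is not code from our setup: fetchReply is an assumed callback that returns the reply text once one appears in the thread, and maxPolls, sleep, and pollForReply are hypothetical names introduced for illustration.

```typescript
// Hypothetical caller-side workaround; fetchReply is an assumed callback,
// not a Slack or OpenClaw API.
type FetchReply = () => Promise<string | undefined>;

// Number of polls that fit inside the cap (always at least one).
function maxPolls(intervalMs: number, capMs: number): number {
  return Math.max(1, Math.floor(capMs / intervalMs));
}

// Resolves after the given delay.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until fetchReply yields a reply or the cap is reached.
// A result of undefined means the caller should treat the message as dropped.
async function pollForReply(
  fetchReply: FetchReply,
  intervalMs: number = 10000,
  capMs: number = 60000,
): Promise<string | undefined> {
  const attempts = maxPolls(intervalMs, capMs);
  for (let i = 0; i < attempts; i++) {
    const reply = await fetchReply();
    if (reply !== undefined) return reply;
    // No need to wait after the final attempt.
    if (i < attempts - 1) await sleep(intervalMs);
  }
  return undefined;
}
```

This only detects the drop after the cap expires, and only when the caller itself is healthy, which is why a notice from the failing side is preferable.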

Suggested direction (non-binding)

Inside the catch block, when message matches a quota/billing/rate-limit pattern, send a short notice payload via the existing sendFollowupPayloads path (variables in scope: effectiveQueued, fallbackProvider, fallbackModel, run.sourceReplyDeliveryMode). The notice itself does not require an LLM call, so it succeeds even while the provider is blocked.

Sketch (one possible shape; we are happy to provide a full diff on request, or maintainers can treat this as a hint):

} catch (err) {
  const message = formatErrorMessage(err);
  replyOperation.fail("run_failed", err);
  defaultRuntime.error?.(`Followup agent failed before reply: ${message}`);

  // Classify by substring match on the formatted error message.
  const lowerMsg = message.toLowerCase();
  const isQuotaRelated =
    lowerMsg.includes("billing") ||
    lowerMsg.includes("quota") ||
    lowerMsg.includes("rate limit") ||
    lowerMsg.includes("rate_limit") ||
    lowerMsg.includes("extra usage") ||
    lowerMsg.includes("usage limit");
  // No notice in message_tool_only delivery mode.
  if (isQuotaRelated && run.sourceReplyDeliveryMode !== "message_tool_only") {
    try {
      // Static text, so no LLM call is needed while the provider is blocked.
      await sendFollowupPayloads(
        [{ text: "⚠️ followup reply generation failed — billing/quota block. Please retry shortly." }],
        effectiveQueued,
        { provider: fallbackProvider, modelId: fallbackModel },
      );
    } catch (noticeErr) {
      // A failed notice should not mask the original failure; log it and move on.
      defaultRuntime.error?.(`Followup notice send failed: ${formatErrorMessage(noticeErr)}`);
    }
  }
  return;
}
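
The substring check in the sketch could also be pulled out into a small pure helper, which makes it testable without the runner. The version below is only an illustration: isQuotaRelatedMessage and QUOTA_PATTERNS are hypothetical names, not existing OpenClaw identifiers, and the pattern list simply mirrors the one in the sketch.

```typescript
// Hypothetical helper: the name and the pattern list are illustrative,
// not part of the OpenClaw codebase.
const QUOTA_PATTERNS: readonly string[] = [
  "billing",
  "quota",
  "rate limit",
  "rate_limit",
  "extra usage",
  "usage limit",
];

// True when a formatted error message looks like a
// billing/quota/rate-limit rejection (case-insensitive substring match).
function isQuotaRelatedMessage(message: string): boolean {
  const lower = message.toLowerCase();
  return QUOTA_PATTERNS.some((pattern) => lower.includes(pattern));
}
```

Both "Followup agent failed before reply" lines from the log excerpt match this check (via "extra usage" and "billing"). Substring matching is coarse, so an unrelated error that happens to mention "quota" would also trigger the notice; matching on a structured reason such as reason=billing would be stricter if that value is available in the catch block.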

Open questions for maintainers:

  • Is there an existing surface-error path on the followup runner that we missed, which used to be wired up but regressed?
  • Should rate-limit and billing surface differently (e.g. retry hint vs. owner-attention hint)?
  • Would it be cleaner to surface this at the model-fallback layer's surface_error decision rather than at the followup catch?
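
To make the second question concrete, here is one way a split between rate-limit and billing notices might look. Everything in it is an assumption for discussion: the FailureClass names, the pattern grouping, and both notice texts are hypothetical, and maintainers may prefer to key off the existing reason=billing value rather than message text.

```typescript
// Hypothetical classification: class names, patterns, and notice texts
// are assumptions for discussion, not OpenClaw behavior.
type FailureClass = "rate_limit" | "billing" | "other";

// Checked in order, so rate-limit patterns win over billing patterns.
const CLASS_PATTERNS: ReadonlyArray<[FailureClass, readonly string[]]> = [
  ["rate_limit", ["rate limit", "rate_limit"]],
  ["billing", ["billing", "quota", "extra usage", "usage limit"]],
];

// Maps a formatted error message to a coarse failure class.
function classifyFailure(message: string): FailureClass {
  const lower = message.toLowerCase();
  for (const [failureClass, patterns] of CLASS_PATTERNS) {
    if (patterns.some((p) => lower.includes(p))) return failureClass;
  }
  return "other";
}

// Returns the user-facing notice for a class, or undefined for no notice.
function noticeFor(failureClass: FailureClass): string | undefined {
  switch (failureClass) {
    case "rate_limit":
      // Transient: a retry hint is enough.
      return "⚠️ followup reply generation failed: provider rate limit. Please retry shortly.";
    case "billing":
      // Persistent until someone tops up usage: ask for owner attention.
      return "⚠️ followup reply generation failed: billing/quota block. The bot owner needs to restore usage.";
    default:
      return undefined;
  }
}
```

The design point is that a rate limit usually clears on its own while a billing block does not, so telling peers to "retry shortly" during a billing block would keep a serial pipeline retrying against a provider that stays blocked.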

Environment

  • OpenClaw 2026.5.4 (commit 325df3e)
  • macOS 15 (Darwin 25.4.0)
  • Slack channel, multi-agent setup (5 bots on the same host across separate macOS user accounts)
  • Model: anthropic/claude-opus-4-7
