openclaw: Empty claude-cli subprocess responses misclassified as billing cooldown


When the bundled Claude CLI subprocess returns a zero-token, no-text completion (no error, no abort, no timeout), the provider classifier records it as a billing failure in ~/.openclaw/agents/main/agent/auth-state.json. Three such responses trip a cooldown on the profile, after which every subsequent run on that profile aborts in ~300 ms with no model call and no trajectory file.

This is not a real billing/wallet condition — the user's Claude account is funded and other clients on the same account succeed.




Version

OpenClaw 2026.5.12 (build f066dd2), Node claude-cli provider, Linux.


Reproduction signature

After a run completes with status: "success" but an empty assistant reply, auth-state.json shows:

"usageStats": {
  "anthropic:claude-cli": {
    "errorCount": 1,
    "failureCounts": { "billing": 1 },
    "lastFailureAt": <ts>
  }
}

The corresponding trajectory's model.completed event has:

lastCallUsage: { input_tokens: 0, output_tokens: 0, total_tokens: 0 }
assistantTexts: []
aborted: false
timedOut: false
promptErrorSource: null

After three such events, the profile is cooled down:

"disabledUntil": <future-ts>,
"disabledReason": "billing",
"errorCount": 3,
"failureCounts": { "billing": 3 }

Subsequent runs fail in ~300 ms with:

FallbackSummaryError: All models failed (N):
  claude-cli/claude-sonnet-4-6: Provider claude-cli is in cooldown (suspending lanes) (billing)
  | anthropic/claude-haiku-4-5-...: Provider anthropic is in cooldown (suspending lanes) (billing)

and no <sessionId>.jsonl / .trajectory.jsonl is created — the run aborts before any model call.

Expected

A zero-token, no-text, no-error response from the Claude CLI subprocess should not be classified as a billing failure. Either:

  1. Treat empty completions as a distinct failure mode (e.g. empty-response) and apply a separate cooldown policy, or
  2. Don't increment any failure counter for empty responses (treat as a no-op / retry candidate), or
  3. Inspect the CLI exit code and stderr before classifying — an empty stdout with exit 0 is not the same as a billing rejection.
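As a rough illustration of options 1 and 2, the sketch below keeps one counter per failure category, so empty completions never feed the billing counter. The class name, the `empty-response` category, its threshold, and both cooldown durations are assumptions made for illustration. Only the billing threshold of 3 matches the behaviour reported in this issue, and none of this is taken from OpenClaw's source.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical per-category policy: (failures before cooldown, cooldown seconds).
# Only the billing threshold of 3 reflects what this issue reports; every
# other number is an assumption.
POLICY = {
    "billing": (3, 3600.0),
    "empty-response": (10, 60.0),
}


@dataclass
class ProfileStats:
    """Illustrative stand-in for one usageStats entry in auth-state.json."""
    failure_counts: Dict[str, int] = field(default_factory=dict)
    disabled_until: Optional[float] = None
    disabled_reason: Optional[str] = None

    def record_failure(self, category: str, now: Optional[float] = None) -> None:
        """Count a failure and start a cooldown once its own threshold is hit."""
        now = time.time() if now is None else now
        count = self.failure_counts.get(category, 0) + 1
        self.failure_counts[category] = count
        threshold, cooldown = POLICY[category]
        if count >= threshold:
            self.disabled_until = now + cooldown
            self.disabled_reason = category

    def in_cooldown(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self.disabled_until is not None and now < self.disabled_until
```

With this shape, three empty responses leave the profile usable, while three genuine billing failures still disable it. Option 2 corresponds to not calling `record_failure` at all for empty responses.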

Actual

The classifier maps the empty response to billing, the cooldown trips after 3 consecutive empty responses (which happen organically during normal Discord/cron load), and all dependent jobs fail until the cooldown window elapses or the state file is manually edited.

Workaround attempted

Adding a systemd ExecStartPre that clears disabledUntil / disabledReason / errorCount / failureCounts from auth-state.json on gateway start works at startup, but is not sufficient — during a 16-minute window between gateway restart and a manual cron trigger, two ordinary Discord/cron sessions re-tripped failureCounts.billing to 3.

ExecStartPre snippet (for reference):

ExecStartPre=/usr/bin/python3 -c "import json,os;f=os.path.expanduser('~/.openclaw/agents/main/agent/auth-state.json');d=json.load(open(f));[(s.pop('disabledUntil',None),s.pop('disabledReason',None),s.update(errorCount=0,failureCounts={})) for s in d.get('usageStats',{}).values()];json.dump(d,open(f,'w'),indent=2)"
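For readability, the reset described above can also be written as a standalone script. This is a sketch that assumes auth-state.json has the usageStats shape shown in the reproduction section. The function name `reset_cooldowns` and the write-to-temp-then-rename step are additions for illustration, not part of the original workaround.

```python
import json
import os
import tempfile


def reset_cooldowns(path: str) -> int:
    """Clear cooldown fields for every profile under usageStats.

    Returns the number of profiles that were reset.
    """
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)

    touched = 0
    for stats in state.get("usageStats", {}).values():
        # Each field is cleared unconditionally, whether or not it is set.
        stats.pop("disabledUntil", None)
        stats.pop("disabledReason", None)
        stats["errorCount"] = 0
        stats["failureCounts"] = {}
        touched += 1

    # Write to a temporary file in the same directory, then rename, so a
    # crash mid-write cannot leave a truncated state file behind.
    # Note: mkstemp creates the file with mode 0600.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp_path, path)
    return touched
```

A script that calls `reset_cooldowns(os.path.expanduser("~/.openclaw/agents/main/agent/auth-state.json"))` could then be referenced from ExecStartPre. As noted above, this only helps at startup and does not stop the counters from climbing again.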

Suggested fix locations

The classifier path that maps subprocess result → failure category should distinguish between:

  • non-zero exit / stderr containing a billing-shaped signal → billing (current behaviour, correct)
  • exit 0 with empty stdout / usage.total_tokens == 0 and assistantTexts == [] → new bucket (or no-op)
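The distinction in the two bullets could be sketched as follows. The `CliResult` type, the `classify` function, the `error` and `empty-response` category names, and the stderr marker list are all hypothetical. OpenClaw's actual classifier is not shown in this report, and neither is the text the Claude CLI prints on a billing rejection. The sketch reads the first bullet as requiring an explicit billing signal in stderr, which is one interpretation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder markers only: the real billing-shaped stderr text is not
# documented in this issue.
BILLING_MARKERS = ("billing", "credit balance", "payment required")


@dataclass
class CliResult:
    """Hypothetical summary of one claude-cli subprocess run."""
    exit_code: int
    stderr: str = ""
    total_tokens: int = 0
    assistant_texts: List[str] = field(default_factory=list)


def classify(result: CliResult) -> Optional[str]:
    """Return a failure category, or None when the run produced output."""
    stderr = result.stderr.lower()
    # Billing requires an explicit signal, never just the absence of output.
    if any(marker in stderr for marker in BILLING_MARKERS):
        return "billing"
    if result.exit_code != 0:
        return "error"
    # Exit 0, zero tokens, no text: the case this issue is about.
    if result.total_tokens == 0 and not result.assistant_texts:
        return "empty-response"
    return None
```

Under these assumptions, `classify(CliResult(exit_code=0))` yields `"empty-response"` rather than `"billing"`, which would keep the billing counter untouched for the runs described in the reproduction signature.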

Impact

Every cron job and Discord session that depends on anthropic:claude-cli becomes unreliable after ~3 empty-response coincidences. Operators see this as "claude-cli is dead" or "billing issue" with no actual billing problem. The cooldown is invisible from openclaw health (it shows Discord: configured and gateway healthy) and only visible by reading auth-state.json directly or seeing the FallbackSummaryError in openclaw cron runs.
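Since the cooldown is only visible in auth-state.json, a small read-only check can surface it. The sketch below assumes the usageStats layout shown in the reproduction section. The function name is illustrative, and because the report elides the timestamp values, the caller has to supply `now` in whatever unit the file uses.

```python
import json
from typing import Dict, List, Tuple


def cooled_down_profiles(state: Dict, now: float) -> List[Tuple[str, str]]:
    """List (profile, reason) pairs whose disabledUntil lies after `now`.

    `now` must use the same unit as disabledUntil in the file; the report
    does not say whether that is seconds or milliseconds.
    """
    hits = []
    for profile, stats in state.get("usageStats", {}).items():
        until = stats.get("disabledUntil")
        if until is not None and until > now:
            hits.append((profile, stats.get("disabledReason", "unknown")))
    return hits


# Made-up sample data in the shape shown in this report.
sample = json.loads(
    '{"usageStats": {"anthropic:claude-cli": {'
    '"disabledUntil": 2000, "disabledReason": "billing", '
    '"errorCount": 3, "failureCounts": {"billing": 3}}}}'
)
print(cooled_down_profiles(sample, now=1000))
# prints [('anthropic:claude-cli', 'billing')]
```

Running a check like this from a monitoring job would flag the condition that openclaw health currently does not report.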
