openclaw - ✅(Solved) Fix Disconnected Node causes false 'Rate Limit' errors (node.invoke timeout misclassified) [1 pull requests, 1 comments, 2 participants]

fabudde · 2026-04-01T09:21:22Z




GitHub stats

Issue: openclaw/openclaw#58903 (fetched 2026-04-08 02:31:20)
Author: fabudde
Participants: fabudde, qkal
Timeline (top): commented ×1, cross-referenced ×1

When a paired node (e.g. macOS companion app) is paired but not connected, OpenClaw attempts node.invoke calls that time out after 30 seconds. The resulting timeout error is surfaced to the user as "⚠️ API rate limit reached" — a completely misleading error message. The actual problem has nothing to do with API rate limits.

Error Message

Show the real error, "Node 'Alex MacBook' is not connected", instead of "rate limit reached". The misleading message pointed at Anthropic rate limits and caused hours of debugging across two servers (Nyx + Tyto), when the real problem was a disconnected MacBook.

P0: Fix error message

Map errorCode=UNAVAILABLE + TIMEOUT: node invoke to a specific, helpful error message, for example: ⚠️ Node "Alex MacBook" is not connected. Please open the OpenClaw app on your device, or unpair the node.

Root Cause

The node.invoke timeout error (errorCode=UNAVAILABLE) is caught by a generic error handler that maps it to a "rate limit" user-facing message. The error classification does not distinguish between:

Actual Anthropic 429 rate limits
Node invoke timeouts (disconnected device)
Network errors
Other transient failures
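The four cases above could be separated by a small classifier that runs before any user-facing text is chosen. The following is a minimal sketch, not OpenClaw's actual code: the `RunError` shape, the `FailureKind` names, and the network error patterns are assumptions made for illustration.

```typescript
// Hypothetical failure categories mirroring the four cases listed above.
type FailureKind = "rate_limit" | "node_unavailable" | "network" | "transient";

// Assumed error shape; these field names are illustrative, not OpenClaw's real types.
interface RunError {
  httpStatus?: number; // e.g. 429 from the model provider
  errorCode?: string;  // e.g. "UNAVAILABLE" from node.invoke
  message: string;
}

function classifyFailure(err: RunError): FailureKind {
  // Check the node case first so it can never be reported as a rate limit.
  if (err.errorCode === "UNAVAILABLE" && err.message.includes("TIMEOUT")) {
    return "node_unavailable";
  }
  if (err.httpStatus === 429) return "rate_limit";
  // Common Node.js socket error codes, used here as a stand-in for "network errors".
  if (/ECONNRESET|ENOTFOUND|ETIMEDOUT/.test(err.message)) return "network";
  return "transient";
}

console.log(classifyFailure({ errorCode: "UNAVAILABLE", message: "TIMEOUT: node invoke timed out" }));
console.log(classifyFailure({ httpStatus: 429, message: "Too Many Requests" }));
```

Ordering matters in this sketch: the node check runs before the rate-limit check, so a turn that fails on a disconnected device is labelled as such even if other signals are present.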

Fix Action

Workaround

Users can work around this by:

Opening the OpenClaw companion app on the paired device
Or unpairing the node: remove it from ~/.openclaw/devices/paired.json
Or restarting the gateway after ensuring the node is connected

PR fix notes

PR #58946: Fix node.invoke timeout misclassified as API rate limit

Repository: openclaw/openclaw
Author: qkal
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/58946

Description (problem / solution / changelog)

Summary

prioritize node connectivity timeout/unavailable errors over generic mid-turn rate-limit copy
add explicit node-unavailable user-facing fallback text when node.invoke timeouts/disconnects are detected
keep existing rate-limit/overloaded behavior for non-node errors
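The prioritization described in the summary might look roughly like the sketch below. The PR diff itself is not reproduced on this page, so the function name `pickFallbackText`, the matching patterns, and the exact strings are assumptions chosen for illustration only.

```typescript
// Illustrative copy only; the PR's actual strings are not shown on this page.
const NODE_UNAVAILABLE_TEXT =
  "⚠️ Node is not connected. Please open the OpenClaw app on your device, or unpair the node.";
const RATE_LIMIT_TEXT = "⚠️ API rate limit reached";

// Hypothetical matchers for the two error families.
const isNodeError = (m: string): boolean => /node[. ]invoke/i.test(m);
const isRateLimitError = (m: string): boolean => /rate limit|429|overloaded/i.test(m);

// Pick the fallback text for a turn, given every error message seen during it.
function pickFallbackText(errorMessages: string[]): string | undefined {
  // Node connectivity errors win over rate-limit copy when both appear.
  if (errorMessages.some(isNodeError)) return NODE_UNAVAILABLE_TEXT;
  // Non-node errors keep the existing rate-limit/overloaded behavior.
  if (errorMessages.some(isRateLimitError)) return RATE_LIMIT_TEXT;
  return undefined;
}

console.log(pickFallbackText(["429 rate limit", "TIMEOUT: node invoke timed out"]));
```

With both kinds of error in the same turn, the sketch returns the node-unavailable text, which matches the first bullet of the summary.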

Why

Issue #58903 reports disconnected paired nodes (node.invoke timeout/UNAVAILABLE) surfacing as API rate limit reached, which is misleading and sends debugging in the wrong direction.

Tests

pnpm exec vitest run src/auto-reply/reply/agent-runner-execution.test.ts
pnpm exec vitest run src/auto-reply/reply/agent-runner.misc.runreplyagent.test.ts -t "mid-turn rate-limit fallback"

Changed files

src/auto-reply/reply/agent-runner-execution.test.ts (modified, +98/-0)
src/auto-reply/reply/agent-runner-execution.ts (modified, +73/-8)
src/auto-reply/reply/agent-runner.misc.runreplyagent.test.ts (modified, +19/-0)

Code Example

⇄ res ✗ node.invoke 30072ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out
⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out
⇄ res ✗ node.invoke 30000ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out

---

⚠️ API rate limit reached — the model couldn't generate a response. Please try again in a moment.

---

⚠️ Node "Alex MacBook" is not connected. Please open the OpenClaw app on your device, or unpair the node.

---

📱 Nodes: Alex MacBook (⚠️ disconnected)

---

09:29:35 ⇄ res ✗ node.invoke 30072ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:30:26 ⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:31:13 ⇄ res ✗ node.invoke 30000ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:32:19 ⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT

---

{
  "displayName": "Alex MacBook",
  "platform": "darwin",
  "paired": true,
  "connected": false
}

---

⚠️ API rate limit reached — the model couldn't generate a response.


Bug Report: Disconnected Node Causes False "Rate Limit" Errors

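Summary

When a paired node (e.g. macOS companion app) is paired but not connected, OpenClaw attempts node.invoke calls that time out after 30 seconds. The resulting timeout error is surfaced to the user as "⚠️ API rate limit reached", a completely misleading error message. The actual problem has nothing to do with API rate limits.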

Environment

OpenClaw: 2026.3.28 / 2026.3.31
Platform: Linux VPS (Hetzner)
Node: macOS companion app (paired, disconnected)
Channel: Discord

Steps to Reproduce

1. Pair a macOS node via OpenClaw companion app
2. Disconnect the node (close app, sleep laptop, leave network)
3. Send a message to the agent in any Discord channel
4. Agent attempts node.invoke → 30s timeout → error displayed as "rate limit"

Expected Behavior

If a node is paired but disconnected, node.invoke should either:
- Skip gracefully with a warning ("Node offline, skipping")
- Fail fast (< 1 second) instead of waiting 30 seconds
- Show the real error: "Node 'Alex MacBook' is not connected" instead of "rate limit reached"

Actual Behavior

⇄ res ✗ node.invoke 30072ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out
⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out
⇄ res ✗ node.invoke 30000ms errorCode=UNAVAILABLE errorMessage=TIMEOUT: node invoke timed out

The user sees:

⚠️ API rate limit reached — the model couldn't generate a response. Please try again in a moment.

Impact

Misdiagnosis cascade

This bug caused hours of debugging across two servers (Nyx + Tyto) because the error message pointed at Anthropic rate limits. We:

Restarted gateways multiple times
Compared auth profiles between servers
Spawned 4 sub-agents to investigate API keys
Tested direct curl to Anthropic API
Removed/re-added auth tokens
Updated OpenClaw versions

All because the error said "rate limit" when the real problem was a disconnected MacBook.

Retry storm amplification

When node.invoke times out and is misclassified as a rate limit:

1. The cooldown system marks the auth profile as rate-limited
2. Failover rotates to next auth profile
3. Next profile also hits node.invoke timeout
4. All profiles get marked as rate-limited
5. Gateway enters retry loop → burns through actual Anthropic rate limits with retries
6. Now there ARE real rate limits on top of the node timeout
7. The problem becomes self-reinforcing and nearly impossible to diagnose

Multi-session impact

Channel sessions that trigger node.invoke → broken (30s timeout → "rate limit")
Channel sessions that DON'T trigger node.invoke → work fine
This creates the confusing situation where "the same agent works in one channel but not another" with the same API key

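Root Cause Analysis

The node.invoke timeout error (errorCode=UNAVAILABLE) is caught by a generic error handler that maps it to a "rate limit" user-facing message. The error classification does not distinguish between:

Actual Anthropic 429 rate limits
Node invoke timeouts (disconnected device)
Network errors
Other transient failures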

Suggested Fixes

P0: Fix error message

Map errorCode=UNAVAILABLE + TIMEOUT: node invoke to a specific, helpful error message:

⚠️ Node "Alex MacBook" is not connected. Please open the OpenClaw app on your device, or unpair the node.

P1: Fast-fail for disconnected nodes

Before attempting node.invoke, check node.connected status. If connected: false, skip immediately instead of waiting 30 seconds.
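A pre-flight check along these lines would avoid the 30-second wait. This is a sketch under stated assumptions: the `PairedNode` fields mirror the node status JSON in the Evidence section, while `preflight` and its result type are hypothetical names, not part of OpenClaw.

```typescript
// Fields taken from the node status JSON shown in this report.
interface PairedNode {
  displayName: string;
  platform: string;
  paired: boolean;
  connected: boolean;
}

// Hypothetical result type: either proceed, or skip with a reason for the user.
type Preflight = { proceed: true } | { proceed: false; reason: string };

function preflight(node: PairedNode): Preflight {
  // Paired-but-disconnected is the failing case: return immediately instead of invoking.
  if (!node.connected) {
    return { proceed: false, reason: `Node '${node.displayName}' is not connected` };
  }
  return { proceed: true };
}

const macbook: PairedNode = { displayName: "Alex MacBook", platform: "darwin", paired: true, connected: false };
const check = preflight(macbook);
console.log(check.proceed ? "invoking node" : `skipping: ${check.reason}`);
```

The caller would only reach the real node.invoke call when `proceed` is true, so a disconnected node fails in microseconds rather than after the timeout.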

P2: Don't classify node timeouts as rate limits

Node invoke timeouts should NOT trigger the auth profile cooldown system. They have nothing to do with API key rate limits.
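One way to express this rule is to make the cooldown tracker ignore every failure that is not an actual rate limit. The class below is a simplified stand-in written for illustration; `CooldownTracker`, `FailureCause`, and the profile IDs are assumptions and do not describe OpenClaw's real cooldown system.

```typescript
// Hypothetical failure causes; only "rate_limit" should affect auth profiles.
type FailureCause = "rate_limit" | "node_unavailable" | "other";

class CooldownTracker {
  private readonly coolingDown = new Set<string>();

  recordFailure(profileId: string, cause: FailureCause): void {
    // Node timeouts and other failures say nothing about the API key, so skip them.
    if (cause !== "rate_limit") return;
    this.coolingDown.add(profileId);
  }

  isCoolingDown(profileId: string): boolean {
    return this.coolingDown.has(profileId);
  }
}

const tracker = new CooldownTracker();
tracker.recordFailure("profile-a", "node_unavailable");
console.log(tracker.isCoolingDown("profile-a")); // false: the profile stays usable
```

Because a node timeout never marks a profile, failover has no reason to rotate through the remaining profiles, which is the first step of the retry storm described under Impact.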

P3: Add node status to /status output

Show paired node connection status in /status so users can quickly identify disconnected nodes:

📱 Nodes: Alex MacBook (⚠️ disconnected)
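A formatter for that line could be as small as the sketch below. The disconnected wording follows the example above; the "(connected)" and "none paired" wordings are assumptions, since the report only shows the disconnected case.

```typescript
// Fields taken from the node status JSON shown in this report.
interface NodeStatus {
  displayName: string;
  connected: boolean;
}

function formatNodesLine(nodes: NodeStatus[]): string {
  // Assumed wording for the empty case; the report does not specify one.
  if (nodes.length === 0) return "📱 Nodes: none paired";
  const parts = nodes.map((n) =>
    n.connected ? `${n.displayName} (connected)` : `${n.displayName} (⚠️ disconnected)`
  );
  return `📱 Nodes: ${parts.join(", ")}`;
}

console.log(formatNodesLine([{ displayName: "Alex MacBook", connected: false }]));
```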

Workaround

Users can work around this by:

Opening the OpenClaw companion app on the paired device
Or unpairing the node: remove it from ~/.openclaw/devices/paired.json
Or restarting the gateway after ensuring the node is connected

Evidence

Log excerpts from Tyto's server (2026-04-01):

09:29:35 ⇄ res ✗ node.invoke 30072ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:30:26 ⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:31:13 ⇄ res ✗ node.invoke 30000ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
09:32:19 ⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT
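When triaging a similar report, it can help to count how many gateway failures look like node timeouts rather than reading the log by eye. The parser below is a sketch that assumes the line format shown in these excerpts; the field names and the 30000 ms threshold are choices made for this example.

```typescript
// Shape of one parsed failure line. Field names are chosen for this sketch.
interface InvokeLogEntry {
  time?: string;
  method: string;
  durationMs: number;
  errorCode: string;
  errorMessage: string;
}

// Matches lines like the excerpts above; the leading timestamp is optional.
const LOG_LINE = /^(?:(\d{2}:\d{2}:\d{2}) )?⇄ res ✗ (\S+) (\d+)ms errorCode=(\S+) errorMessage=(.*)$/;

function parseInvokeFailure(line: string): InvokeLogEntry | null {
  const m = LOG_LINE.exec(line.trim());
  if (!m) return null;
  const [, time, method = "", ms = "0", errorCode = "", errorMessage = ""] = m;
  return { time, method, durationMs: Number(ms), errorCode, errorMessage };
}

// Count failures that look like a disconnected node: UNAVAILABLE after about 30 s.
function countNodeTimeouts(lines: string[]): number {
  return lines
    .map(parseInvokeFailure)
    .filter((e) => e !== null && e.method === "node.invoke" && e.errorCode === "UNAVAILABLE" && e.durationMs >= 30000)
    .length;
}

console.log(countNodeTimeouts([
  "09:29:35 ⇄ res ✗ node.invoke 30072ms errorCode=UNAVAILABLE errorMessage=TIMEOUT",
  "09:30:26 ⇄ res ✗ node.invoke 30001ms errorCode=UNAVAILABLE errorMessage=TIMEOUT",
]));
```

A non-zero count alongside a "rate limit" message in the channel is a strong hint that the node, not the API key, is the problem.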

Node status at time of error:

{
  "displayName": "Alex MacBook",
  "platform": "darwin",
  "paired": true,
  "connected": false
}

User-visible error (misleading):

⚠️ API rate limit reached — the model couldn't generate a response.

Filed by Tyto 🦉, April 1, 2026.
Discovered while debugging Nyx (Round 6) + Tyto #tyto channel failures.

extent analysis

TL;DR

The most likely fix for the "Rate Limit" error caused by a disconnected node is to update the error handling to distinguish between actual rate limits and node invoke timeouts, and to provide a more informative error message to the user.

Guidance

Implement a specific error message for errorCode=UNAVAILABLE and TIMEOUT: node invoke to inform the user that the node is not connected.
Add a check for node.connected status before attempting node.invoke to skip immediately if the node is disconnected.
Modify the auth profile cooldown system to exclude node invoke timeouts, as they are not related to API key rate limits.
Consider adding node status to the /status output to help users quickly identify disconnected nodes.

Example

A possible implementation of the updated error handling could be:

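// Illustrative helper: returns the user-facing text for a node timeout, or null otherwise.
function describeNodeError(errorCode, errorMessage, nodeName) {
  if (errorCode === 'UNAVAILABLE' && errorMessage.includes('TIMEOUT: node invoke')) {
    // The node is paired but unreachable: return a specific message for the user
    return `⚠️ Node "${nodeName}" is not connected. Please open the OpenClaw app on your device, or unpair the node.`;
  }
  // Not a node timeout: leave the decision to the existing error handling
  return null;
}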

Notes

The provided solution focuses on updating the error handling and providing more informative error messages to the user. However, additional changes may be necessary to fully resolve the issue, such as modifying the node invoke timeout duration or implementing a retry mechanism.

Recommendation

Apply the suggested fixes, starting with updating the error message and adding a check for node.connected status, to provide a more accurate and helpful error message to the user and prevent misdiagnosis of the issue.
