openclaw - 💡(How to fix) Fix [Feature]: expose channels.start / stop / restart via CLI so a wedged channel can recover without container restart or QR re-pair [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#75153 · Fetched 2026-05-01 05:37:37
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 2
Timeline (top): commented ×1, cross-referenced ×1, mentioned ×1, subscribed ×1


TL;DR

When a channel transiently fails (e.g., a DNS blip: EAI_AGAIN on web.whatsapp.com, then HTTP 408/428), it lands in the stopped, disconnected state and does not auto-recover even after the network is healthy again. The gateway implements channels.start as a JSON-RPC method, and that method worked for us when called directly, but it isn't exposed via the CLI. The only published CLI path that triggers it is openclaw channels login, which forces a QR re-pair and destroys the existing valid Baileys session.

We hit this on OpenClaw 2026.4.25 with a single-tenant gateway-as-foreground container. Discord stayed healthy throughout. WhatsApp went stopped, disconnected 8 hours before discovery, with the underlying linked state preserved (creds.json still valid).

Repro

  1. OpenClaw 2026.4.25 running with WhatsApp channel enabled, configured, linked, running, connected.
  2. Network blip on web.whatsapp.com resolution (we saw getaddrinfo EAI_AGAIN, then HTTP 408/428).
  3. Health monitor logs restarting (reason: disconnected), channel stop exceeded 5000ms after abort; continuing shutdown.
  4. gateway.channelMaxRestartsPerHour budget exhausts.
  5. Channel transitions to stopped, disconnected and stays there forever.
  6. Network recovers. Channel doesn't notice. openclaw channels status --probe:
- WhatsApp default: enabled, configured, linked, stopped, disconnected,
  in:8h ago, out:8h ago, dm:pairing,
  error:{"error":{"data":{"errno":-3001,"code":"EAI_AGAIN","syscall":"getaddrinfo",
  "hostname":"web.whatsapp.com"},"isBoom":true,"isServer":false,"output":
  {"statusCode":408,"payload":{"statusCode":408,"error":"Request Time-out",
  "message":"WebSocket Error (getaddrinfo EAI_AGAIN web.whatsapp.com)"},
  "headers":{}}},"date":"2026-04-30T10:54:14.271Z"}
  7. From inside the same container: a request to web.whatsapp.com:443 returns HTTP 200 in 0.35s. Network is fine. The channel is stuck.
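
The wedged state in the repro can be detected mechanically from the status line. The following is a minimal sketch, assuming the status output keeps the format quoted in this issue (a leading "-" or "•", channel name, account id, then comma-separated flags followed by key:value fields). parseStatusLine and isWedged are hypothetical helper names, not part of OpenClaw.

```javascript
// Hypothetical helper: parse one status entry, in the format quoted in this
// issue, into { channel, account, flags }. Returns null if the shape differs.
function parseStatusLine(line) {
  // Expected shape: "- <Channel> <account>: flag, flag, ..., key:value, ..."
  const match = /^[-•]\s*(\S+)\s+(\S+):\s*(.*)$/s.exec(line.trim());
  if (!match) return null;
  const [, channel, account, rest] = match;
  const flags = [];
  for (const token of rest.split(",")) {
    const t = token.trim();
    if (t === "") continue;
    // Stop at the first key:value field (in:, out:, dm:, error:) so the
    // embedded error JSON, which contains commas, is never read as flags.
    if (t.includes(":")) break;
    flags.push(t);
  }
  return { channel, account, flags };
}

// "Wedged" in the sense of this issue: the account is enabled and still
// linked (no re-pair needed), but the listener is stopped and disconnected.
function isWedged(status) {
  if (!status) return false;
  const has = (flag) => status.flags.includes(flag);
  return has("enabled") && has("linked") && has("stopped") && has("disconnected");
}

const sample =
  "- WhatsApp default: enabled, configured, linked, stopped, disconnected, in:8h ago, out:8h ago, dm:pairing";
console.log(isWedged(parseStatusLine(sample))); // true
```

A check like this could drive an alert or a scripted channels.start call, so the stuck state is noticed in minutes rather than hours.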

What CLI offers today

openclaw channels --help
Commands:
  add           Add or update a channel account
  capabilities  Show provider capabilities (intents/scopes + supported features)
  list          List configured channels + auth profiles
  login         Link a channel account (if supported)
  logout        Log out of a channel session (if supported)
  logs          Show recent channel logs from the gateway log file
  remove        Disable or delete a channel account
  resolve       Resolve channel/user names to IDs
  status        Show gateway channel status

No start, stop, or restart. The only operator-facing recovery options published are:

  1. channels login — re-pair via QR. Destroys the existing valid Baileys session. Requires the operator's phone. Heavy.
  2. channels remove + channels add — heavier still; fully reconfigures the account.
  3. Container restart (docker restart). Has known issues on 2026.4.26 (hangs in futex_wait_queue indefinitely; we wrote up the rollback workaround in our internal docs).
  4. Toggle channels.<x>.accounts.<y>.enabled in openclaw.json. We've observed this gets debounced and the listener doesn't actually bounce.

So in practice, the operator's safe options are "QR re-pair" or "full container restart with risk of 2026.4.26 hang." Neither is appropriate for a transient network blip on a healthy paired session.

What the gateway actually exposes

The gateway implements channels.start as a JSON-RPC method in /app/dist/server-plugin-bootstrap-*.js:

"channels.start": async ({ params, respond, context }) => {
    if (!validateChannelsStartParams(params)) {
        respond(false, void 0, errorShape(ErrorCodes.INVALID_REQUEST, ...));
        return;
    }
    const channelId = normalizeChannelId(params.channel);
    if (!channelId) { ... }
    const plugin = getChannelPlugin(channelId);
    if (!plugin) { ... }
    if (!plugin.gateway?.startAccount) { ... }
    // ...
}

The CLI calls it from one place — reconcileGatewayRuntimeAfterLocalLogin, which is invoked as a side-effect of channels login:

async function reconcileGatewayRuntimeAfterLocalLogin(params) {
    if (!params.plugin.gateway?.startAccount) return;
    // ...
    await callGateway({
        config: params.cfg,
        method: "channels.start",
        params: { channel: params.channelId, accountId: params.accountId },
        mode: GATEWAY_CLIENT_MODES.BACKEND,
        clientName: GATEWAY_CLIENT_NAMES.GATEWAY_CLIENT,
        deviceIdentity: null,
    });
}

So we already have the RPC. We just don't have a CLI command that fires it standalone, against an existing-and-still-linked account, without re-pair.

How we recovered today

We wrote a small Node script that imports the CLI's callGateway from the bundle and fires channels.start directly. Working invocation:

import { r as callGateway } from "/app/dist/call-DkJ_z7yy.js";
import { readFileSync } from "node:fs";

const cfg = JSON.parse(readFileSync("/home/node/.openclaw/openclaw.json", "utf8"));
const result = await callGateway({
  config: cfg,
  method: "channels.start",
  params: { channel: "whatsapp", accountId: "default" },
  mode: "backend",
  clientName: "gateway-client",
  deviceIdentity: null,
});
// → { channel: "whatsapp", accountId: "default", started: true }

We exec'd this inside openclaw-openclaw-gateway-1 and within ~3 seconds:

- WhatsApp default: enabled, configured, linked, running, connected, in:just now, out:just now

Round-trip messaging restored immediately.

Proposed fix

Add three CLI subcommands that front the existing gateway methods:

openclaw channels start --channel <name> [--account <id>]
openclaw channels stop --channel <name> [--account <id>]
openclaw channels restart --channel <name> [--account <id>]

restart is stop then start. start should refuse if the channel state is already running (or accept --force to no-op). stop should be graceful (call gateway-side stopAccount if it exists, else equivalent).
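
The semantics described above can be sketched as a small planning function that maps a CLI action to the gateway methods it would call. This is only an illustration under stated assumptions: channels.start is the method confirmed in this issue, while channels.stop is an assumed method name, and planChannelAction is a hypothetical helper.

```javascript
// Sketch: decide which gateway methods a proposed CLI action would invoke.
// "channels.start" is confirmed by the issue; "channels.stop" is ASSUMED.
function planChannelAction(action, currentState, { force = false } = {}) {
  switch (action) {
    case "start":
      if (currentState === "running") {
        // With --force, an already-running channel is treated as a no-op.
        if (force) return [];
        throw new Error("channel is already running");
      }
      return ["channels.start"];
    case "stop":
      return ["channels.stop"];
    case "restart":
      // restart is stop then start, in that order.
      return ["channels.stop", "channels.start"];
    default:
      throw new Error(`unknown action: ${action}`);
  }
}

console.log(planChannelAction("restart", "stopped"));
// [ 'channels.stop', 'channels.start' ]
```

Keeping the decision separate from the RPC call makes the refuse/--force behavior easy to unit test without a running gateway.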

This gives operators a non-destructive recovery path between "ignore" and "QR re-pair / container restart."

Secondary observation (separate concern)

The fact that a channel exhausts channelMaxRestartsPerHour and then gives up forever — even when the underlying transport recovers — is an architectural decision worth revisiting. We'd suggest one of:

  • After max-retries-per-hour, switch to a back-off-and-keep-checking mode rather than full stop.
  • Surface the give-up state as a high-severity health event so operators get notified instead of discovering it 8 hours later.
  • Add a --auto-restart-on-network-recovery flag the operator can opt into.

Happy to file that as a separate issue if you'd like; we wanted to surface it here for context since it's the upstream cause that makes the missing channels start CLI command load-bearing.
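
To make the first suggestion concrete, here is a minimal sketch of a back-off-and-keep-checking policy. It does not reflect how the gateway's health monitor is actually implemented; nextProbeDelayMs, decideNextStep, and the 30 s / 15 min values are illustrative assumptions, and only channelMaxRestartsPerHour comes from the issue.

```javascript
// Delay before the next reconnect probe once the hourly restart budget is
// spent. attempt is 0-based; baseMs and maxMs are illustrative defaults,
// not OpenClaw settings.
function nextProbeDelayMs(attempt, baseMs = 30_000, maxMs = 15 * 60_000) {
  if (!Number.isInteger(attempt) || attempt < 0) {
    throw new RangeError("attempt must be a non-negative integer");
  }
  // Clamp the exponent so very large attempt counts stay finite.
  const exponent = Math.min(attempt, 20);
  return Math.min(maxMs, baseMs * 2 ** exponent);
}

// While the budget lasts, restart immediately; afterwards keep probing with
// exponential back-off instead of giving up for good.
function decideNextStep(restartsThisHour, maxRestartsPerHour, probesSoFar) {
  if (restartsThisHour < maxRestartsPerHour) {
    return { mode: "restart", delayMs: 0 };
  }
  return { mode: "probe", delayMs: nextProbeDelayMs(probesSoFar) };
}

console.log(decideNextStep(5, 5, 2)); // { mode: 'probe', delayMs: 120000 }
```

With a capped delay, a channel that lost its budget during a DNS outage would retry at most 15 minutes after the network recovers, rather than staying stopped until an operator intervenes.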

Environment

  • OpenClaw 2026.4.25 (rolled back from 2026.4.26 because of its hang-on-restart; a separate issue worth filing if not already on file)
  • Single-tenant; gateway-as-foreground container; Docker on macOS host
  • Channel state at the time of failure: enabled, configured, linked, stopped, disconnected, dm:pairing
  • Network: pfSense allowlist for DMZ → MetaWhatsApp. Everything intact and verified in-container post-recovery.

Useful for triage

Logs around the failure window contained:

408 Request Time-out
428 Precondition Required
getaddrinfo EAI_AGAIN web.whatsapp.com
health-monitor: restarting (reason: disconnected)
channel stop exceeded 5000ms after abort; continuing shutdown

The channel-stop exceeding 5000ms-after-abort is independently interesting — that's the channel's own teardown getting stuck while the rest of the channel-runtime moves on. May be a related teardown-doesn't-actually-tear-down bug.
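
For triage across a longer log window, the signatures above can be counted with a small classifier. This is a sketch that assumes the log lines contain the exact substrings quoted in this issue; the category names and the classifyLogLine / summarizeLogs helpers are hypothetical.

```javascript
// Patterns taken from the log excerpts in this issue; the "kind" labels are
// illustrative. The first matching pattern wins.
const LOG_PATTERNS = [
  { kind: "dns-failure", re: /getaddrinfo EAI_AGAIN/ },
  { kind: "http-timeout", re: /\b408 Request Time-out\b/ },
  { kind: "precondition-required", re: /\b428 Precondition Required\b/ },
  { kind: "health-restart", re: /health-monitor: restarting/ },
  { kind: "stuck-teardown", re: /channel stop exceeded \d+ms after abort/ },
];

function classifyLogLine(line) {
  const hit = LOG_PATTERNS.find((pattern) => pattern.re.test(line));
  return hit ? hit.kind : "other";
}

// Count how often each failure signature appears in a list of log lines.
function summarizeLogs(lines) {
  const counts = {};
  for (const line of lines) {
    const kind = classifyLogLine(line);
    counts[kind] = (counts[kind] || 0) + 1;
  }
  return counts;
}

console.log(classifyLogLine("channel stop exceeded 5000ms after abort; continuing shutdown"));
// stuck-teardown
```

A non-zero stuck-teardown count next to health-restart entries would support the suspicion that teardown is hanging while the rest of the runtime moves on.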

extent analysis

TL;DR

The proposed fix is to add a channels start CLI command that fronts the existing channels.start gateway method to allow operators a non-destructive recovery path for transient network failures.

Guidance

  • Implement the proposed channels start CLI command to call the existing channels.start gateway method, allowing operators to restart a channel without re-pairing or restarting the container.
  • Consider adding --force option to the start command to handle cases where the channel state is already running.
  • Review the channel restart logic to prevent exhausting channelMaxRestartsPerHour and giving up forever, potentially by implementing a back-off-and-keep-checking mode or surfacing the give-up state as a high-severity health event.

Example

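// Example implementation of the channels start CLI command.
// callGateway and the config path come from the recovery script in the issue;
// the hashed bundle name (call-DkJ_z7yy.js) is specific to that build.
import { r as callGateway } from "/app/dist/call-DkJ_z7yy.js";
import { readFileSync } from "node:fs";

const cfg = JSON.parse(readFileSync("/home/node/.openclaw/openclaw.json", "utf8"));

async function startChannel(channel, accountId) {
  // Same call the CLI already makes after channels login, minus the re-pair.
  return callGateway({
    config: cfg,
    method: "channels.start",
    params: { channel, accountId },
    mode: "backend",
    clientName: "gateway-client",
    deviceIdentity: null,
  });
}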

Notes

The proposed fix assumes that the channels.start gateway method is correctly implemented and functional. Additionally, the fix does not address the underlying issue of the channel exhausting channelMaxRestartsPerHour and giving up forever, which may require further investigation and changes to the channel restart logic.

Recommendation

Add the channels start CLI command as a thin front for the existing channels.start gateway method. That gives operators a non-destructive way to recover from transient network failures without re-pairing or restarting the container.
