openclaw - 💡(How to fix) Fix Channel-reload race: deferred reload + gateway-tool `restart` double-spawn telegram channel → EADDRINUSE crash loop [1 pull requests]

Fix Action

Fixed

Version

v2026.5.3. Judging from the release notes for v2026.5.4 through v2026.5.7, no fix has shipped, so the issue should still reproduce on those versions. Setup: a Node process running the gateway with the telegram channel enabled.

Symptom

After a config change that requires a channel reload, the telegram channel enters a permanent auto-restart loop:

    [default] starting provider (@<bot>)
    [default] channel exited: listen EADDRINUSE: address already in use 0.0.0.0:8787
    [default] auto-restart attempt N/10 in <backoff>s

The auto-restart never recovers because port 8787 is already held by another [default] channel instance that won the bind in the same process. Once the auto-restart counter reaches 10/10, the losing instance stops retrying while the surviving one continues to handle traffic, so the bug is mostly invisible (telegram inbound and outbound messages work) until you notice the steady error stream in the logs.

In our case it ran ~2 hours / 85 channel-exit events before we caught it. A docker restart of the slot resolves it cleanly.
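
Because the surviving channel keeps working, the log stream is the only place this shows up. A minimal sketch of a log check follows, assuming log lines shaped like the excerpts in this report. The helper name `summarizeBindFailures` and the `BindFailureSummary` type are hypothetical and not part of openclaw.

```typescript
// Hypothetical helper: summarizes EADDRINUSE channel exits in gateway log lines.
// The patterns below are derived from the log excerpts quoted in this report.

interface BindFailureSummary {
  channelExits: number;      // count of "channel exited: listen EADDRINUSE" lines
  maxRestartAttempt: number; // highest N seen in "auto-restart attempt N/10"
  ports: string[];           // distinct ports named in the EADDRINUSE lines
}

function summarizeBindFailures(lines: string[]): BindFailureSummary {
  const exitPattern =
    /channel exited: listen EADDRINUSE: address already in use \S+:(\d+)/;
  const attemptPattern = /auto-restart attempt (\d+)\/\d+/;
  const ports = new Set<string>();
  let channelExits = 0;
  let maxRestartAttempt = 0;

  for (const line of lines) {
    const exit = exitPattern.exec(line);
    if (exit) {
      channelExits += 1;
      ports.add(exit[1]);
    }
    const attempt = attemptPattern.exec(line);
    if (attempt) {
      maxRestartAttempt = Math.max(maxRestartAttempt, Number(attempt[1]));
    }
  }
  return { channelExits, maxRestartAttempt, ports: Array.from(ports) };
}

// Sample input copied from the observed timeline in this report.
const sample = [
  "16:05:08.354  channels/telegram webhook local listener on http://0.0.0.0:8787/telegram-webhook",
  "16:05:08.554  channels/telegram [default] channel exited: listen EADDRINUSE: address already in use 0.0.0.0:8787",
  "16:05:08.554  channels/telegram [default] auto-restart attempt 1/10 in 5s",
];
const summary = summarizeBindFailures(sample);
console.log(summary);
```

A non-zero `channelExits` while the channel is otherwise responsive would be consistent with the double-start described here, though other causes of EADDRINUSE are possible.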

Repro recipe

  1. Start a gateway with channels.telegram.enabled: true and the telegram webhook listening on webhookPort: 8787.

  2. Open a long-running operation in the agent (so a channel reload would be deferred — we hit it with 2 active operations + 1 pending reply + 1 embedded run).

  3. Mutate a channels.telegram.* field that requires channel reload (we hit it with channels.telegram.streaming.mode). The gateway logs:

    gateway/reload   config change detected; evaluating reload (channels.telegram.streaming.mode)
    gateway/reload   config change requires channel reload (telegram) — deferring until N operation(s), M reply(ies), K embedded run(s) complete

  4. Within the deferral window (before the active operations complete), have the agent invoke the gateway-tool restart action. The gateway logs:

    gateway-tool     gateway tool: restart requested (delayMs=default, reason=none)

  5. When the active operations finish, the deferred channel reload fires. The SIGUSR1 in-process restart also fires. Both code paths invoke channel start.

Observed log timeline

16:04:58.965  gateway/reload    config change detected; evaluating reload (channels.telegram.streaming.mode)
16:04:59.353  gateway/reload    config change requires channel reload (telegram) — deferring until 2 operation(s), 1 reply(ies), 1 embedded run(s) complete
16:05:01.753  gateway-tool      gateway tool: restart requested (delayMs=default, reason=none)
16:05:05.353  gateway           SIGUSR1 received; restarting
16:05:05.353  gateway/shutdown  shutdown started: gateway restarting
16:05:05.754  gateway/reload    active operations and replies completed; reloading channels now
16:05:05.754  gateway/channels  restarting telegram channel              ← reload path
16:05:06.154  gateway/reload    config hot reload applied (channels.telegram.streaming.mode)
16:05:06.154  gateway/shutdown  shutdown completed cleanly in 869ms
16:05:06.154  gateway           restart mode: in-process restart (container: use in-process restart to keep PID 1 alive)
16:05:06.954  gateway           starting HTTP server...
16:05:06.954  channels/telegram [default] starting provider (@<bot>)     ← start #1
16:05:07.553  gateway           gateway ready
16:05:07.753  channels/telegram [default] starting provider (@<bot>)     ← start #2 (in same process)
16:05:08.354  channels/telegram webhook local listener on http://0.0.0.0:8787/telegram-webhook   ← #1 binds
16:05:08.554  channels/telegram webhook advertised to telegram on https://...
16:05:08.554  channels/telegram [default] channel exited: listen EADDRINUSE: address already in use 0.0.0.0:8787   ← #2 fails
16:05:08.554  channels/telegram [default] auto-restart attempt 1/10 in 5s

Two distinct code paths attempt to start the same telegram channel inside the same process: (a) the deferred channel reload (logged under gateway/reload and gateway/channels) and (b) the channel-start phase of the SIGUSR1 in-process restart. Start #1 binds port 8787, and start #2 then fails with EADDRINUSE.
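
The failure mode can be modeled in a few lines. This is a simplified sketch and not openclaw's code: `boundPorts` stands in for the operating system's port table, and `startTelegramChannel` is a hypothetical start function with no guard against an instance that is already running.

```typescript
// Simplified model of the double start. All names here are illustrative.

// Stand-in for the OS port table: port -> label of the start that bound it.
const boundPorts = new Map<number, string>();

function addressInUse(port: number): Error & { code: string } {
  return Object.assign(
    new Error(`listen EADDRINUSE: address already in use 0.0.0.0:${port}`),
    { code: "EADDRINUSE" },
  );
}

function bind(port: number, owner: string): void {
  if (boundPorts.has(port)) throw addressInUse(port);
  boundPorts.set(port, owner);
}

// Hypothetical channel start that does not check for an existing instance.
function startTelegramChannel(startedBy: string, webhookPort: number): string {
  try {
    bind(webhookPort, startedBy);
    return `${startedBy}: webhook listener bound on ${webhookPort}`;
  } catch (err) {
    return `${startedBy}: channel exited: ${(err as Error).message}`;
  }
}

// Two starts for the same channel in the same process, as in the timeline.
const results = [
  startTelegramChannel("start #1", 8787),
  startTelegramChannel("start #2", 8787),
];
results.forEach((line) => console.log(line));
```

Retrying the second start cannot succeed in this model while the first one holds the port, which matches the auto-restart attempts never recovering.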

Suggested fix direction

Either:

  • the in-process restart path should cancel any pending deferred reloads (they are about to be subsumed by the full restart anyway), or
  • channel start should be single-flight per channel id within a process (idempotent — second concurrent start returns the existing instance / no-op).
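
The single-flight option could look roughly like the sketch below. It assumes channel start is asynchronous and keyed by channel id. `ChannelHandle`, `startChannelOnce`, `forgetChannel`, and the `starter` callback are hypothetical names chosen for illustration, not openclaw's API.

```typescript
// Sketch of single-flight channel start. All names are illustrative.

interface ChannelHandle {
  channelId: string;
  port: number;
}

// One entry per channel id that is starting or already running.
const inFlightOrRunning = new Map<string, Promise<ChannelHandle>>();
let realStarts = 0;

function startChannelOnce(
  channelId: string,
  starter: () => Promise<ChannelHandle>,
): Promise<ChannelHandle> {
  const existing = inFlightOrRunning.get(channelId);
  if (existing) return existing; // a concurrent caller reuses the first start

  const started = starter();
  inFlightOrRunning.set(channelId, started);
  // Forget failed starts so that a later retry is allowed to run.
  started.catch(() => {
    if (inFlightOrRunning.get(channelId) === started) {
      inFlightOrRunning.delete(channelId);
    }
  });
  return started;
}

// A real stop path would also need to clear the entry before the next start.
function forgetChannel(channelId: string): void {
  inFlightOrRunning.delete(channelId);
}

// Stand-in starter that counts how often a real start happens.
const starter = (): Promise<ChannelHandle> => {
  realStarts += 1;
  return Promise.resolve({ channelId: "default", port: 8787 });
};

const fromReload = startChannelOnce("default", starter);
const fromRestart = startChannelOnce("default", starter);
console.log(fromReload === fromRestart, realStarts);
```

Both callers receive the same promise and the starter runs once, so only one listener would ever try to bind the webhook port.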

The first feels more correct — a SIGUSR1 in-process restart is a strict superset of any deferred reload, so the deferred reload is redundant after SIGUSR1 is received.
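
A minimal sketch of the first option, cancelling pending deferred reloads when the restart begins, is shown here. `ReloadCoordinator` and its methods are hypothetical names, and the way the real gateway tracks deferred reloads is an assumption.

```typescript
// Sketch of cancelling deferred reloads on restart. All names are illustrative.

class ReloadCoordinator {
  private pendingReloads = new Set<string>();
  private restarting = false;

  // Called when a config change needs a channel reload but work is still active.
  deferReload(channelId: string): void {
    if (this.restarting) return; // the restart will start the channel anyway
    this.pendingReloads.add(channelId);
  }

  // Called when the restart signal is handled: drop reloads the restart subsumes.
  beginRestart(): string[] {
    this.restarting = true;
    const cancelled = Array.from(this.pendingReloads);
    this.pendingReloads.clear();
    return cancelled;
  }

  // Called when active operations have drained: returns channels to reload now.
  drainDeferredReloads(): string[] {
    const due = Array.from(this.pendingReloads);
    this.pendingReloads.clear();
    return due;
  }

  // Called once the restart has brought the channels back up.
  finishRestart(): void {
    this.restarting = false;
  }
}

// Same ordering as the observed timeline: defer, restart begins, operations drain.
const coordinator = new ReloadCoordinator();
coordinator.deferReload("telegram");
const cancelled = coordinator.beginRestart();
const reloadedAfterDrain = coordinator.drainDeferredReloads();
console.log({ cancelled, reloadedAfterDrain });
```

With this ordering the drain step has nothing left to reload, so the restart's channel-start phase is the only path that starts the telegram channel.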

Notes

  • v2026.5.7's "bound channel hot-reload deferrals so stale task records cannot block … reloads forever" change bounds when the deferred reload fires but does not coordinate it with a concurrent SIGUSR1 restart, so judging from the release notes this should still reproduce on v2026.5.7.
  • Auto-restart on the losing channel runs for 10 attempts then stops, leaving the surviving channel still functional — so this manifests as persistent error noise more than user-visible breakage.
