openclaw - ✅(Solved) Fix [BUG] macOS LaunchAgent + configure wizard creates duplicate gateway process, causing 30+ hour Telegram 409 polling conflict [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#43628 · Fetched 2026-04-08 00:16:40
Comments: 1 · Participants: 2 · Timeline: 8 · Reactions: 0
Timeline (top): cross-referenced ×4, labeled ×2, commented ×1, referenced ×1

Running openclaw configure (wizard) while the LaunchAgent is already managing a running gateway process leaves the original process alive. Both instances then attempt to long-poll Telegram simultaneously, producing continuous 409 Conflict errors that silently drop or misroute incoming messages for as long as both processes are alive.

Error Message

  • Error: getUpdates conflict: 409: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running
  • No warning or error was surfaced to the user.
  • First 409 error logged at 2026-03-10T16:09:22-04:00, about 22 minutes after the wizard ran.
  • Severity: High / P1. All incoming Telegram messages were silently dropped or misrouted for 31+ hours with no user-visible error. 5 bot accounts were affected, and the user had no idea the system was broken until manually noticing missed messages.

Root Cause

Per PR #43639, the SIGUSR1 restart path could fall back to a detached respawn on macOS when the launchd environment hints were missing, even though the process was actually launchd-managed. That fallback can leave an unmanaged old gateway process alive while launchd starts another one. Both instances then long-poll Telegram with the same bot tokens, producing continuous 409 Conflict errors for as long as both processes are alive.

Fix / Workaround

Workaround: openclaw gateway stop && sleep 3 && openclaw gateway start

PR fix notes

PR #43639: Gateway: prevent detached respawn when launchd already owns the process

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: SIGUSR1 restart could fall back to detached respawn on macOS when launchd env hints were missing, even if the process was actually launchd-managed.
  • Why it matters: detached fallback can leave an unmanaged old gateway process alive while launchd starts another one, causing duplicate Telegram long-polling and sustained 409 conflicts.
  • What changed: launchd supervision detection now has a runtime fallback that validates the active launchd job PID (launchctl print gui/<uid>/<label>) against process.pid before deciding to detached-respawn.
  • What did NOT change (scope boundary): no changes to launchd install/restart commands, channel polling logic, or gateway stop/start CLI semantics.
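The runtime fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names are hypothetical, and the `pid = <n>` line format in `launchctl print` output is an assumption (the PR itself lists output format differences across macOS versions as a risk).

```javascript
const { execFileSync } = require('child_process');

// Extract the PID from `launchctl print` output. The `pid = <n>` line format
// is an assumption about the output, not something the PR description states.
function parseLaunchdPid(output) {
  const match = /^\s*pid\s*=\s*(\d+)\s*$/m.exec(output);
  return match ? Number(match[1]) : null;
}

// Hypothetical helper: true only when launchd reports a PID and it is ours.
function isSupervisedByLaunchd(output, currentPid) {
  const pid = parseLaunchdPid(output);
  return pid !== null && pid === currentPid;
}

// Best-effort probe. Any failure (job not loaded, not macOS, timeout) is
// treated as "not supervised", so the env-hint detection stays authoritative.
function probeLaunchd(label, currentPid = process.pid) {
  try {
    const target = `gui/${process.getuid()}/${label}`;
    const output = execFileSync('launchctl', ['print', target], {
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
      timeout: 5000,
    });
    return isSupervisedByLaunchd(output, currentPid);
  } catch {
    return false;
  }
}
```

With the label from this PR, a call would look like `probeLaunchd('ai.openclaw.gateway')`. Keeping the parsing separate from the `launchctl` call makes the match and mismatch cases testable without a real LaunchAgent, which is in line with the unit-test-only validation the PR reports.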

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #43628
  • Related #40932
  • Related #41829

User-visible / Behavior Changes

  • macOS gateway restarts now correctly treat launchd-managed processes as supervised even when launchd hint env vars are absent, avoiding detached respawn fallback that can create duplicate gateway pollers.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS (launchd path verified via unit test mocking)
  • Runtime/container: Node/Vitest
  • Model/provider: N/A
  • Integration/channel (if any): Telegram conflict scenario covered by restart-path fix
  • Relevant config (redacted): LaunchAgent label ai.openclaw.gateway

Steps

  1. Start from a macOS gateway restart path where launchd env hints are missing.
  2. Trigger restartGatewayProcessWithFreshPid().
  3. Observe supervision mode resolution.

Expected

  • Restart logic should classify the process as supervised when the loaded launchd runtime PID matches the current process.

Actual

  • Verified by unit test: fallback launchctl print PID match returns supervised; non-match still uses detached spawn fallback.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Validation commands and outcomes:

  • pnpm test src/infra/process-respawn.test.ts (passed: 1 file, 15 tests)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • Added/ran test for launchd runtime PID fallback when env hints are absent.
    • Added/ran test confirming detached spawn still occurs when runtime PID does not match current process.
  • Edge cases checked:
    • Existing launchd env-hint behavior still returns supervised.
    • Non-darwin behavior unaffected in existing test suite.
  • What you did not verify:
    • Live macOS LaunchAgent restart on a real host (this PR includes unit-test validation only).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert commit 5098effc0.
  • Files/config to restore: src/infra/supervisor-markers.ts, src/infra/process-respawn.test.ts, CHANGELOG.md.
  • Known bad symptoms reviewers should watch for: gateway restart on macOS falling back to detached spawn while launchd-managed.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: launchctl print output format differences could prevent PID detection on some macOS versions.
    • Mitigation: detection is best-effort fallback only; existing env-hint detection is unchanged; tests cover both match and mismatch behavior.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/infra/process-respawn.test.ts (modified, +63/-0)
  • src/infra/supervisor-markers.ts (modified, +44/-1)

PR #56324: fix(telegram): add per-token duplicate poller guard to prevent 409 conflicts

Description (problem / solution / changelog)

Summary

  • Add a per-token active polling session registry in monitorTelegramProvider() that detects and waits for an existing session to release before starting a new one
  • Add a 500ms drain pause in the hot-reload channel restart handler between stopChannel and startChannel

Both changes prevent 409 Conflict errors from concurrent getUpdates calls on the same bot token.

Context

The gateway has no protection against duplicate polling sessions for the same bot token. Multiple scenarios can create overlapping pollers:

  1. Hot-reload race: applyHotReload restarts channels via stopChannel then startChannel, but waitForGracefulStop has a 15-second timeout (POLL_STOP_GRACE_MS). If the grammY runner does not stop within that window, the new poller starts while the old one still holds a connection.

  2. External scripts: Any process calling getUpdates on the same token (launchd agents, cron scripts, monitoring tools) creates a competing poller the gateway cannot detect.

  3. Watchdog restart overlap: The 90-second POLL_STALL_THRESHOLD_MS triggers a polling cycle restart that can overlap with the existing session if graceful stop times out.

PR #20930 fixed the SIGUSR1 + config.patch race, but the file-watcher hot-reload path remains unguarded.

Implementation

extensions/telegram/src/monitor.ts (+68 lines) — Module-level Map<string, ActivePollerEntry> keyed by bot token. Before starting polling, monitorTelegramProvider checks the registry and waits up to 5 seconds for any existing session to signal completion via a done promise. The registry is cleaned up in the finally block.

src/gateway/server-reload-handlers.ts (+4 lines) — 500ms setTimeout between stopChannel and startChannel in the hot-reload channel restart path, giving the polling session graceful stop a buffer to fully release.
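A per-token guard of this kind might look like the sketch below. The names (`activePollers`, `acquirePoller`, `releasePoller`, `pollWithGuard`) are illustrative and are not the identifiers used in the PR; only the overall shape (a module-level map keyed by bot token, a done promise, a bounded wait, cleanup in `finally`) follows the description above.

```javascript
// Module-level registry: bot token -> { done: Promise, release: Function }.
const activePollers = new Map();

// Wait for `promise`, but give up after `ms`; the timer is always cleared.
function waitWithTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve) => { timer = setTimeout(resolve, ms); });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Wait (up to waitMs) for an existing poller on this token to finish,
// then register the caller as the active poller.
async function acquirePoller(token, waitMs = 5000) {
  const existing = activePollers.get(token);
  if (existing) {
    await waitWithTimeout(existing.done, waitMs);
  }
  let release;
  const done = new Promise((resolve) => { release = resolve; });
  const entry = { done, release };
  activePollers.set(token, entry);
  return entry;
}

// Signal completion and drop the entry, unless a newer poller replaced it.
function releasePoller(token, entry) {
  entry.release();
  if (activePollers.get(token) === entry) {
    activePollers.delete(token);
  }
}

// Run one polling session under the guard; cleanup happens in `finally`.
async function pollWithGuard(token, runPolling) {
  const entry = await acquirePoller(token);
  try {
    await runPolling();
  } finally {
    releasePoller(token, entry);
  }
}
```

Note that a registry like this only protects against overlap inside a single process. It cannot see a second gateway process or an external script polling the same token, which is the scenario from #43628.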

Test plan

  • Existing telegram monitor tests pass (23/23)
  • Existing reload handler tests pass (12/12)
  • Verified on a 4-bot macOS setup (jarvis, atlas, forge, trader) — zero 409 errors after 10+ minutes of clean operation
  • Manual test: edit config while gateway is running, verify hot-reload restarts channels without 409s

Fixes #56230
Related: #20893, #43628, #50064, #49822, #33154

Changed files

  • extensions/telegram/src/monitor.ts (modified, +69/-0)
  • src/agents/pi-tools.params.ts (modified, +14/-4)
  • src/gateway/server-reload-handlers.ts (modified, +6/-0)


Bug type

Regression (worked before, now fails)

Summary

Running openclaw configure (wizard) while the LaunchAgent is already managing a running gateway process leaves the original process alive. Both instances then attempt to long-poll Telegram simultaneously, producing continuous 409 Conflict errors that silently drop or misroute incoming messages for as long as both processes are alive.

Steps to reproduce

  1. Have OpenClaw running normally as a LaunchAgent (standard macOS setup)
  2. Run openclaw configure (the setup wizard) while the gateway is live
  3. Wizard completes and triggers a gateway restart
  4. New gateway process starts, but the LaunchAgent-managed old process is NOT killed first
  5. Both processes begin polling Telegram simultaneously with the same bot tokens
  6. Result: continuous 409 Conflict errors begin ~20 minutes after wizard completes

Expected behavior

When the wizard or any restart flow triggers a new gateway process, the existing LaunchAgent-managed process should be fully stopped (SIGTERM + wait for confirmation) before the new instance begins Telegram polling. The openclaw gateway stop command should be called and awaited before spawning the new process.
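The "SIGTERM + wait for confirmation" step could be sketched as below. This is a generic Node pattern under stated assumptions, not OpenClaw code: the function names, the 15-second default, and the 200 ms poll interval are all hypothetical, and on a launchd-managed gateway the stop would more likely go through `openclaw gateway stop` than a raw signal.

```javascript
// process.kill(pid, 0) sends no signal; it only checks whether the PID exists.
function isProcessAlive(pid) {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return err.code === 'EPERM';
  }
}

// Send SIGTERM, then poll until the process is gone or the deadline passes.
// Resolves true when the process has exited, false on timeout.
async function terminateAndWait(pid, timeoutMs = 15000, pollMs = 200) {
  if (!isProcessAlive(pid)) return true;
  try {
    process.kill(pid, 'SIGTERM');
  } catch (err) {
    if (err.code === 'ESRCH') return true; // exited between check and signal
    throw err;
  }
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isProcessAlive(pid)) return true;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return !isProcessAlive(pid);
}
```

A caller would start the new gateway only when `terminateAndWait` resolves to `true`, and otherwise report the failure instead of starting a second poller.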

Actual behavior

  • gateway.err.log logged 10,996 getUpdates 409 Conflict errors over ~31 continuous hours
  • ~400 errors/hour, every hour, with no gaps
  • Error: getUpdates conflict: 409: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running
  • Timeline: started March 10 ~4:09 PM EDT, resolved March 11 ~10:48 PM EDT (old process finally died on its own)
  • Trigger: openclaw configure ran at March 10 3:47 PM EDT (conflicts began ~22 minutes later)
  • All incoming Telegram messages had a ~50% chance of being silently dropped for 31 hours
  • No warning or error was surfaced to the user

OpenClaw version

2026.3.8

Operating system

macOS 15.6 (arm64, Mac Mini)

Install method

npm global (pnpm, /opt/homebrew/lib/node_modules/openclaw), LaunchAgent managed

Model

anthropic/claude-sonnet-4-6

Provider / routing chain

Anthropic

Config file / key location

No response

Additional provider/model setup details

No response

Logs, screenshots, and evidence

From gateway.err.log:


Total 409 errors: 10,996
First: 2026-03-10T16:09:22-04:00
Last:  2026-03-11T22:48:19-04:00

By day:
  2026-03-10: 2,280 errors
  2026-03-11: 8,716 errors

Sample (representative):
2026-03-11T20:00:09-04:00 [telegram] getUpdates conflict: Call to 'getUpdates' failed! (409: Conflict: terminated by other getUpdates request; make sure that only one bot instance is running); retrying in 30s.


From gateway.log (trigger event):

2026-03-10T19:47:12Z  openclaw configure wizard last ran (from config meta)
2026-03-10T16:09:22-04:00  First 409 error logged (~22 min after wizard)


Only one process running at time of filing: PID 39074, started ~23:29 EDT March 11 (after old process finally died).
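A per-day tally like the one above can be reproduced with a few lines of Node. This is a sketch that assumes every log line starts with an ISO-8601 timestamp, as in the sample line; the function name and the log path in the usage comment are hypothetical.

```javascript
// Count 409 getUpdates conflicts per calendar day from gateway.err.log text.
// Lines without a leading ISO-8601 timestamp are skipped.
function tally409ByDay(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    if (!line.includes('getUpdates conflict') || !line.includes('409')) continue;
    const match = /^(\d{4}-\d{2}-\d{2})T/.exec(line);
    if (!match) continue;
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }
  return counts;
}

// Usage (path is an assumption; adjust to wherever the gateway writes its logs):
//   const fs = require('fs');
//   const counts = tally409ByDay(fs.readFileSync('gateway.err.log', 'utf8'));
//   for (const [day, n] of counts) console.log(`${day}: ${n} errors`);
```

Running the same tally on a schedule and alerting when the count is non-zero would address the "no warning or error was surfaced" part of the report.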

Impact and severity

High / P1 — All incoming Telegram messages silently dropped or misrouted for 31+ hours with no user-visible error. 5 bot accounts affected. User had no idea the system was broken until manually noticing missed messages.

Workaround: openclaw gateway stop && sleep 3 && openclaw gateway start

Related open issues: #40932, #41829

Additional information

No response

extent analysis

Fix Plan

To resolve the issue, we need to ensure that the existing LaunchAgent-managed process is fully stopped before spawning a new process. We can achieve this by calling openclaw gateway stop and awaiting its completion before starting the new process.

Code Changes

One option is to have the openclaw configure wizard stop the existing gateway process before starting a new one. The snippet below is an illustrative sketch of that idea, not the change made in PR #43639:

const { spawnSync } = require('child_process');

// Stop the existing gateway and refuse to continue if the stop did not succeed
const stop = spawnSync('openclaw', ['gateway', 'stop'], { stdio: 'inherit' });
if (stop.error || stop.status !== 0) {
  throw new Error('openclaw gateway stop failed; not starting a second gateway');
}

// Give the old process time to exit (mirrors the `sleep 3` in the workaround)
setTimeout(() => {
  // Start the new gateway process
  spawnSync('openclaw', ['gateway', 'start'], { stdio: 'inherit' });
}, 3000);

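Alternatively, child_process.execSync throws when the command exits non-zero or exceeds its timeout, so a failed stop aborts the sequence before a second gateway is started:

const { execSync } = require('child_process');

// Stop the existing gateway process; throws on failure or after 30 seconds
execSync('openclaw gateway stop', { stdio: 'inherit', timeout: 30000 });

// Same pause as the workaround, then start the new gateway process
execSync('sleep 3 && openclaw gateway start', { stdio: 'inherit', timeout: 30000 });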

Configuration Changes

No configuration changes are required for this fix.

Verification

To verify that the fix worked, you can:

  1. Run openclaw configure while the gateway is live.
  2. Check the gateway.err.log file for any 409 Conflict errors.
  3. Verify that only one gateway process is running, for example with pgrep -fl openclaw (unlike ps aux | grep openclaw, this does not match the grep command itself).

If the fix is successful, you should not see any 409 Conflict errors, and only one process should be running at a time.

Extra Tips

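On macOS the gateway is already supervised by launchd (systemd is not available there), so adding a second process manager such as pm2 risks the same duplicate-process problem. Prefer routing every restart through the LaunchAgent, and add logging and alerting on repeated 409 Conflict errors so that a polling conflict is surfaced to the user instead of failing silently.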
