openclaw - ✅(Solved) Fix Windows: gateway restart does not wait for active tasks and loses session state [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#56284 · Fetched 2026-04-08 01:42:47

  • Comments: 0
  • Participants: 1
  • Timeline events: 5 (referenced ×4, cross-referenced ×1)
  • Reactions: 0

Error Message

throw new Error(`Gateway restart failed: ${health.message}`);

Root Cause

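On Windows (Scheduled Task installation), openclaw gateway restart does not call the existing graceful-restart functions (deferGatewayRestartUntilIdle, writeRestartSentinel). Active tasks are terminated without waiting, and no restart sentinel is written, so sessions are not recovered after the restart.

A manual openclaw gateway stop followed by openclaw gateway start does not help either, because that path does not write the sentinel.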

Fix Action

Workaround

Currently must use:

openclaw gateway stop
# Wait manually
openclaw gateway start

Even this does not trigger session recovery, because the restart sentinel is not written.

PR fix notes

PR #56292: feat(gateway): implement graceful restart on Windows

Description (problem / solution / changelog)

Fix: Windows gateway restart now waits for tasks and recovers sessions

Summary

This PR implements graceful restart for the Windows Gateway service. Previously, openclaw gateway restart on Windows returned immediately and forcefully terminated all active tasks, losing session state. The fix reuses existing OpenClaw infrastructure (deferGatewayRestartUntilIdle, writeRestartSentinel, etc.) to provide proper graceful restart behavior.

Problem

On Windows (Scheduled Task installation):

  • openclaw gateway restart returned immediately without waiting for tasks
  • Active tasks were killed abruptly, no chance to complete
  • Sessions were lost after restart (no recovery)
  • Users had to manually stop + start, but even that didn't recover sessions

Solution

Modified runDaemonRestart() in daemon-cli.ts to implement graceful restart for Windows:

  1. Wait for idle: deferGatewayRestartUntilIdle({ timeoutMs: 300000 })
  2. Write sentinel: writeRestartSentinel({ reason: 'manual restart' })
  3. Graceful stop: runServiceStop({ graceful: true })
  4. Start: runServiceStart()
  5. Health check: waitForGatewayHealthyRestart()

Unix/Linux/macOS continue to use SIGUSR1-based restart (unchanged).
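
The "wait for idle" step can be pictured as a polling loop bounded by a timeout. The sketch below is illustrative only: waitUntilIdle, getActiveTaskCount, and pollMs are hypothetical names, and the PR does not show how deferGatewayRestartUntilIdle actually detects idleness.

```javascript
// Hypothetical sketch: poll an activity counter until it reaches zero
// or the timeout elapses. Not OpenClaw's actual implementation.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilIdle({ getActiveTaskCount, timeoutMs = 300000, pollMs = 250 }) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (getActiveTaskCount() === 0) {
      return { idle: true };
    }
    if (Date.now() >= deadline) {
      return { idle: false }; // timed out with tasks still active
    }
    // Never sleep past the deadline.
    await sleep(Math.min(pollMs, Math.max(0, deadline - Date.now())));
  }
}
```

Returning a result instead of throwing on timeout leaves the caller free to decide whether to proceed with the restart anyway, which matches the "wait for task to complete (or timeout after 5 min)" behavior described in the PR.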

Changes

  • packages/gateway/src/cli/daemon-cli.ts
    • Added Windows-specific graceful restart path before runServiceRestart()
    • Falls back to standard restart if graceful path fails
    • Uses existing infrastructure (no new dependencies)
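
The fallback mentioned above (use the standard restart if the graceful path fails) could be structured as a small wrapper. This is a sketch under assumptions: restartWithFallback and both injected functions are hypothetical names, not OpenClaw APIs.

```javascript
// Hypothetical sketch of "try graceful, fall back to standard restart".
// Both restart functions are injected; neither is an OpenClaw API.
async function restartWithFallback({ gracefulRestart, standardRestart, log = console.warn }) {
  try {
    const result = await gracefulRestart();
    return { path: 'graceful', result };
  } catch (err) {
    log(`Graceful restart failed, falling back: ${err.message}`);
    const result = await standardRestart();
    return { path: 'standard', result };
  }
}
```

A real implementation would likely also need to consider which step failed, since a failure after the service has already been stopped leaves the gateway in a different state than a failure while waiting for idle.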

Testing

Manual Test (Windows)

  1. Start a long-running task: openclaw agent --message "Long running task..."
  2. Execute: openclaw gateway restart
  3. Observe:
    • Command waits for task to complete (or timeout after 5 min)
    • After restart, task state is preserved and continues
    • Results are delivered normally to configured channels

Expected Behavior

  • gateway restart now takes longer (waits for tasks)
  • ✅ Active tasks complete before restart
  • ✅ Sessions are recovered after restart
  • ✅ Same behavior as Unix SIGUSR1 restart

Related Issue

Closes #56284

Additional Notes

  • This fix does not affect Unix platforms (SIGUSR1 path unchanged)
  • Graceful restart is backward compatible (no config changes needed)
  • Timeout is 5 minutes; can be adjusted if needed
  • All functions used (deferGatewayRestartUntilIdle, writeRestartSentinel, etc.) already existed in codebase

screenshots

N/A (CLI tool)

Changed files

  • src/cli/daemon-cli/lifecycle.ts (modified, +177/-2)


GitHub Issue: Windows gateway restart does not wait for active tasks

Problem

On Windows (using Scheduled Task), openclaw gateway restart immediately returns success but does not actually restart the Gateway properly. More critically, it forcefully terminates any running tasks without waiting for them to complete, and does not recover sessions after restart.

Current Behavior

  1. Run openclaw gateway restart
  2. Command returns immediately with "success"
  3. Gateway process may still be running or restarting
  4. Any active tasks are killed abruptly
  5. Sessions are lost, no recovery after restart
  6. User must manually stop + start to get a clean restart

Expected Behavior

  1. gateway restart should use the existing graceful restart mechanism (deferGatewayRestartUntilIdle)
  2. Wait for active tasks to complete (or timeout)
  3. Write restart sentinel for session recovery
  4. Only return after Gateway is healthy and sessions are restored
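
To make the sentinel idea concrete: the process that is about to stop leaves a small marker, and the next process to start consumes it to learn that a planned restart (rather than a crash) occurred. The sketch below is an assumption-laden illustration. The file name restart-sentinel.json, the JSON shape, and the helper names writeSentinel and consumeSentinel are hypothetical and do not describe OpenClaw's on-disk format.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sentinel helpers. The file name and JSON shape are
// assumptions for illustration, not OpenClaw's on-disk format.
function writeSentinel(dir, payload) {
  const file = path.join(dir, 'restart-sentinel.json');
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(payload));
  fs.renameSync(tmp, file); // write-then-rename so readers never see a partial file
  return file;
}

function consumeSentinel(dir) {
  const file = path.join(dir, 'restart-sentinel.json');
  let raw;
  try {
    raw = fs.readFileSync(file, 'utf8');
  } catch (err) {
    if (err.code === 'ENOENT') return null; // no restart was pending
    throw err;
  }
  fs.unlinkSync(file); // consume once so a later start does not replay it
  return JSON.parse(raw);
}

// Usage: write before stopping, consume on the next startup.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sentinel-demo-'));
writeSentinel(dir, { reason: 'manual restart', timestamp: new Date().toISOString() });
const pending = consumeSentinel(dir);
console.log(pending.reason);       // prints "manual restart"
console.log(consumeSentinel(dir)); // prints null (already consumed)
```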

Technical Details

  • OpenClaw version: 2026.3.24
  • Platform: Windows 11 (win32)
  • Installation: Windows Scheduled Task
  • Gateway functions exist but are not wired up for the Windows restart path:
    • deferGatewayRestartUntilIdle()
    • writeRestartSentinel()
    • consumeRestartSentinel()
    • scheduleGatewaySigusr1Restart() (Unix only)

Key finding: These functions are implemented in pi-embedded-BaSvmUpW.js but gateway restart on Windows does not call them.

Reproduction Steps

  1. Start a long-running task (e.g., a cron job or interactive session)
  2. Execute openclaw gateway restart
  3. Observe: command returns immediately
  4. Check task status: it is aborted, not completed
  5. Check session store: no recovery, task lost

Workaround

Currently must use:

openclaw gateway stop
# Wait manually
openclaw gateway start

Even this does not trigger session recovery, because the restart sentinel is not written.

Proposed Fix

Modify runDaemonRestart() in daemon-cli.js (or lifecycle-core-gBCZgGHS.js) to implement graceful restart for Windows:

async function runDaemonRestart(opts = {}) {
  // ... existing code ...

  if (process.platform === 'win32') {
    // Windows graceful restart
    const gatewayPort = await resolveGatewayLifecyclePort(service);

    // 1. Wait for active tasks to complete (or timeout)
    await deferGatewayRestartUntilIdle({
      timeoutMs: 300000 // 5 minutes
    });

    // 2. Write restart sentinel for session recovery
    await writeRestartSentinel({
      reason: 'manual restart',
      timestamp: new Date().toISOString()
    });

    // 3. Graceful stop
    await runServiceStop({ graceful: true });

    // 4. Start new instance
    await runServiceStart();

    // 5. Wait for health check
    const health = await waitForGatewayHealthyRestart({
      port: gatewayPort,
      attempts: 20,
      delayMs: 500
    });

    if (!health.success) {
      throw new Error(`Gateway restart failed: ${health.message}`);
    }

    return { outcome: 'restarted', message: 'Gateway restarted gracefully with session recovery' };
  } else {
    // Unix: send SIGUSR1 (existing behavior)
    return await scheduleGatewaySigusr1Restart({ reason: '/restart' });
  }
}
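
The health-check call above passes attempts: 20 and delayMs: 500. If attempts are spaced delayMs apart (an assumption, since the internals of waitForGatewayHealthyRestart are not shown), the wait is bounded at roughly 10 seconds plus probe time. A minimal sketch of such a bounded poll, with pollUntilHealthy and probe as hypothetical names:

```javascript
// Hypothetical polling helper; `probe` is an injected async function that
// resolves to true when the gateway answers as healthy.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilHealthy({ probe, attempts = 20, delayMs = 500 }) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await probe()) {
        return { success: true, attempts: attempt };
      }
    } catch {
      // Treat probe errors (e.g. connection refused during startup) as "not yet healthy".
    }
    if (attempt < attempts) {
      await delay(delayMs); // no delay after the final attempt
    }
  }
  return { success: false, message: `not healthy after ${attempts} attempts` };
}
```

The result shape mirrors the success and message fields that the proposed fix reads from the health check.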

Related Files

  • daemon-cli-BgoyP3Ke.js - Gateway CLI commands
  • lifecycle-core-gBCZgGHS.js - Service lifecycle (restart, stop, start)
  • pi-embedded-BaSvmUpW.js - Contains the graceful restart functions
  • status-D8mZfs6u.js - Health check utilities (waitForGatewayHealthyRestart)

Additional Context

This issue is critical for users who run automated tasks via cron or have long-running agent sessions. Forcing abrupt termination leads to:

  • Lost task results
  • Incomplete workflows
  • Poor user experience

OpenClaw already has all the necessary infrastructure for graceful restart and session recovery; it just needs to be wired up for Windows service manager.


Documentation: See internal analysis at .proactivity/gateway-graceful-restart-best-practices.md

Tested on: OpenClaw 2026.3.24, Windows 11, Scheduled Task installation


Acceptable Solutions

  • Implement graceful restart for Windows as described above
  • Alternatively, expose deferGatewayRestartUntilIdle to CLI as --graceful flag
  • Ensure sentinel is written for all restart scenarios
  • Add tests for Windows restart behavior
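
One way to approach "add tests for Windows restart behavior" without a real gateway is to inject the steps and assert their order. The sketch below assumes a hypothetical refactoring in which the restart sequence receives its steps as functions. gracefulRestart, makeRecordingSteps, and the step names are illustrative and do not exist in OpenClaw.

```javascript
// Hypothetical harness: the step functions are injected so the order can be
// checked on any platform without a real gateway.
async function gracefulRestart(steps) {
  await steps.waitForIdle();
  await steps.writeSentinel();
  await steps.stop();
  await steps.start();
  const health = await steps.checkHealth();
  if (!health.success) {
    throw new Error(`Gateway restart failed: ${health.message}`);
  }
  return { outcome: 'restarted' };
}

// Builds fake steps that append their own name to `calls` when invoked.
function makeRecordingSteps(calls, health = { success: true }) {
  const record = (name, value) => async () => {
    calls.push(name);
    return value;
  };
  return {
    waitForIdle: record('waitForIdle'),
    writeSentinel: record('writeSentinel'),
    stop: record('stop'),
    start: record('start'),
    checkHealth: record('checkHealth', health),
  };
}

// Usage: run the sequence against the fakes and print the recorded order.
const calls = [];
gracefulRestart(makeRecordingSteps(calls)).then(() => {
  console.log(calls.join(' -> '));
  // prints "waitForIdle -> writeSentinel -> stop -> start -> checkHealth"
});
```

The key property to assert is that the sentinel is written before the stop step, since a sentinel written after the old process is gone would never be seen by the recovery logic.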

Priority: High (affects core reliability)

Difficulty: Medium (functions already exist, just need to connect them)

Contributor: @YOUR_GITHUB_USERNAME (contact via OpenClaw workspace)

extent analysis

Fix Plan

To implement a graceful restart for Windows, follow these steps:

  1. Modify runDaemonRestart(): Update the runDaemonRestart() function in daemon-cli.js (or lifecycle-core-gBCZgGHS.js) to include the Windows-specific restart logic.
  2. Implement Windows restart logic: Add the following code to handle the restart:

if (process.platform === 'win32') {
  const gatewayPort = await resolveGatewayLifecyclePort(service);

  // Wait for active tasks to complete (or timeout)
  await deferGatewayRestartUntilIdle({
    timeoutMs: 300000 // 5 minutes
  });

  // Write restart sentinel for session recovery
  await writeRestartSentinel({
    reason: 'manual restart',
    timestamp: new Date().toISOString()
  });

  // Graceful stop
  await runServiceStop({ graceful: true });

  // Start new instance
  await runServiceStart();

  // Wait for health check
  const health = await waitForGatewayHealthyRestart({
    port: gatewayPort,
    attempts: 20,
    delayMs: 500
  });

  if (!health.success) {
    throw new Error(`Gateway restart failed: ${health.message}`);
  }

  return { outcome: 'restarted', message: 'Gateway restarted gracefully with session recovery' };
}

  3. Expose deferGatewayRestartUntilIdle to CLI (optional): Consider adding a --graceful flag to the CLI to allow users to trigger a graceful restart manually.
  4. Ensure sentinel is written for all restart scenarios: Verify that the restart sentinel is written for all restart scenarios, including manual and automated restarts.

Verification

To verify the fix, follow these steps:

  1. Start a long-running task: Begin a long-running task, such as a cron job or interactive session.
  2. Execute openclaw gateway restart: Run the openclaw gateway restart command.
  3. Check task status: Verify that the task completes successfully and is not aborted.
  4. Check session store: Confirm that the session is recovered after the restart.

Extra Tips

  • Ensure that the deferGatewayRestartUntilIdle, writeRestartSentinel, and consumeRestartSentinel functions are properly implemented and connected.
  • Add tests for Windows restart behavior to prevent regressions.
  • Consider adding logging and monitoring to track restart events and session recovery.
