openclaw - ✅(Solved) Fix GatewayDrainingError should auto-retry, not surface to user [1 pull request, 1 participant]

Q: Expected behavior

`GatewayDrainingError` should be treated like `isTransientHttp` errors — auto-retry after a short delay (e.g., wait for the restart to complete, then retry). The error should **never** surface to the user since it always resolves on its own.

openclaw · 2026-03-26 23:17:18

GitHub stats

openclaw/openclaw#55412 • Fetched 2026-04-08 01:39:47

Author

assimetria-ai

Participants

assimetria-ai

Timeline (top)

cross-referenced ×1, referenced ×1

Error Message

When the gateway restarts (e.g., after config.patch), any in-flight agent run that triggers a new command during the drain window gets GatewayDrainingError. The error falls through to the generic error handler in agent-runner.runtime and surfaces to the user as "Agent failed before reply: Gateway is draining for restart; new tasks are not accepted."

This is a transient error: the gateway comes back up seconds later, but the user sees an error and thinks something is broken. GatewayDrainingError should be treated like isTransientHttp errors, with an automatic retry after a short delay (e.g., wait for the restart to complete, then retry). The error should never surface to the user since it always resolves on its own.

In agent-runner.runtime, the error handling chain checks for billing, context overflow, role ordering, session corruption, and transient HTTP, but GatewayDrainingError is not checked and falls to the generic Agent failed before reply message. The suggested fix is to add a check before the generic error handler that matches either the message text 'Gateway is draining' or the error name 'GatewayDrainingError', waits for the restart, and retries the run.

Environment

OpenClaw 2026.3.24
macOS, local gateway, config.patch triggered restart
Happens every time a restart occurs while agents are active

PR fix notes

PR #55470: fix: auto-retry GatewayDrainingError instead of surfacing to user (#55412)

Repository: openclaw/openclaw
Author: factnest365-ops
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/55470

Description (problem / solution / changelog)

Summary

When the gateway restarts (e.g., after config.patch), in-flight agent runs that trigger new commands during the drain window get GatewayDrainingError. This falls through to the generic error handler and surfaces to users as:

⚠️ Agent failed before reply: Gateway is draining for restart; new tasks are not accepted.
Logs: openclaw logs --follow

This is a transient error — the gateway comes back up seconds later. But the user sees an error and thinks something is broken.

Fix

Adds a check for GatewayDrainingError before the generic error handler in the agent-runner error handling chain. When detected:

1. Wait 15 seconds for the gateway restart to complete
2. Retry the run (same pattern as existing transient HTTP error handling)

The check matches both the error message string (Gateway is draining) and the error class name (GatewayDrainingError), using a didRetryGatewayDrainingError flag to prevent infinite retry loops.
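The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `isGatewayDrainingError`, `runWithDrainRetry`, and `sleep` are hypothetical names, and only the matched strings, the 15-second delay, and the `didRetryGatewayDrainingError` flag name come from the description.

```typescript
// Hypothetical helper: matches either the error class name or the message text.
function isGatewayDrainingError(error: unknown): boolean {
  const name = error instanceof Error ? error.name : "";
  const message = error instanceof Error ? error.message : String(error);
  return name === "GatewayDrainingError" || message.includes("Gateway is draining");
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries a run once after a drain error, then gives up.
async function runWithDrainRetry<T>(
  run: () => Promise<T>,
  delayMs: number = 15000, // delay taken from the PR description
): Promise<T> {
  let didRetryGatewayDrainingError = false; // guards against an infinite retry loop
  while (true) {
    try {
      return await run();
    } catch (error) {
      if (isGatewayDrainingError(error) && !didRetryGatewayDrainingError) {
        didRetryGatewayDrainingError = true;
        await sleep(delayMs); // give the gateway time to restart
        continue; // retry the run
      }
      throw error; // a second drain error, or any other error, reaches the next handler
    }
  }
}
```

Because the flag is set before the delay, a second drain error on the retried run is rethrown rather than retried again, so the loop always terminates.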

Why 15 seconds?

Gateway restarts typically complete within 5-10 seconds, but the delay accounts for slower systems. The existing transient HTTP retry uses 2.5 seconds since those are usually immediate provider hiccups. Gateway restarts need more time for the process to fully stop and restart.

Fixes #55412

Changed files

src/auto-reply/reply/agent-runner-execution.ts (modified, +19/-0)

Code Example

if (message.includes('Gateway is draining') || error?.name === 'GatewayDrainingError') {
  // Wait for restart to complete (poll gateway health or fixed delay)
  await new Promise(r => setTimeout(r, 15000));
  continue; // retry the run
}

RAW_BUFFER

Problem

⚠️ Agent failed before reply: Gateway is draining for restart; new tasks are not accepted.
Logs: openclaw logs --follow

This is a transient error — the gateway comes back up seconds later. But the user sees an error and thinks something is broken.

Expected behavior

GatewayDrainingError should be treated like isTransientHttp errors — auto-retry after a short delay (e.g., wait for the restart to complete, then retry). The error should never surface to the user since it always resolves on its own.

Current behavior

In agent-runner.runtime, the error handling chain checks for billing, context overflow, role ordering, session corruption, and transient HTTP — but GatewayDrainingError is not checked and falls to the generic Agent failed before reply message.
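To illustrate why an unchecked error ends up in the generic handler, here is a small sketch of an ordered check chain. Everything in it is an assumption made for illustration: the `Check` shape, `classify`, and especially the `transientHttp` predicate are placeholders, since the issue does not show how agent-runner.runtime actually detects each error kind.

```typescript
type ErrorKind = "transient-http" | "gateway-draining" | "generic";

interface Check {
  kind: ErrorKind;
  matches: (message: string, name: string) => boolean;
}

// Placeholder predicate: the real transient-HTTP detection is not shown in the issue.
const transientHttp: Check = {
  kind: "transient-http",
  matches: (message) => /\b(502|503|504)\b/.test(message),
};

// Matches the two signals named in the issue: message text and error name.
const gatewayDraining: Check = {
  kind: "gateway-draining",
  matches: (message, name) =>
    name === "GatewayDrainingError" || message.includes("Gateway is draining"),
};

// Checks run in order; anything unmatched falls through to "generic".
function classify(error: Error, checks: Check[]): ErrorKind {
  for (const check of checks) {
    if (check.matches(error.message, error.name)) return check.kind;
  }
  return "generic";
}

const drainError = new Error("Gateway is draining for restart; new tasks are not accepted.");
console.log(classify(drainError, [transientHttp]));                  // prints "generic"
console.log(classify(drainError, [transientHttp, gatewayDraining])); // prints "gateway-draining"
```

With only the existing checks, the drain error is classified as generic; once a draining check is placed ahead of the fallback, the same error gets its own retryable category.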

Suggested fix

Add a check before the generic error handler:

if (message.includes('Gateway is draining') || error?.name === 'GatewayDrainingError') {
  // Wait for restart to complete (poll gateway health or fixed delay)
  await new Promise(r => setTimeout(r, 15000));
  continue; // retry the run
}

extent analysis

Fix Plan

To resolve the GatewayDrainingError issue, we need to modify the error handling chain in agent-runner.runtime to auto-retry after a short delay when this error occurs. Here are the steps:

Modify the error handling chain to check for GatewayDrainingError:

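// Runs inside the retry loop, before the generic error handler.
// didRetryGatewayDrainingError is declared as `let ... = false` before the loop.
if (
  !didRetryGatewayDrainingError &&
  (message.includes('Gateway is draining') || error?.name === 'GatewayDrainingError')
) {
  didRetryGatewayDrainingError = true; // retry at most once to avoid an infinite loop
  await new Promise(r => setTimeout(r, 15000)); // 15-second delay for the restart
  continue; // retry the run
}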

Alternatively, poll the gateway health instead of using a fixed delay:

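if (message.includes('Gateway is draining') || error?.name === 'GatewayDrainingError') {
  // getGatewayHealth is a placeholder and must be implemented separately.
  const deadline = Date.now() + 30000; // assumed 30-second cap so polling cannot hang forever
  while (Date.now() < deadline) {
    // A failed probe (gateway still down) counts as "not healthy yet".
    const gatewayHealth = await getGatewayHealth().catch(() => 'unreachable');
    if (gatewayHealth === 'healthy') {
      break;
    }
    await new Promise(r => setTimeout(r, 1000)); // 1-second poll interval
  }
  continue; // retry the run
}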

Verification

To verify that the fix worked, restart the gateway while an agent run is in progress and check that the GatewayDrainingError does not surface to the user. The agent run should auto-retry after the gateway restart is complete.

Extra Tips

Make sure to implement the getGatewayHealth function to poll the gateway health.
Adjust the delay or poll interval as needed to ensure that the agent run retries after the gateway restart is complete.
Consider adding logging to track the number of retries and the time it takes for the gateway to become healthy again.
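The tips above could be combined into one helper along these lines. This is a sketch under stated assumptions: `waitForGatewayHealthy` and `HealthProbe` are hypothetical names, the probe stands in for the unimplemented getGatewayHealth function, and the 30-second timeout is an arbitrary default rather than a value from the issue or the PR.

```typescript
// Hypothetical probe type: stands in for the getGatewayHealth function mentioned above.
type HealthProbe = () => Promise<string>;

interface WaitResult {
  healthy: boolean;
  attempts: number;
  elapsedMs: number;
}

async function waitForGatewayHealthy(
  probe: HealthProbe,
  pollIntervalMs: number = 1000,
  timeoutMs: number = 30000, // assumed cap so polling cannot run forever
): Promise<WaitResult> {
  const startedAt = Date.now();
  let attempts = 0;
  while (Date.now() - startedAt < timeoutMs) {
    attempts += 1;
    // A probe that throws (gateway still down) counts as "not healthy yet".
    const status = await probe().catch(() => "unreachable");
    if (status === "healthy") {
      const elapsedMs = Date.now() - startedAt;
      console.log(`gateway healthy after ${attempts} probe(s), ${elapsedMs} ms`);
      return { healthy: true, attempts, elapsedMs };
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  const elapsedMs = Date.now() - startedAt;
  console.warn(`gateway still unhealthy after ${attempts} probe(s), ${elapsedMs} ms`);
  return { healthy: false, attempts, elapsedMs };
}
```

Returning the attempt count and elapsed time lets the caller log them or decide whether to retry the run or surface the error once the timeout is reached.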
