openclaw - 💡(How to fix) Fix Gateway frequent restarts: config reload too aggressive + auth pre-warm blocks event loop



OpenClaw Gateway restarts frequently: config reload is too aggressive, and auth pre-warm blocks the event loop

Problem description

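During normal use, the Gateway frequently triggers unnecessary full-process restarts. As a result:

  • Active webchat sessions are interrupted, and the notice [System] Your previous turn was interrupted by a gateway restart appears again and again
  • launchd KeepAlive combined with a short ThrottleInterval produces a crash loop
  • The user experience is severely degraded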

Environment

  • OpenClaw: v2026.5.22
  • macOS: 26.4.1 (arm64)
  • Node: 25.9.0
  • Deployment: LaunchAgent (launchd)

Root cause analysis

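Within a single day (24 h), we observed the following:

  • 102 SIGTERMs (full-process restarts)
  • 33 restarts triggered by config changes
  • 184 secrets.resolve failed errors
  • 21 launchd kickstarts, including a crash loop that restarted the gateway every 2-3 minutes for about 2 hours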
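Counts like these can be reproduced from the gateway log by counting marker strings line by line. The following is a minimal TypeScript sketch. It assumes the log is plain text with one event per line, and that the marker strings match the log excerpts quoted in this issue. countRestartSignals and MARKERS are hypothetical names, not part of OpenClaw, and the real log may contain variants this sketch misses.

```typescript
// Hypothetical helper: marker strings are copied from the log excerpts in this
// issue; the real gateway log may use additional variants.
const MARKERS = {
  sigterm: "signal SIGTERM received",
  configChange: "config change detected",
  secretsResolveFailed: "secrets.resolve failed",
  restartAttempt: "restart attempt",
} as const;

type MarkerName = keyof typeof MARKERS;

// Count lines containing each marker (a line counts at most once per marker).
function countRestartSignals(logText: string): Record<MarkerName, number> {
  const counts: Record<MarkerName, number> = {
    sigterm: 0,
    configChange: 0,
    secretsResolveFailed: 0,
    restartAttempt: 0,
  };
  const names = Object.keys(MARKERS) as MarkerName[];
  for (const line of logText.split("\n")) {
    for (const name of names) {
      if (line.includes(MARKERS[name])) counts[name] += 1;
    }
  }
  return counts;
}

// Usage with lines taken from the excerpts in this issue.
const sampleLog = [
  "[reload] config change detected; evaluating reload (channels.telegram.botToken)",
  "[gateway] signal SIGTERM received",
  "[gateway] received SIGTERM; restarting",
  "[2026-05-27T10:00:56Z] openclaw restart attempt",
  "[ws] ⇄ res ✗ secrets.resolve 10ms errorCode=UNAVAILABLE errorMessage=secrets.resolve failed",
].join("\n");
// Every counter is 1 for this sample ("received SIGTERM; restarting" is not double-counted).
console.log(countRestartSignals(sampleLog));
```

Matching the "signal SIGTERM received" line rather than any line containing SIGTERM keeps each restart from being counted twice.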

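Root cause 1: config-change hot reload is too aggressive

Any write to a config field, even one that leaves the value unchanged, triggers a full-process SIGTERM restart.

Fields that triggered restarts most often: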

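| Field | Restarts triggered | Full-process restart needed? |
|---|---|---|
| channels.telegram.botToken | 5 | No (a channel-level reload is enough) |
| channels.telegram.proxy | 6 | No |
| channels.telegram.accounts.* | 8 | No |
| agents.defaults.memorySearch.remote.* | 5 | No |
| models.providers.* | 3 | Possibly |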

Problem: an update to meta.lastTouchedAt, or a save from the UI, triggers a restart even when no actual value has changed.

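Root cause 2: provider auth pre-warm blocks the event loop

We repeatedly observed provider auth state pre-warm taking 50-234 seconds, with eventLoopMax reaching 20-30 seconds. While the event loop is blocked, the gateway cannot respond to health checks and may be wrongly judged unhealthy.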

provider auth state pre-warmed in 234463ms eventLoopMax=30484.2ms
provider auth state pre-warmed in 162304ms eventLoopMax=29846.7ms
provider auth state pre-warmed in 144663ms eventLoopMax=28303.2ms

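Root cause 3: secrets.resolve fails without a clear diagnosis

secrets.resolve failed appears frequently (184 times per day), but the error carries only UNAVAILABLE and no further diagnostics. It does not say whether the keychain entry is missing, the account does not match, or the keychain service is unreachable.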

Suggested improvements

Suggestion 1: tiered restarts for config changes (high priority)

Split config fields into two tiers, hot and cold:

Hot (channel-level reload, no process restart)

  • channels.*.proxy, channels.*.botToken, channels.*.accounts
  • bindings, agents.list
  • meta.lastTouchedAt

Cold (full-process restart)

  • gateway.port, gateway.bind, gateway.auth
  • plugins.* (changes to enabled state)
  • models.providers.* (adding or removing a provider)
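The hot/cold split can be sketched as a small classifier over the dotted paths of changed fields. This is a minimal TypeScript sketch that assumes changes arrive as dotted paths such as channels.telegram.botToken. Several parts are hypothetical choices for illustration, not OpenClaw behavior: planReload, the pattern syntax ("*" matches one segment, and a pattern also covers everything below it), and the rule that unlisted fields fall back to a full restart.

```typescript
// Hypothetical tables mirroring the proposed Hot/Cold lists.
// A pattern matches a changed path when its segments are a prefix of the
// path's segments; "*" matches any single segment.
const COLD_PATTERNS = ["gateway.port", "gateway.bind", "gateway.auth", "plugins.*", "models.providers.*"];
const HOT_PATTERNS = [
  "channels.*.proxy",
  "channels.*.botToken",
  "channels.*.accounts",
  "bindings",
  "agents.list",
  "meta.lastTouchedAt",
];

type ReloadAction = "none" | "hot-reload" | "restart";

function matchesPattern(pattern: string, path: string): boolean {
  const want = pattern.split(".");
  const have = path.split(".");
  return have.length >= want.length && want.every((seg, i) => seg === "*" || seg === have[i]);
}

function isCold(path: string): boolean {
  if (COLD_PATTERNS.some((p) => matchesPattern(p, path))) return true;
  // Fields on neither list keep today's conservative behavior (full restart).
  return !HOT_PATTERNS.some((p) => matchesPattern(p, path));
}

// A single cold field forces a full restart; otherwise a channel-level reload suffices.
function planReload(changedPaths: string[]): ReloadAction {
  if (changedPaths.length === 0) return "none";
  return changedPaths.some(isCold) ? "restart" : "hot-reload";
}

console.log(planReload(["channels.telegram.botToken"])); // → hot-reload
console.log(planReload(["channels.telegram.proxy", "gateway.port"])); // → restart
console.log(planReload([])); // → none
```

Some fields in the table above appear on neither list, such as agents.defaults.memorySearch.remote.*. This sketch still restarts the process for them, and moving them to the hot tier would be a separate decision.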

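Suggestion 2: diff check for botToken/proxy (medium priority)

When config is written, perform a value-level diff (a semantic diff, not just a string comparison), and skip the restart when no value has actually changed. In particular, this prevents pointless restarts caused by a UI save that only updates meta.lastTouchedAt.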
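A semantic diff compares parsed values instead of serialized text, so key order and formatting stop mattering. This is a minimal TypeScript sketch, assuming config snapshots are plain JSON-compatible objects. changedPaths, restartRelevantChanges and IGNORED_PATHS are hypothetical names. Reporting an array as one path when any element differs is a simplifying choice for illustration.

```typescript
// Hypothetical sketch of a value-level config diff. Objects are compared key by
// key, so reordered keys or reformatted JSON do not count as changes.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Paths whose changes should never be a restart reason (taken from this issue).
const IGNORED_PATHS = ["meta.lastTouchedAt"];

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Return the dotted paths whose values differ between two config snapshots.
function changedPaths(before: Json | undefined, after: Json | undefined, path = ""): string[] {
  if (isObject(before) && isObject(after)) {
    const a: JsonObject = before;
    const b: JsonObject = after;
    const keys = Object.keys(a).concat(
      Object.keys(b).filter((k) => !Object.prototype.hasOwnProperty.call(a, k)),
    );
    const out: string[] = [];
    for (const key of keys) {
      out.push(...changedPaths(a[key], b[key], path ? `${path}.${key}` : key));
    }
    return out;
  }
  if (Array.isArray(before) && Array.isArray(after)) {
    const a: Json[] = before;
    const b: Json[] = after;
    // Arrays are reported as one path when any element differs.
    const same = a.length === b.length && a.every((item, i) => changedPaths(item, b[i]).length === 0);
    return same ? [] : [path];
  }
  return before === after ? [] : [path];
}

// Changed paths minus the ones that should never trigger a restart.
function restartRelevantChanges(before: Json, after: Json): string[] {
  return changedPaths(before, after).filter(
    (p) => !IGNORED_PATHS.some((ignored) => p === ignored || p.startsWith(`${ignored}.`)),
  );
}

// Usage: a UI save that only bumps lastTouchedAt and reorders keys.
const saved: Json = {
  meta: { lastTouchedAt: "2026-05-27T10:00:00Z" },
  channels: { telegram: { botToken: "<token>", proxy: null } },
};
const resaved: Json = {
  channels: { telegram: { proxy: null, botToken: "<token>" } },
  meta: { lastTouchedAt: "2026-05-27T10:05:00Z" },
};
console.log(restartRelevantChanges(saved, resaved)); // → []
```

An empty result means the write can be applied without any restart. A non-empty result lists exactly the fields a tiered reload decision would need to look at.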

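Suggestion 3: make auth pre-warm asynchronous (high priority)

Run provider auth pre-warm asynchronously, with an upper time limit (30 s suggested), so that it does not block the main event loop. If pre-warm times out, mark the provider as degraded instead of triggering a restart.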
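A minimal TypeScript sketch of a time-boxed pre-warm follows. It assumes each provider's pre-warm can be expressed as a function returning a Promise, and prewarmProviders, withTimeout and ProviderStatus are hypothetical names, not OpenClaw's API. One caveat applies: a timeout can only fire if the pre-warm work yields to the event loop, for example by awaiting I/O. If the pre-warm does synchronous work, it will still stall the loop, and that work would have to move off the main thread, for example into a worker_threads Worker.

```typescript
// Hypothetical sketch; names are illustrative, not OpenClaw's API.
type ProviderStatus = "ready" | "degraded";

// Reject if `work` has not settled within `ms`. The losing promise is not
// cancelled; only the timer is cleaned up.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Pre-warm all providers concurrently; a timeout or auth error marks only that
// provider as degraded instead of failing the whole gateway.
async function prewarmProviders(
  providers: Record<string, () => Promise<void>>,
  timeoutMs = 30_000,
): Promise<Record<string, ProviderStatus>> {
  const result: Record<string, ProviderStatus> = {};
  await Promise.all(
    Object.keys(providers).map(async (name) => {
      try {
        await withTimeout(providers[name](), timeoutMs);
        result[name] = "ready";
      } catch {
        result[name] = "degraded";
      }
    }),
  );
  return result;
}

// Usage: start pre-warm without awaiting it, so startup and health checks proceed.
const warmup = prewarmProviders(
  {
    fast: () => new Promise<void>((resolve) => setTimeout(resolve, 10)),
    slow: () => new Promise<void>((resolve) => setTimeout(resolve, 500)),
    broken: () => Promise.reject(new Error("keychain unavailable")),
  },
  100,
);
// After about 100 ms: fast is ready, slow and broken are degraded.
warmup.then((status) => console.log(status));
```

Nothing awaits the pre-warm before the gateway starts serving, so health checks keep being answered while providers warm up. A degraded provider can be retried later instead of forcing a restart.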

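Suggestion 4: richer secrets.resolve error messages (low priority)

A secrets.resolve failed error should state the specific cause:

  • The keychain entry does not exist (and which service + account were looked up?)
  • The keychain service is unreachable
  • resolve returned an empty value
  • Timeout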
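One way to carry these causes is a discriminated union with one variant per cause, turned into a readable errorMessage. This is a minimal TypeScript sketch. SecretsResolveFailure, describeFailure and the field names are hypothetical, and "openclaw" and "telegram-main" in the usage line are placeholder values, not real keychain entries.

```typescript
// Hypothetical error shape; the variants mirror the causes listed above.
type SecretsResolveFailure =
  | { kind: "entry-missing"; service: string; account: string }
  | { kind: "keychain-unreachable"; detail: string }
  | { kind: "empty-value"; service: string; account: string }
  | { kind: "timeout"; afterMs: number };

// Build an errorMessage that names the cause instead of a bare UNAVAILABLE.
function describeFailure(failure: SecretsResolveFailure): string {
  switch (failure.kind) {
    case "entry-missing":
      return `secrets.resolve failed: no keychain entry for service="${failure.service}" account="${failure.account}"`;
    case "keychain-unreachable":
      return `secrets.resolve failed: keychain service unreachable (${failure.detail})`;
    case "empty-value":
      return `secrets.resolve failed: keychain entry service="${failure.service}" account="${failure.account}" is empty`;
    case "timeout":
      return `secrets.resolve failed: timed out after ${failure.afterMs}ms`;
    default: {
      // Exhaustiveness check: adding a variant without a case is a compile error.
      const unreachable: never = failure;
      return String(unreachable);
    }
  }
}

console.log(describeFailure({ kind: "entry-missing", service: "openclaw", account: "telegram-main" }));
// → secrets.resolve failed: no keychain entry for service="openclaw" account="telegram-main"
```

Including service and account in the message would have answered the "account mismatch?" question directly from the log.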

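Workaround (already applied on the user side)

  1. Raised the LaunchAgent ThrottleInterval from 10 s to 60 s to mitigate the crash loop
  2. Made sure each keychain entry's account name matches the config id
  3. Started a new conversation in webchat to get past the interruption marker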

Relevant logs

# Example: config change → restart
[reload] config change detected; evaluating reload (channels.telegram.botToken)
[gateway] signal SIGTERM received
[gateway] received SIGTERM; restarting

# Example: auth pre-warm blocking
[gateway] provider auth state pre-warmed in 234463ms eventLoopMax=30484.2ms

# Example: crash loop (10:00-11:37, once every 2-3 minutes)
[2026-05-27T10:00:56Z] openclaw restart attempt
[2026-05-27T10:01:16Z] openclaw restart done
[2026-05-27T10:05:35Z] openclaw restart attempt
[2026-05-27T10:05:56Z] openclaw restart done
... (15 times in ~2 hours)

# secrets.resolve failures (high frequency)
[ws] ⇄ res ✗ secrets.resolve 10ms errorCode=UNAVAILABLE errorMessage=secrets.resolve failed
