openclaw - ✅(Solved) Fix [Bug]: restart storm from telegram.retry.jitter type mismatch + misleading doctor SecretRef for Telegram token [1 pull requests, 3 comments, 4 participants]

GitHub stats
openclaw/openclaw#52130 · Fetched 2026-04-08 01:15:15
Comments: 3 · Participants: 4 · Timeline: 10 · Reactions: 0
Timeline (top): referenced ×4, commented ×3, cross-referenced ×3

After host reboot, OpenClaw appeared "hung". Investigation showed a restart storm in a secondary runtime, plus confusing SecretRef diagnostics.

Observed behavior:

  • openclaw doctor reports:
    • channels.telegram.botToken: unresolved SecretRef "file:filemain:/providers/channels/telegram/botToken"
  • At the same time, runtime Telegram channel is actually healthy in openclaw status (Telegram: ON/OK) and provider starts successfully.
  • A user-level gateway entered a fast restart loop due to invalid config type:
    • channels.telegram.retry.jitter: Invalid input: expected number, received boolean
  • In parallel, openclaw-chromium-cdp.service was in an aggressive restart loop due to profile lock permission errors.

Root Cause

The restart storm came from a type mismatch in /home/skbot/.openclaw/openclaw.json: channels.telegram.retry.jitter was set to true, but the config schema accepts only a number between 0 and 1. Validation failed on every gateway startup with "Invalid input: expected number, received boolean", the main process exited with status=1/FAILURE, and the user-level gateway service restarted in a fast loop.

Two separate problems compounded the outage:

  • openclaw-chromium-cdp.service was in its own restart loop because it could not create /tmp/openclaw-chrome-profile/SingletonLock (Permission denied).
  • openclaw doctor reported channels.telegram.botToken as an unresolved SecretRef even though openclaw status showed the Telegram channel healthy (ON/OK), which wrongly suggested a token resolution problem.

Fix Action

Workaround used

  • Fix invalid config value (jitter: true -> numeric, e.g. 0.2).
  • Disable problematic user-level services in secondary runtime (skbot) to stop loops.
  • Keep a single active runtime/user for stability.

PR fix notes

PR #52137: fix(config): coerce boolean retry.jitter to number

Description (problem / solution / changelog)

Summary

Problem: Setting channels.telegram.retry.jitter to true (boolean) instead of a number causes Invalid input: expected number, received boolean validation error on every gateway startup, triggering a restart storm. The user's gateway becomes permanently stuck in a crash loop.

Why it matters: The jitter field semantically supports enable/disable (true/false) but the schema only accepts numbers. Users who set jitter: true (reasonable intent: "enable jitter") get an unrecoverable crash loop that requires manual config editing to fix.

What changed:

  • Added z.preprocess() to RetryConfigSchema.jitter in zod-schema.core.ts to coerce true -> 0.1 (default jitter factor) and false -> 0 (no jitter)
  • Applied the same coercion to web.reconnect.jitter in zod-schema.ts
  • Added regression test in config.schema-regressions.test.ts verifying both true and false are accepted
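
The coercion listed above can be sketched without any library. This is not the PR's code, which wraps the schema field in z.preprocess(); the names coerceJitter, validateJitter and JitterResult are hypothetical, and only the mappings (true to 0.1, false to 0) and the 0 to 1 range are taken from the PR description.

```typescript
// Hypothetical, dependency-free sketch of the coercion the PR describes.
const DEFAULT_JITTER = 0.1; // default jitter factor named in the PR notes

interface JitterResult {
  ok: boolean;
  value?: number;
  error?: string;
}

// Map booleans to numbers before validation; pass everything else through.
function coerceJitter(value: unknown): unknown {
  if (value === true) return DEFAULT_JITTER;
  if (value === false) return 0;
  return value;
}

// Validate after coercion: the result must be a number in [0, 1].
function validateJitter(raw: unknown): JitterResult {
  const value = coerceJitter(raw);
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { ok: false, error: `expected number, received ${typeof value}` };
  }
  if (value < 0 || value > 1) {
    return { ok: false, error: "jitter must be between 0 and 1" };
  }
  return { ok: true, value };
}

console.log(validateJitter(true));
console.log(validateJitter(false));
console.log(validateJitter(1.5));
```

Coercing before validating keeps the numeric range check in one place, so a boolean config and a numeric config go through the same rule.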

What did NOT change:

  • RetryConfigSchema shape is unchanged — numeric jitter values still validated with min(0).max(1) as before
  • Runtime jitter calculation in src/infra/retry.ts is untouched — applyJitter() and resolveRetryConfig() already handle 0 correctly
  • No migration needed — z.preprocess runs at parse time before validation, handling both existing and new configs
  • Discord retry config automatically covered since it shares RetryConfigSchema
  • Default jitter values (TELEGRAM_RETRY_DEFAULTS.jitter = 0.1) are untouched
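
To illustrate why a jitter of 0 is a safe target for false, here is a minimal sketch of how a jitter factor might be applied to a retry delay. The actual formula in src/infra/retry.ts is not shown in this report, so jitteredDelay and its symmetric spread are assumptions, not the project's applyJitter().

```typescript
// Hypothetical sketch: spread a base delay by up to ±(jitter × 100)%.
// `random` is injectable so the behaviour can be checked deterministically.
function jitteredDelay(
  baseMs: number,
  jitter: number,
  random: () => number = Math.random,
): number {
  if (jitter <= 0) return baseMs; // 0 means "no jitter": delay is unchanged
  const offset = (random() * 2 - 1) * jitter; // uniform in [-jitter, +jitter)
  return Math.max(0, Math.round(baseMs * (1 + offset)));
}

console.log(jitteredDelay(1000, 0)); // no jitter applied
console.log(jitteredDelay(1000, 0.1)); // varies from run to run
```

With a factor of 0.1 and a 1000 ms base, this sketch yields delays between 900 and 1100 ms.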

Change Type

  • Bug fix (non-breaking)

Scope

Gateway (config schema validation)

Linked Issue

Closes #52130

Security Impact

  • New permissions requested: none
  • Secrets handling changes: none
  • New network calls: none
  • New command/tool execution: none
  • Data access changes: none

Human Verification

I personally verified:

  • pnpm tsgo passes
  • pnpm test -- src/config/config.schema-regressions.test.ts — 17/17 tests pass (16 existing + 1 new)
  • pnpm config:docs:check — no baseline drift (preprocess doesn't change schema surface)
  • validateConfigObject({ channels: { telegram: { retry: { jitter: true } } } }) returns { ok: true }
  • validateConfigObject({ channels: { telegram: { retry: { jitter: false } } } }) returns { ok: true }
  • Numeric jitter values (0, 0.5, 1) still accepted; values outside 0-1 still rejected

Evidence

Test Files  1 passed (1)
     Tests  17 passed (17)
  Duration  751ms

What I Did NOT Verify

  • Not verified: end-to-end gateway restart with a real jitter: true config (tested validation layer only)
  • Not verified: doctor output — the issue mentions misleading doctor messaging, but that's a separate concern from the crash-loop fix

Failure Recovery

If this breaks in production:

  • Detection: Config validation errors for retry.jitter would reappear
  • Rollback: Remove the z.preprocess wrapper, reverting to bare z.number().min(0).max(1)
  • Blast radius: Only affects users with boolean jitter values — zero impact on users with numeric values or no jitter config

Generated with Claude Code

Changed files

  • src/config/config.schema-regressions.test.ts (modified, +25/-0)
  • src/config/zod-schema.core.ts (modified, +3/-1)
  • src/config/zod-schema.ts (modified, +6/-1)
Original issue report

Impact

  • Severe perceived instability / "hang" state.
  • Required manual host reboot by operator.
  • Diagnostics are misleading: doctor SecretRef failure suggests Telegram token resolution issue, while runtime channel works.

Environment

  • OS: Ubuntu 24.04.4 LTS
  • Node: 22.22.1
  • OpenClaw: mixed components around v2026.3.12 / v2026.3.13
  • Channel: Telegram via file SecretRef (file:filemain:/providers/channels/telegram/botToken)
  • Multi-user state dirs present (/home/kaiste/.openclaw, /home/skbot/.openclaw, /root/.openclaw)

Key log excerpts

1) Invalid config causing gateway restart loop

/home/skbot/.openclaw/openclaw.json

  • channels.telegram.retry.jitter: Invalid input: expected number, received boolean

Journal excerpt:

  • Invalid config at /home/skbot/.openclaw/openclaw.json:
  • - channels.telegram.retry.jitter: Invalid input: expected number, received boolean
  • repeated openclaw-gateway.service: Main process exited, status=1/FAILURE

2) CDP restart storm

Repeated:

  • openclaw-chromium-cdp.service: Main process exited, status=21
  • Failed to create /tmp/openclaw-chrome-profile/SingletonLock: Permission denied
  • Scheduled restart job, restart counter is at 8xxx

3) Doctor SecretRef mismatch vs runtime behavior

openclaw doctor:

  • channels.telegram.botToken: unresolved SecretRef ...

But runtime:

  • openclaw status shows Telegram account healthy (ON/OK)
  • provider startup logs exist ([telegram] [default] starting provider ...)

Expected behavior

  1. doctor should not report unresolved SecretRef for channels.telegram.botToken when runtime snapshot resolves and channel is active.
  2. Service restart loops should be rate-limited / circuit-broken with clearer root-cause surfacing.
  3. Migration/validation around channels.telegram.retry.jitter should fail fast with actionable auto-fix guidance before entering restart storm.
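
As an illustration of the "actionable auto-fix guidance" in item 3, a validation issue could be turned into a one-line hint before the process exits. The ConfigIssue shape and describeIssue function below are hypothetical and are not OpenClaw's actual types or messages.

```typescript
// Hypothetical shape of a validation issue; not OpenClaw's actual type.
interface ConfigIssue {
  path: string; // e.g. "channels.telegram.retry.jitter"
  expected: string; // e.g. "number"
  received: string; // e.g. "boolean"
}

// Turn a type-mismatch issue into a single actionable line.
function describeIssue(issue: ConfigIssue, file: string): string {
  const base =
    `${file}: ${issue.path}: expected ${issue.expected}, ` +
    `received ${issue.received}.`;
  // Special-case the boolean jitter mistake reported in this issue.
  if (issue.path.endsWith("retry.jitter") && issue.received === "boolean") {
    return `${base} Use a number between 0 and 1 (for example 0.1), or 0 to disable jitter.`;
  }
  return base;
}

console.log(
  describeIssue(
    { path: "channels.telegram.retry.jitter", expected: "number", received: "boolean" },
    "/home/skbot/.openclaw/openclaw.json",
  ),
);
```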

Actual behavior

  • Conflicting diagnostics + repeated service restarts created an operationally "hung" state.

Suggested improvements

  • Harden config migration for bool->number in channels.telegram.retry.jitter.
  • Add restart backoff/circuit breaker for gateway and CDP helper services.
  • Improve doctor SecretRef resolution path to align with active runtime snapshot behavior.
  • Add explicit warning when multiple .openclaw state roots are detected and concurrently active.
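
The restart circuit breaker suggested above could take the form of a sliding-window limiter. This is a minimal sketch under assumed names (RestartLimiter, tryRestart) and assumed thresholds; nothing here is taken from OpenClaw's code.

```typescript
// Hypothetical sliding-window limiter: allow at most `maxRestarts`
// restarts within `windowMs`, then refuse until old entries expire.
class RestartLimiter {
  private timestamps: number[] = [];

  constructor(
    private readonly maxRestarts: number,
    private readonly windowMs: number,
  ) {}

  // Returns true if a restart may proceed at time `now` (ms) and records it.
  tryRestart(now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop restarts that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.maxRestarts) return false;
    this.timestamps.push(now);
    return true;
  }
}

// At most 3 restarts per minute; the fourth attempt is refused.
const limiter = new RestartLimiter(3, 60000);
console.log([0, 1000, 2000, 3000].map((t) => limiter.tryRestart(t)));
```

A caller that gets false back would stop restarting and surface the last error instead, which addresses the "clearer root-cause surfacing" item as well.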

extent analysis

Fix Plan

To address the issues, we'll focus on the following steps:

  • Fix the invalid config value for channels.telegram.retry.jitter
  • Implement a restart backoff/circuit breaker for gateway and CDP helper services
  • Improve doctor SecretRef resolution to align with active runtime snapshot behavior
  • Add explicit warning for multiple .openclaw state roots

Step-by-Step Solution

  1. Fix invalid config value: Update the openclaw.json file to change the jitter value to a numeric type, e.g., 0.2.

{ "channels": { "telegram": { "retry": { "jitter": 0.2 } } } }

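  2. Implement restart backoff/circuit breaker: Introduce a backoff mechanism to prevent aggressive restart loops. One option in Node.js is the backoff library; restartService() below is a hypothetical placeholder for the actual restart logic.

```javascript
const backoff = require('backoff');

// Exponential backoff with 10% randomisation, capped at 30 seconds.
const restartBackoff = backoff.exponential({
  randomisationFactor: 0.1,
  initialDelay: 1000, // 1 second
  maxDelay: 30000, // 30 seconds
});

// Stop retrying after 10 consecutive failures (circuit breaker).
restartBackoff.failAfter(10);

// 'ready' fires when a backoff delay has elapsed.
restartBackoff.on('ready', () => {
  // restartService() is a placeholder: return true once the service is up.
  if (restartService()) restartBackoff.reset();
  else restartBackoff.backoff();
});

restartBackoff.on('fail', () => console.error('Restart limit reached; giving up.'));

restartBackoff.backoff(); // schedule the first attempt
```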
  3. Improve doctor SecretRef resolution: Update the doctor command to resolve SecretRefs based on the active runtime snapshot. This may involve modifying the doctor command to query the runtime snapshot directly.

     // Pseudo-code example; getActiveRuntimeSnapshot() and getSecretRef() are hypothetical helpers.
     const runtimeSnapshot = getActiveRuntimeSnapshot();
     const secretRef = runtimeSnapshot.getSecretRef('channels.telegram.botToken');

     if (secretRef && secretRef.resolved) {
       console.log('SecretRef resolved successfully');
     } else {
       console.log('SecretRef unresolved');
     }

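  4. Add explicit warning for multiple .openclaw state roots: Introduce a check to detect multiple .openclaw state roots and display a warning message. The issue lists state directories under both /home and /root, so the example checks both.

```javascript
const fs = require('fs');
const path = require('path');

// Candidate home directories: every entry under /home, plus /root.
const homes = fs.readdirSync('/home').map((dir) => path.join('/home', dir));
homes.push('/root');

// Keep only the homes that contain a .openclaw state directory.
const stateRoots = homes
  .map((home) => path.join(home, '.openclaw'))
  .filter((root) => fs.existsSync(root));

if (stateRoots.length > 1) {
  console.warn(`Multiple .openclaw state roots detected: ${stateRoots.join(', ')}. This may cause instability.`);
}
```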

Verification

To verify the fixes, restart the services and run the openclaw doctor command to ensure that the SecretRef is resolved correctly. Additionally, monitor the services for restart loops and verify that the backoff mechanism is working as expected.

Extra Tips

  • Regularly review and update configuration files to prevent invalid values.
  • Consider implementing automated testing for configuration migrations to catch issues early.
  • Use logging and monitoring
