openclaw - ✅(Solved) Fix [Bug]: openclaw cron list/status and openclaw health --json timeout against local gateway while scheduler still appears to run jobs [1 pull requests, 2 comments, 3 participants]

GitHub stats
Issue: openclaw/openclaw#51498
Fetched: 2026-04-08 01:10:20
Comments: 2
Participants: 3
Timeline events: 6
Reactions: 0
Timeline (top): commented ×2, labeled ×2, cross-referenced ×1, referenced ×1

openclaw cron list/status and openclaw health --json timeout against local gateway while scheduler still appears to run jobs

Error Message

Error: gateway timeout after 30000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/erikadmin/.openclaw/openclaw.json
Bind: loopback

Root Cause

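According to PR #51515, the health snapshot probed channel accounts sequentially with no overall time limit, so on multi-account setups the total wait (N × probe timeout) could exceed the 30 s client timeout even while openclaw gateway status reported a healthy RPC probe. Separately, legacy cron rows could carry non-string or empty id values that were never coerced.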

Fix Action

Fixed by PR #51515 (closes #51498).

PR fix notes

PR #51515: fix(health): bound gateway health snapshots and normalize legacy cron

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: openclaw cron list / cron status and openclaw health --json could hit gateway timeouts (~30s) while gateway status still showed a healthy RPC probe; users with many channel accounts also paid a sequential health snapshot cost (N×probe timeout). Legacy cron rows could have non-string or empty id values.
  • Why it matters: Admins lose the ability to inspect cron and gateway health via CLI despite a running scheduler; multi-account setups amplified health snapshot latency.
  • What changed: getHealthSnapshot now runs per-channel account probes in parallel, applies a default wall-clock budget (DEFAULT_HEALTH_SNAPSHOT_BUDGET_MS), and passes a per-probe budget derived from remaining time; normalizeStoredCronJobs coerces numeric id to string and assigns a UUID when id is missing/empty after legacy migration.
  • What did NOT change (scope boundary): No change to cron execution semantics, gateway auth, or unrelated channel plugins beyond health snapshot gathering and cron store normalization on load.
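The parallel, budgeted probing described in the summary bullets can be sketched as follows. This is a Python illustration only: the real change is in TypeScript (src/commands/health.ts), and the names probe_account, bounded_probe, health_snapshot, and ASSUMED_BUDGET_MS are hypothetical stand-ins. The PR notes do not state the actual value of DEFAULT_HEALTH_SNAPSHOT_BUDGET_MS, so the number below is an assumption.

```python
import asyncio
import time

# Hypothetical value; the real DEFAULT_HEALTH_SNAPSHOT_BUDGET_MS is not given in the PR notes.
ASSUMED_BUDGET_MS = 500


async def probe_account(name, delay_s):
    """Stand-in for a channel account probe; sleeps to simulate network latency."""
    await asyncio.sleep(delay_s)
    return "ok"


async def bounded_probe(name, delay_s, deadline):
    """Run one probe using only the time left before the shared deadline."""
    remaining = deadline - time.monotonic()
    if remaining <= 0:
        return name, "error: health snapshot budget exceeded"
    try:
        result = await asyncio.wait_for(probe_account(name, delay_s), timeout=remaining)
        return name, result
    except asyncio.TimeoutError:
        # A slow account yields an error string instead of stalling the whole snapshot.
        return name, "error: probe timed out within snapshot budget"


async def health_snapshot(accounts, budget_ms=ASSUMED_BUDGET_MS):
    """Probe all accounts concurrently under one wall-clock budget."""
    deadline = time.monotonic() + budget_ms / 1000
    pairs = await asyncio.gather(
        *(bounded_probe(name, delay, deadline) for name, delay in accounts.items())
    )
    return dict(pairs)


if __name__ == "__main__":
    # Two fast accounts and one that would take 5 s on its own.
    print(asyncio.run(health_snapshot({"fast-a": 0.01, "fast-b": 0.02, "stuck": 5.0})))
```

Because every probe shares one deadline, a single unresponsive account costs at most the budget, and the other accounts still report their own results.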

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #51498
  • Related #

User-visible / Behavior Changes

  • health RPC / openclaw health snapshots complete within a bounded time more reliably on multi-account configs; probes for accounts under one channel run concurrently.
  • Legacy cron jobs with numeric or missing id are normalized when the store is loaded/saved (may persist a one-time rewrite).
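The id normalization described above (numeric id coerced to a string, UUID assigned when id is missing or empty) might look roughly like this. It is a Python sketch of the stated behavior, not the project's normalizeStoredCronJobs implementation, which is TypeScript; the function name, the return shape, and the treatment of booleans are assumptions.

```python
import uuid


def normalize_stored_cron_jobs(jobs):
    """Return (jobs, changed). Hypothetical Python mirror of the behavior in the PR notes."""
    changed = False
    for job in jobs:
        raw = job.get("id")
        if isinstance(raw, bool):
            # bool is a subclass of int in Python; treat it as unusable (assumption).
            raw = None
        if isinstance(raw, (int, float)):
            # Numeric legacy id: keep the value, store it as a string.
            job["id"] = str(raw)
            changed = True
        elif not isinstance(raw, str) or not raw.strip():
            # Missing, empty, or otherwise unusable id: assign a fresh UUID.
            job["id"] = str(uuid.uuid4())
            changed = True
    return jobs, changed


if __name__ == "__main__":
    rows = [{"id": 7}, {"id": ""}, {}, {"id": "keep-me"}]
    normalized, changed = normalize_stored_cron_jobs(rows)
    print(changed, [job["id"] for job in normalized])
```

Returning a changed flag lets the caller decide whether to persist the one-time rewrite mentioned in the bullet above.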

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No — same probes, different scheduling/budget)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: any (fix is Node/TS in gateway + CLI)
  • Runtime: OpenClaw gateway + CLI calling health / cron.* RPCs
  • Relevant config: multiple channel accounts or many plugins increases snapshot work; cron store with legacy id / jobId

Steps

  1. Run pnpm check locally on the branch (passes).
  2. (Optional) Run gateway and exercise openclaw health --json / openclaw cron list against a multi-account config.

Expected

Health snapshot and cron admin RPCs return before client timeout in typical setups; cron store normalizes legacy ids without manual doctor intervention.

Actual

Local pnpm check passed; unit coverage in health.snapshot.test.ts / store-migration.test.ts.

Evidence

  • Failing pattern addressed: sequential health probes + unbounded total time; missing cron id coercion — covered by code change + tests
  • pnpm check green locally
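To see why sequential probes with no total bound can trip the client timeout, here is a small worked calculation. The 30000 ms client timeout comes from the reported error message; the per-probe timeout and the snapshot budget are hypothetical numbers chosen only for illustration.

```python
CLIENT_TIMEOUT_MS = 30_000         # from the reported error message
ASSUMED_PROBE_TIMEOUT_MS = 10_000  # hypothetical per-probe timeout
ASSUMED_BUDGET_MS = 15_000         # hypothetical overall snapshot budget


def sequential_worst_case_ms(accounts, probe_timeout_ms):
    """Each probe waits for the previous one, so worst cases add up."""
    return accounts * probe_timeout_ms


def parallel_worst_case_ms(accounts, probe_timeout_ms, budget_ms):
    """Probes start together, so the wait is one probe timeout, capped by the budget."""
    return min(probe_timeout_ms, budget_ms) if accounts else 0


if __name__ == "__main__":
    for n in (1, 3, 4):
        seq = sequential_worst_case_ms(n, ASSUMED_PROBE_TIMEOUT_MS)
        par = parallel_worst_case_ms(n, ASSUMED_PROBE_TIMEOUT_MS, ASSUMED_BUDGET_MS)
        print(n, "accounts:", seq, "ms sequential,", par, "ms parallel,",
              "exceeds client timeout" if seq > CLIENT_TIMEOUT_MS else "within client timeout")
```

Under these assumed numbers, four unresponsive accounts already push the sequential worst case past the client timeout, while the parallel worst case stays at a single probe timeout.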

Human Verification (required)

  • Verified scenarios: pnpm check full suite; targeted tests for health snapshot and cron migration.
  • Edge cases checked: parallel account probes preserve per-account error handling; budget exceeded returns structured probe error string.
  • What you did not verify: End-to-end on the reporter’s exact Linux systemd gateway host.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No — cron ids are normalized on load; optional persist on next write)

Failure Recovery (if this breaks)

  • How to disable/revert: revert this commit.
  • Files/config to restore: src/commands/health.ts, src/cron/store-migration.ts, tests, CHANGELOG.md

Risks and Mitigations

  • Risk: Parallel probes could increase concurrent outbound requests to providers. Mitigation: Same probes as before, only concurrency per channel; overall budget caps total wait.
  • Risk: UUID assignment for empty ids changes stable ids for broken rows. Mitigation: Only when id was unusable; improves correctness.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/commands/health.ts (modified, +97/-68)
  • src/cron/store-migration.test.ts (modified, +26/-0)
  • src/cron/store-migration.ts (modified, +17/-0)


Bug type

Regression (worked before, now fails)

Summary

openclaw cron list/status and openclaw health --json timeout against local gateway while scheduler still appears to run jobs

Steps to reproduce

  1. Run openclaw gateway status on a Linux host with a user-level systemd gateway.
  2. Confirm the gateway reports healthy output and RPC probe: ok.
  3. Run openclaw cron list.
  4. Run openclaw cron status.
  5. Run openclaw health --json.
  6. Observe that all of the admin-facing commands time out, while existing cron jobs still appear to run and write run-history under ~/.openclaw/cron/runs/.
  7. Run openclaw gateway install --force and retry.
  8. Run openclaw doctor --fix and retry.

Expected behavior

openclaw cron list, openclaw cron status, and openclaw health --json should return normally when the local gateway is running and openclaw gateway status reports RPC probe: ok. If legacy cron-store fields are the issue, openclaw doctor --fix should normalize them or at least improve the situation.

Actual behavior

openclaw gateway status reports a healthy local gateway with RPC probe: ok, but:

  • openclaw cron list times out
  • openclaw cron status times out
  • openclaw health --json times out
  • Existing cron jobs still appear to run and write run-history, so the scheduler seems alive while the admin/control plane is not.

I also found that jobs in ~/.openclaw/cron/jobs.json use legacy id fields and have jobId = None. Running openclaw gateway install --force did not fix it. Running openclaw doctor --fix did not fix it.

OpenClaw version

2026.3.13 (61d171a)

Operating system

Ubuntu Linux 6.8.0-100-generic (x64), user-level systemd gateway

Install method

No response

Model

openai-codex/gpt-5.4

Provider / routing chain

OpenAI Codex OAuth / local gateway / user-level systemd service

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Error: gateway timeout after 30000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/erikadmin/.openclaw/openclaw.json
Bind: loopback

Impact and severity

Affected users/systems/channels: Observed on one Linux host using a local user-level systemd Gateway. The affected subsystem is OpenClaw cron administration via the CLI (openclaw cron list, openclaw cron status) and openclaw health --json. Existing cron jobs still appear to run, so the scheduler/runtime is at least partially functioning.

Severity: Blocks workflow. It prevents safe inspection and administration of cron jobs even though the scheduler appears to remain active.

Frequency: Always, in this environment. The timeout reproduces consistently across repeated attempts before and after openclaw gateway install --force and openclaw doctor --fix.

Consequence: Cannot reliably inspect, add, modify, or remove cron jobs via the normal CLI workflow. This leaves the system in a degraded state where scheduled automations may continue to run, but the admin/control plane is effectively unavailable.

Additional information

  • Gateway bind is loopback (127.0.0.1:18789)
  • openclaw gateway status is healthy
  • Existing cron jobs still appear to execute
  • jobs.json is valid, small (~33K), version 1, and stores jobs with legacy id and no jobId
  • openclaw doctor --fix only cleaned orphan transcript files and did not change the cron behavior
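A read-only check like the one below can confirm how the id and jobId fields are populated before anything is changed. It is a sketch under assumptions: the report does not show the file layout, so the script accepts either a top-level list or an object with a "jobs" key, and the function name audit_cron_store is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def audit_cron_store(path):
    """Count how each job's id and jobId fields are populated, without modifying the file."""
    store = json.loads(Path(path).expanduser().read_text(encoding="utf-8"))
    # Assumption: jobs live in a top-level list or under a "jobs" key.
    jobs = store.get("jobs", []) if isinstance(store, dict) else store
    summary = Counter()
    for job in jobs:
        for field in ("id", "jobId"):
            value = job.get(field)
            if value is None:
                kind = "missing"
            elif isinstance(value, str):
                kind = "string" if value.strip() else "empty"
            else:
                kind = type(value).__name__
            summary[field + ":" + kind] += 1
    return dict(summary)


if __name__ == "__main__":
    default_path = Path("~/.openclaw/cron/jobs.json").expanduser()
    if default_path.exists():
        print(audit_cron_store(default_path))
    else:
        print("no cron store found at", default_path)
```

Counts such as id:int or id:empty would point at the rows that the id normalization in PR #51515 is meant to repair.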

extent analysis

Fix Plan

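PR #51515 addresses the openclaw cron list, openclaw cron status, and openclaw health --json timeouts inside the gateway and CLI, so upgrading to a build that includes it is the primary fix. The steps below are an unverified manual workaround for hosts that cannot upgrade yet:

  • Back up ~/.openclaw/cron/jobs.json, then populate the jobId field from the legacy id field for jobs that lack it.
  • Retry the command if a gateway timeout looks transient. The reporter saw the timeout on every attempt, so retries alone are unlikely to help in that situation.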

Code Changes

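import json
import shutil
from pathlib import Path

# Expand "~" explicitly; open() does not do this on its own
path = Path("~/.openclaw/cron/jobs.json").expanduser()

# Back up the store before modifying it
shutil.copy2(path, path.with_name(path.name + ".bak"))

# Load the jobs.json file
with path.open("r", encoding="utf-8") as f:
    store = json.load(f)

# Assumption: the store is either a bare list or an object wrapping a "jobs" list
jobs = store.get("jobs", []) if isinstance(store, dict) else store

# Copy the legacy id into jobId where jobId is missing.
# id is kept in place because PR #51515 normalizes the id field on load.
for job in jobs:
    if job.get("jobId") is None and job.get("id") is not None:
        job["jobId"] = str(job["id"])

# Save the updated jobs.json file
with path.open("w", encoding="utf-8") as f:
    json.dump(store, f, indent=2)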

Configuration Changes

  • No configuration changes are required.

Infra / Dependency Fixes

  • No infra or dependency fixes are required.

Temporary Workarounds

  • If the issue persists and your OpenClaw version exposes a gateway timeout setting in the openclaw.json configuration file, try increasing it.

Verification

To verify that the fix worked:

  1. Run openclaw cron list and check that it returns normally.
  2. Run openclaw cron status and check that it returns normally.
  3. Run openclaw health --json and check that it returns normally.
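The three checks above can be scripted so that each command runs under its own deadline and a hang is reported instead of blocking the terminal. This is a sketch: the helper name run_with_deadline and the 35 s deadline (slightly above the 30000 ms client timeout in the error message) are assumptions, and the script only reports exit status without interpreting the command output.

```python
import subprocess
import time


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return (status, seconds); status is 'ok', 'failed', 'timeout', or 'not found'."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        status = "timeout"
    except FileNotFoundError:
        status = "not found"
    return status, round(time.monotonic() - start, 2)


# The commands listed in the verification steps.
COMMANDS = [
    ["openclaw", "cron", "list"],
    ["openclaw", "cron", "status"],
    ["openclaw", "health", "--json"],
]

if __name__ == "__main__":
    for cmd in COMMANDS:
        status, seconds = run_with_deadline(cmd, timeout_s=35)
        print(" ".join(cmd), "->", status, "in", seconds, "s")
```

A result of ok well under 30 s for all three commands suggests the fix is in effect; failed or timeout means the gateway is still not answering the admin RPCs in time.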

Extra Tips

  • Back up the jobs.json file before making any changes.
  • If the issue persists, you can run openclaw doctor --fix again, but note that in the original report it only cleaned orphan transcript files and did not change the cron behavior.
  • Consider implementing a regular backup and update process for the jobs.json file to prevent similar issues in the future.
