openclaw - ✅(Solved) Fix orchestrator store: in_progress tasks have no expiry path [3 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#72095 · Fetched 2026-04-27 05:34:58
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, closed ×1, commented ×1

Root Cause

Found during bot review of #72086 (Greptile P2). The STALE_ELIGIBLE set in extensions/orchestrator/src/store.ts excludes `in_progress`:

```ts
const STALE_ELIGIBLE: ReadonlySet<TaskState> = new Set<TaskState>([
  "queued",
  "assigned",
  "awaiting_approval",
]);
```

`sweepExpired` skips any task in `in_progress`, and `applyAction` rejects `expire` for that state. Combined with `spawn-watch` having no built-in timeout, this creates a category of tasks that can never be reclaimed if a specialist session dies silently in shadow/live mode.
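To make the gap concrete, here is a minimal, self-contained model of the eligibility check. It is a sketch, not the code in `store.ts`: the `Task` shape, the `expiresAt` field (epoch milliseconds), the `expired` state name, and this simplified `sweepExpired` signature are all assumptions. Only `STALE_ELIGIBLE` and the other state names come from the issue.

```ts
// Simplified model; only STALE_ELIGIBLE and most state names come from the issue.
type TaskState =
  | "queued" | "assigned" | "in_progress"
  | "awaiting_approval" | "done" | "failed" | "expired"; // "expired" is assumed

interface Task { id: string; state: TaskState; expiresAt: number } // hypothetical shape

const STALE_ELIGIBLE: ReadonlySet<TaskState> = new Set<TaskState>([
  "queued",
  "assigned",
  "awaiting_approval",
]);

// Hypothetical sweeper: expires tasks past their deadline, but only in eligible states.
function sweepExpired(tasks: Task[], now: number): Task[] {
  return tasks.map((t): Task =>
    STALE_ELIGIBLE.has(t.state) && t.expiresAt <= now ? { ...t, state: "expired" } : t,
  );
}

const swept = sweepExpired(
  [
    { id: "a", state: "queued", expiresAt: 1_000 },
    { id: "b", state: "in_progress", expiresAt: 1_000 }, // specialist died silently
  ],
  2_000,
);
// Task "a" is reclaimed; task "b" stays in_progress on every sweep.
```

In this model, no later sweep changes the outcome for task "b", which is the unreclaimable category the issue describes.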

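Fix / Workaround

No workaround is needed in v0, because the gap is dormant there. Phase B ships in synthetic mode only (`mode: "synthetic"`), which transitions tasks through `in_progress` synchronously inside `dispatch.ts` and lands them at `done` / `awaiting_approval` / `failed` in the same call. There is no real `spawn`, so no real liveness gap. A fix (a watcher-side TTL, a `force_expire` tier in the sweeper, or both) is needed before the shadow/live cutover.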

PR fix notes

PR #72086: feat(orchestrator): Phase B Units 10 + 11 — synthetic gate, expiry sweeper, shadow-summary, live-flip runbook (depends on #72068)

Description (problem / solution / changelog)

Summary

Closes the Phase B implementation arc. Three deliverables stacked on #72068:

  1. R30 synthetic-task observability gate — `openclaw orchestrator synthetic-all` runs a 5-fixture deterministic harness end-to-end through routing + store + dispatch + trajectory. Exits non-zero if any fixture diverges from its expected agent / rule / terminal state. The gate is the operator-facing precondition for ever flipping mode away from synthetic.
  2. Expiry sweeper service — `createExpirySweeper` registered via `api.registerService(...)` with the `stop` lifecycle hook on the service object (recon Q-3 settled — `OpenClawPluginService` is the right model for periodic gateway-resident work). Fires every 60 minutes by default; logs swept count.
  3. Shadow-summary CLI verb + live-flip runbook — `openclaw orchestrator shadow-summary [--window 24]` reads the shadow archive, prints by-state counts + mean duration + window span, and exits non-zero if any spawn failure landed inside the window. The README's new "Live-flip runbook" makes the synthetic → shadow → live transition explicit.
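The sweeper described in item 2 can be pictured as a small start/stop/runOnce object. The following is a rough sketch under stated assumptions: `createExpirySweeper` is named in the PR, but the options object, the `sweep` callback, and the log messages here are hypothetical, and the sketch does not call `api.registerService`.

```ts
// Hypothetical shape; the real createExpirySweeper in expiry-sweeper.ts may differ.
interface SweeperOptions {
  sweep: () => number;          // assumed callback: returns the number of tasks swept
  intervalMs?: number;          // the PR's default cadence is every 60 minutes
  log?: (msg: string) => void;  // assumed logger hook
}

function createExpirySweeper(opts: SweeperOptions) {
  const intervalMs = opts.intervalMs ?? 60 * 60 * 1000;
  const log = opts.log ?? ((msg: string) => console.log(msg));
  let timer: ReturnType<typeof setInterval> | undefined;

  // One sweep pass; errors are logged rather than thrown so the timer keeps running.
  function runOnce(): number {
    try {
      const swept = opts.sweep();
      log(`expiry sweep: ${swept} task(s) swept`);
      return swept;
    } catch (err) {
      log(`expiry sweep failed: ${String(err)}`);
      return 0;
    }
  }

  return {
    runOnce,
    start(): void {
      if (timer === undefined) timer = setInterval(runOnce, intervalMs);
    },
    // Top-level stop hook on the service object, not a function returned from start.
    stop(): void {
      if (timer !== undefined) clearInterval(timer);
      timer = undefined;
    },
  };
}
```

Keeping `stop` as a sibling of `start` mirrors the object-level lifecycle hook the PR settles on, and it makes `start` idempotent because the timer handle lives in the closure.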

Stacked PR — depends on #72068 → #72054 → #72039 → #72029. Land in that order; this PR's diff narrows after each merge.

What landed

| Commit | Sub-unit | Files |
| --- | --- | --- |
| `5de1c34002` | Units 10 + 11 main | `src/synthetic.ts` (deterministic harness + fixture loader + result formatter), `test/fixtures/synthetic-tasks.json` (5 fixtures: code-1, ops-1, research-1, writing-1, fallback-1), `src/expiry-sweeper.ts` (start/stop/runOnce + crash-tolerant logging), `src/shadow-summary.ts` (windowed shadow stats), three new CLI verbs in `src/cli.ts`, README runbook |
| follow-up fix | Unit 11 service shape | Aligns the registered service with `OpenClawPluginService.stop?` (object-level, not return-from-start), and hardens one spawn-watch test against TS narrowing weirdness around closure-captured `let` |

CLI surface (cumulative across Units 7-11)

| Verb | Purpose |
| --- | --- |
| `openclaw orchestrator init` | Generate bearer token (Unit 7). |
| `openclaw orchestrator rotate-token` | Rotate bearer token (Unit 7). |
| `openclaw orchestrator synthetic <label>` | One synthetic fixture end-to-end. |
| `openclaw orchestrator synthetic-all` | Full synthetic harness (R30 gate). |
| `openclaw orchestrator shadow-summary [--window <hours>]` | Shadow archive stats + live-flip gate. |

Live-flip procedure (now documented in README)

```
1. openclaw orchestrator init                        # bearer token
2. openclaw orchestrator synthetic-all               # data-plane gate
3. mode = "shadow" in ~/.openclaw/openclaw.json      # 24h soak
4. openclaw orchestrator shadow-summary --window 24  # live-flip gate
5. mode = "live"                                     # production
```

Rollback at any point is a config edit + restart; in-flight `awaiting_approval` tasks remain operator-actionable from the Approvals tab.

Boundaries respected

  • `synthetic.ts`, `expiry-sweeper.ts`, `shadow-summary.ts` import only Node built-ins, types from `./types/schema.ts`, and the previously-shipped `store.ts` / `routing.ts` / `dispatch.ts` / `trajectory.ts`. No `src/**` imports.
  • The expiry sweeper is the first time this extension uses `api.registerService`; the service shape matches `src/plugins/types.ts:1996-1999` with separate top-level `start` and `stop` hooks (not `stop` returned from `start`).

Test plan

  • `pnpm test extensions/orchestrator` — 172/172 pass (16 files; +30 new tests across synthetic, expiry-sweeper, shadow-summary)
  • `pnpm tsgo:all` — clean
  • Boundary contract still passing
  • `openclaw orchestrator synthetic-all` exits 0 when all fixtures pass; non-zero with structured reasons when any fail
  • `openclaw orchestrator shadow-summary` exits non-zero when any task in window is in `failed`
  • CI green
  • Live-flip dry run on Peter's machine after PR-set lands
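The `shadow-summary` gate in the test plan (non-zero exit when any task in the window is `failed`) can be sketched as a pure function. This is an illustration only: the `ShadowRecord` shape, the `summarizeShadow` name, and the choice to window on `finishedAt` are assumptions, since the shadow archive schema is not shown in the PR.

```ts
// Hypothetical archive record; timestamps are epoch milliseconds.
interface ShadowRecord { state: string; startedAt: number; finishedAt: number }

interface ShadowSummary {
  byState: Record<string, number>;
  meanDurationMs: number;
  exitCode: 0 | 1;
}

function summarizeShadow(
  records: ShadowRecord[],
  windowHours: number,
  now: number,
): ShadowSummary {
  const cutoff = now - windowHours * 60 * 60 * 1000;
  // Assumption: a record belongs to the window if it finished inside it.
  const inWindow = records.filter((r) => r.finishedAt >= cutoff);

  const byState: Record<string, number> = {};
  let totalMs = 0;
  for (const r of inWindow) {
    byState[r.state] = (byState[r.state] ?? 0) + 1;
    totalMs += r.finishedAt - r.startedAt;
  }

  return {
    byState,
    meanDurationMs: inWindow.length > 0 ? totalMs / inWindow.length : 0,
    // Gate: any failed task inside the window blocks the live flip.
    exitCode: (byState["failed"] ?? 0) > 0 ? 1 : 0,
  };
}
```

A CLI wrapper would print the counts and pass `exitCode` to the process exit; failures older than the window do not trip the gate in this sketch.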

Phase B is now feature-complete

This is the last openclaw-side unit in Plan 005. After this stack lands and the MC commits ship:

  • Synthetic mode produces visible task records with full trajectory in the Pipeline tab
  • Approvals tab handles approve/reject for synthetic awaiting_approval tasks
  • Shadow + live modes are gated behind the documented runbook
  • The expiry sweeper keeps the task archive bounded

🤖 Generated with Claude Code

Changed files

  • .github/labeler.yml (modified, +4/-0)
  • extensions/orchestrator/README.md (added, +109/-0)
  • extensions/orchestrator/RECON-NOTES.md (added, +242/-0)
  • extensions/orchestrator/index.ts (added, +141/-0)
  • extensions/orchestrator/install/agent-template/IDENTITY.md (added, +44/-0)
  • extensions/orchestrator/install/agent-template/README.md (added, +15/-0)
  • extensions/orchestrator/openclaw.plugin.json (added, +20/-0)
  • extensions/orchestrator/package.json (added, +30/-0)
  • extensions/orchestrator/src/agent-capabilities.ts (added, +92/-0)
  • extensions/orchestrator/src/cli.ts (added, +159/-0)
  • extensions/orchestrator/src/credentials.ts (added, +157/-0)
  • extensions/orchestrator/src/dispatch.ts (added, +205/-0)
  • extensions/orchestrator/src/expiry-sweeper.ts (added, +92/-0)
  • extensions/orchestrator/src/fixtures/synthetic-tasks.json (added, +40/-0)
  • extensions/orchestrator/src/http.ts (added, +389/-0)
  • extensions/orchestrator/src/inbox.ts (added, +117/-0)
  • extensions/orchestrator/src/install.ts (added, +85/-0)
  • extensions/orchestrator/src/routing.config-default.ts (added, +77/-0)
  • extensions/orchestrator/src/routing.ts (added, +245/-0)
  • extensions/orchestrator/src/shadow-summary.ts (added, +100/-0)
  • extensions/orchestrator/src/spawn-watch.ts (added, +193/-0)
  • extensions/orchestrator/src/store.paths.ts (added, +59/-0)
  • extensions/orchestrator/src/store.ts (added, +562/-0)
  • extensions/orchestrator/src/synthetic.ts (added, +179/-0)
  • extensions/orchestrator/src/trajectory.ts (added, +147/-0)
  • extensions/orchestrator/src/types/schema.contract.ts (added, +22/-0)
  • extensions/orchestrator/src/types/schema.ts (added, +179/-0)
  • extensions/orchestrator/test/agent-capabilities.test.ts (added, +72/-0)
  • extensions/orchestrator/test/cli.test.ts (added, +107/-0)
  • extensions/orchestrator/test/credentials.test.ts (added, +142/-0)
  • extensions/orchestrator/test/dispatch-trajectory.test.ts (added, +157/-0)
  • extensions/orchestrator/test/dispatch.test.ts (added, +185/-0)
  • extensions/orchestrator/test/expiry-sweeper.test.ts (added, +118/-0)
  • extensions/orchestrator/test/fixtures/routing.malformed.json (added, +22/-0)
  • extensions/orchestrator/test/fixtures/routing.shadowing.json (added, +22/-0)
  • extensions/orchestrator/test/http.test.ts (added, +525/-0)
  • extensions/orchestrator/test/inbox.test.ts (added, +139/-0)
  • extensions/orchestrator/test/install.test.ts (added, +71/-0)
  • extensions/orchestrator/test/routing.test.ts (added, +237/-0)
  • extensions/orchestrator/test/schema-hash.test.ts (added, +32/-0)
  • extensions/orchestrator/test/shadow-summary.test.ts (added, +122/-0)
  • extensions/orchestrator/test/spawn-watch.test.ts (added, +232/-0)
  • extensions/orchestrator/test/store.test.ts (added, +324/-0)
  • extensions/orchestrator/test/synthetic.test.ts (added, +179/-0)
  • extensions/orchestrator/test/trajectory.test.ts (added, +189/-0)
  • extensions/orchestrator/tsconfig.json (added, +16/-0)
  • pnpm-lock.yaml (modified, +9/-0)

PR #72091: test(orchestrator): cross-repo wire-contract fixture + HTTP→dispatch→approval integration test (depends on #72086)

Description (problem / solution / changelog)

Summary

Final test layer for Phase B orchestrator. Two complementary tests:

  1. Cross-repo wire-contract (recon A-S2). A canonical JSON fixture at extensions/orchestrator/test/fixtures/wire-contract.json covering all task states, all task kinds, and all canonical trajectory event types. SHA-256-pinned in wire-contract.sha256. The same fixture and hash will be mirrored byte-for-byte in MissionControl at __tests__/contract/wire-contract.json. If either repo's schema drifts without lockstep update, the local test fails — CI-time sentinel against cross-repo divergence.

  2. HTTP → dispatch → approval integration test. End-to-end exercise of the openclaw data plane with no mocks of the routing engine, store, recorder, or dispatch path. Uses real createStore, real getRecorder, real loadConfig, real dispatchTask. Only IncomingMessage/ServerResponse are faked via EventEmitter to avoid a real socket. Covers the seven canonical lifecycle paths:

    • approval-required submit → awaiting_approval → approve → done
    • non-approval submit → done in one shot
    • reject records both rejection.* and error.*
    • list filters by state and kind
    • routing/preview is pure (no store / trajectory writes)
    • live mode rejects POST submission with 403 LIVE_DISABLED
    • approve on queued returns 409 invalid_transition without mutating
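The SHA-256 pinning used by the wire-contract test can be sketched with Node's built-in `crypto` module. The helper names `sha256Hex` and `matchesPinnedHash` are hypothetical, and the actual `wire-contract.test.ts` may read and compare the files differently.

```ts
import { createHash } from "node:crypto";

// Hex SHA-256 of the exact fixture bytes; any byte-level drift changes the digest.
function sha256Hex(bytes: string | Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Compare a fixture against its pinned digest, tolerating a trailing newline
// and uppercase hex in the pin (as a one-line .sha256 file might contain).
function matchesPinnedHash(fixture: string | Uint8Array, pinnedHex: string): boolean {
  return sha256Hex(fixture) === pinnedHex.trim().toLowerCase();
}
```

In a real test, both inputs would come from `readFileSync` on the fixture and its `.sha256` file. Hashing raw bytes rather than parsed JSON is what makes the byte-for-byte mirroring requirement enforceable.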

Test plan

  • pnpm test extensions/orchestrator — 185/185 across 18 files
  • tsgo exits 0
  • Hash recomputed and pinned

🤖 Generated with Claude Code

Changed files

  • .github/labeler.yml (modified, +4/-0)
  • extensions/orchestrator/README.md (added, +109/-0)
  • extensions/orchestrator/RECON-NOTES.md (added, +242/-0)
  • extensions/orchestrator/index.ts (added, +141/-0)
  • extensions/orchestrator/install/agent-template/IDENTITY.md (added, +44/-0)
  • extensions/orchestrator/install/agent-template/README.md (added, +15/-0)
  • extensions/orchestrator/openclaw.plugin.json (added, +20/-0)
  • extensions/orchestrator/package.json (added, +30/-0)
  • extensions/orchestrator/src/agent-capabilities.ts (added, +92/-0)
  • extensions/orchestrator/src/cli.ts (added, +159/-0)
  • extensions/orchestrator/src/credentials.ts (added, +157/-0)
  • extensions/orchestrator/src/dispatch.ts (added, +205/-0)
  • extensions/orchestrator/src/expiry-sweeper.ts (added, +92/-0)
  • extensions/orchestrator/src/fixtures/synthetic-tasks.json (added, +40/-0)
  • extensions/orchestrator/src/http.ts (added, +389/-0)
  • extensions/orchestrator/src/inbox.ts (added, +117/-0)
  • extensions/orchestrator/src/install.ts (added, +85/-0)
  • extensions/orchestrator/src/routing.config-default.ts (added, +77/-0)
  • extensions/orchestrator/src/routing.ts (added, +245/-0)
  • extensions/orchestrator/src/shadow-summary.ts (added, +100/-0)
  • extensions/orchestrator/src/spawn-watch.ts (added, +193/-0)
  • extensions/orchestrator/src/store.paths.ts (added, +59/-0)
  • extensions/orchestrator/src/store.ts (added, +562/-0)
  • extensions/orchestrator/src/synthetic.ts (added, +179/-0)
  • extensions/orchestrator/src/trajectory.ts (added, +147/-0)
  • extensions/orchestrator/src/types/schema.contract.ts (added, +22/-0)
  • extensions/orchestrator/src/types/schema.ts (added, +179/-0)
  • extensions/orchestrator/test/agent-capabilities.test.ts (added, +72/-0)
  • extensions/orchestrator/test/cli.test.ts (added, +107/-0)
  • extensions/orchestrator/test/credentials.test.ts (added, +142/-0)
  • extensions/orchestrator/test/dispatch-trajectory.test.ts (added, +157/-0)
  • extensions/orchestrator/test/dispatch.test.ts (added, +185/-0)
  • extensions/orchestrator/test/expiry-sweeper.test.ts (added, +118/-0)
  • extensions/orchestrator/test/fixtures/routing.malformed.json (added, +22/-0)
  • extensions/orchestrator/test/fixtures/routing.shadowing.json (added, +22/-0)
  • extensions/orchestrator/test/fixtures/wire-contract.json (added, +258/-0)
  • extensions/orchestrator/test/fixtures/wire-contract.sha256 (added, +1/-0)
  • extensions/orchestrator/test/http.test.ts (added, +525/-0)
  • extensions/orchestrator/test/inbox.test.ts (added, +139/-0)
  • extensions/orchestrator/test/install.test.ts (added, +71/-0)
  • extensions/orchestrator/test/integration.http-dispatch.test.ts (added, +354/-0)
  • extensions/orchestrator/test/routing.test.ts (added, +237/-0)
  • extensions/orchestrator/test/schema-hash.test.ts (added, +32/-0)
  • extensions/orchestrator/test/shadow-summary.test.ts (added, +122/-0)
  • extensions/orchestrator/test/spawn-watch.test.ts (added, +232/-0)
  • extensions/orchestrator/test/store.test.ts (added, +324/-0)
  • extensions/orchestrator/test/synthetic.test.ts (added, +179/-0)
  • extensions/orchestrator/test/trajectory.test.ts (added, +189/-0)
  • extensions/orchestrator/test/wire-contract.test.ts (added, +119/-0)
  • extensions/orchestrator/tsconfig.json (added, +16/-0)
  • pnpm-lock.yaml (modified, +9/-0)
Raw issue body

Context

Found during bot review of #72086 (Greptile P2). The STALE_ELIGIBLE set in extensions/orchestrator/src/store.ts excludes `in_progress`:

```ts
const STALE_ELIGIBLE: ReadonlySet<TaskState> = new Set<TaskState>([
  "queued",
  "assigned",
  "awaiting_approval",
]);
```

`sweepExpired` skips any task in `in_progress`, and `applyAction` rejects `expire` for that state. Combined with `spawn-watch` having no built-in timeout, this creates a category of tasks that can never be reclaimed if a specialist session dies silently in shadow/live mode.

Why this is dormant in v0

Phase B ships in synthetic mode only (`mode: "synthetic"`), which transitions tasks through `in_progress` synchronously inside `dispatch.ts` and lands them at `done` / `awaiting_approval` / `failed` in the same call. There is no real `spawn`, so no real liveness gap.

The gap opens when shadow/live ships and `spawn-watch` becomes the path that drives the `in_progress → done | failed` transition asynchronously.

Decision needed before shadow/live cutover

Two reasonable approaches:

  1. Watcher-side TTL. `spawn-watch` adopts a max-watch duration; when it expires, it emits a synthetic `subagent_failed` outcome and lets the existing reject path fire. Keeps the store contract simple — `in_progress` only exits via terminal events, never via the sweeper.
  2. `force_expire` action + `STALE_ELIGIBLE` includes `in_progress` past a longer threshold. The sweeper grows a second tier (e.g., `expiresAt + 6h`) for `in_progress` tasks so silent-death sessions are reclaimed even if the watcher itself crashed.

Option 1 covers the common failure mode (specialist crash); Option 2 also covers watcher crash. Both can coexist.
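Option 2's second tier can be sketched as an eligibility predicate. Everything except `STALE_ELIGIBLE`, the state names, and the `expiresAt + 6h` example is an assumption here: the `Task` shape, the `expired` state, and the `isSweepable` helper are hypothetical, and the `force_expire` action itself is not modeled.

```ts
// Hypothetical model of Option 2; field names and the helper are illustrative.
type TaskState =
  | "queued" | "assigned" | "in_progress"
  | "awaiting_approval" | "done" | "failed" | "expired";

interface Task { id: string; state: TaskState; expiresAt: number } // epoch ms

const STALE_ELIGIBLE: ReadonlySet<TaskState> = new Set<TaskState>([
  "queued",
  "assigned",
  "awaiting_approval",
]);

// Longer threshold for in_progress, taken from the issue's "expiresAt + 6h" example.
const IN_PROGRESS_GRACE_MS = 6 * 60 * 60 * 1000;

function isSweepable(task: Task, now: number): boolean {
  // Tier 1: existing behaviour for states already in STALE_ELIGIBLE.
  if (STALE_ELIGIBLE.has(task.state)) return now >= task.expiresAt;
  // Tier 2: in_progress is reclaimed only after the extra grace period.
  if (task.state === "in_progress") return now >= task.expiresAt + IN_PROGRESS_GRACE_MS;
  // Terminal states are never swept.
  return false;
}
```

Because this check runs in the sweeper rather than the watcher, it would still reclaim a task when the watcher process itself has crashed, which is the case Option 1 cannot cover.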

Suggested scope

  • Pick approach in design discussion before the synthetic → shadow flip.
  • Land before any real `spawn-watch` traffic.

🤖 Generated with Claude Code

extent analysis

TL;DR

Implement a watcher-side TTL or add a force_expire action to handle tasks stuck in the in_progress state due to silent specialist session deaths.

Guidance

  • Consider implementing a watcher-side TTL to emit a synthetic subagent_failed outcome when the max-watch duration expires, allowing the existing reject path to fire.
  • Alternatively, add a force_expire action and modify the STALE_ELIGIBLE set to include in_progress tasks past a longer threshold (e.g., expiresAt + 6h) to reclaim silent-death sessions.
  • Evaluate both options in a design discussion before the synthetic → shadow flip to determine the best approach.
  • Ensure the chosen solution is landed before any real spawn-watch traffic to prevent tasks from becoming stuck in the in_progress state.

Example

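// Sketch of a watcher-side TTL. `spawnWatch`, its `maxWatchDuration` option, the
// 'timeout' event, and `emitSyntheticSubagentFailedOutcome` are illustrative names,
// not confirmed APIs of spawn-watch.ts.
const maxWatchDuration = 30 * 60 * 1000; // 30 minutes, in milliseconds
// Pass the TTL to the watcher so the duration is actually enforced.
const watcher = spawnWatch(task, { maxWatchDuration });
watcher.on('timeout', () => {
  // Emit a synthetic subagent_failed outcome so the existing reject path fires.
  emitSyntheticSubagentFailedOutcome(task);
});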
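For a version that runs without any `spawn-watch` API, the same idea can be expressed by racing the watch against a timer. This is a sketch under assumptions: the `WatchOutcome` type, the `subagent_done` outcome, the `watch_ttl_exceeded` reason, and `watchWithTtl` are hypothetical; only the `subagent_failed` outcome name comes from the issue.

```ts
// Hypothetical outcome type; only "subagent_failed" is named in the issue.
type WatchOutcome =
  | { kind: "subagent_done" }
  | { kind: "subagent_failed"; reason: string };

// Resolve with the real outcome, or with a synthetic failure once maxWatchMs elapses.
function watchWithTtl(
  watch: Promise<WatchOutcome>,
  maxWatchMs: number,
): Promise<WatchOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const ttl = new Promise<WatchOutcome>((resolve) => {
    timer = setTimeout(
      () => resolve({ kind: "subagent_failed", reason: "watch_ttl_exceeded" }),
      maxWatchMs,
    );
  });
  // Whichever settles first wins; always clear the timer so it cannot leak.
  return Promise.race([watch, ttl]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

The caller would feed the result into whatever path already handles a failed specialist, so `in_progress` still exits only through a terminal event.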

Notes

The two approaches cover different failures. A watcher-side TTL handles the common case, a specialist crash, and keeps the store contract simple because `in_progress` still exits only via terminal events. A `force_expire` action with a longer sweeper threshold also reclaims tasks when the watcher itself has crashed. The two can coexist.

Recommendation

Start with the watcher-side TTL: it covers the common failure mode (a specialist crash) and is the more straightforward change. Add the `force_expire` tier if watcher crashes also need to be covered.
