openclaw - [Feature]: Add plugin-provided busy-state and lease signals for on-demand container runtimes



Summary

OpenClaw should provide an official plugin-based busy-state signal that external container lifecycle controllers can use to decide whether an on-demand OpenClaw container should be started, kept alive, extended, held briefly, or released after an idle grace period.

The proposed capability should be implemented through OpenClaw's native plugin mechanism and public Plugin SDK surfaces. It should avoid private core imports and keep the OpenClaw core extension-agnostic.

The final capability should expose the same busy-state snapshot through stable surfaces such as structured logs, authenticated read-only HTTP routes, optional diagnostics, and documentation for external lifecycle controllers.

Problem to solve

OpenClaw can run in container environments where the Gateway process is not guaranteed to stay resident. In this model, an external controller may start the container only when incoming work arrives, and stop it after an idle window to reduce cost and resource usage.

While OpenClaw is running, the controller needs a reliable signal from inside the OpenClaw process that answers:

  • Is OpenClaw currently handling user-visible work?
  • Are agent runs, tool calls, subagents, background tasks, or scheduled jobs active?
  • Is there queued or recently observed work that should keep the container warm?
  • Should the container lease be extended?
  • Is it safe to enter an idle grace period before stopping?

Without this signal, a controller may stop the container while OpenClaw is processing a request, running a subagent, executing a tool call, or finishing background work.

The plugin runs inside the OpenClaw container, so it cannot start a stopped container by itself. Startup must remain the responsibility of an external event source or controller. This proposal focuses on the signal OpenClaw can emit while it is already running.

Proposed solution

Introduce a native plugin, tentatively named openclaw-busy-state, that reports OpenClaw runtime activity as a stable busy/lease snapshot.

The plugin should:

  • Use the official Plugin SDK and documented extension surfaces.
  • Observe runtime activity without importing private src/** internals.
  • Maintain an in-memory view of active, queued, stale, and recently completed work.
  • Use TTL cleanup so missed completion events do not keep the container busy forever.
  • Emit structured lease recommendations that external controllers can consume.
  • Avoid logging sensitive content such as prompts, transcript text, tool arguments, recipients, credentials, or full session identifiers.
  • Leave actual container start, stop, and lease extension decisions to the external controller.

The plugin should track activity from public OpenClaw extension surfaces where available:

  • Interactive agent runs.
  • Tool calls.
  • Subagent runs.
  • Background tasks.
  • Scheduled or cron-triggered tasks.
  • Gateway startup and shutdown lifecycle.
  • Queued work, when exposed by public SDK events.
  • Recently completed work within a short idle grace window.
  • TTL-pruned stale work records.
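Here is a minimal sketch of the in-memory tracker with TTL cleanup. It assumes the plugin receives queue, start, and finish callbacks that carry an opaque activity id. The `ActivityKind` names, the `ActivityTracker` class, and every reason code other than `active_agent_runs` and `active_tool_calls` are hypothetical, because the real hook names depend on which public Plugin SDK events exist. Stale and recently completed marks are kept only for the idle grace window. As a result, a pruned record produces a brief conservative signal rather than a permanent one.

```typescript
// Hypothetical activity categories; real names depend on the public Plugin SDK events available.
type ActivityKind = "agent_run" | "tool_call" | "subagent" | "background_task" | "scheduled_task";

// Only "active_agent_runs" and "active_tool_calls" appear in the proposal; the rest are illustrative.
const REASON_BY_KIND: Record<ActivityKind, string> = {
  agent_run: "active_agent_runs",
  tool_call: "active_tool_calls",
  subagent: "active_subagents",
  background_task: "active_background_tasks",
  scheduled_task: "active_scheduled_tasks",
};

type ActivityRecord = { kind: ActivityKind; state: "queued" | "running"; since: number };

class ActivityTracker {
  // Keyed by an opaque id; no prompts, tool arguments, or full session keys are stored.
  records = new Map<string, ActivityRecord>();
  completed: { at: number; ok: boolean }[] = []; // completions inside the idle grace window
  staleAt: number[] = []; // times at which records were TTL-pruned, kept for the grace window
  lastObservedAt = 0;
  activityTtlMs: number;
  idleGraceMs: number;

  constructor(activityTtlMs: number, idleGraceMs: number) {
    this.activityTtlMs = activityTtlMs;
    this.idleGraceMs = idleGraceMs;
  }

  queue(id: string, kind: ActivityKind, now: number): void {
    this.records.set(id, { kind, state: "queued", since: now });
    this.lastObservedAt = now;
  }

  start(id: string, kind: ActivityKind, now: number): void {
    this.records.set(id, { kind, state: "running", since: now });
    this.lastObservedAt = now;
  }

  // A late completion for an already-pruned record only updates lastObservedAt.
  finish(id: string, ok: boolean, now: number): void {
    if (this.records.delete(id)) this.completed.push({ at: now, ok });
    this.lastObservedAt = now;
  }

  // Expire records older than the TTL so a missed completion event cannot keep the
  // container busy forever, and forget completion/stale marks older than the grace window.
  prune(now: number): void {
    const expired: string[] = [];
    this.records.forEach((rec, id) => {
      if (now - rec.since > this.activityTtlMs) expired.push(id);
    });
    for (const id of expired) {
      this.records.delete(id);
      this.staleAt.push(now);
    }
    this.completed = this.completed.filter((c) => now - c.at <= this.idleGraceMs);
    this.staleAt = this.staleAt.filter((t) => now - t <= this.idleGraceMs);
  }

  totals() {
    const all = Array.from(this.records.values());
    const running = all.filter((r) => r.state === "running").length;
    return {
      active: all.length, // assumption: active = queued + running
      queued: all.length - running,
      running,
      completedRecently: this.completed.length,
      failures: this.completed.filter((c) => !c.ok).length,
      stale: this.staleAt.length,
    };
  }

  // Machine-readable reason codes in a stable order.
  reasons(): string[] {
    const all = Array.from(this.records.values());
    const out: string[] = [];
    for (const kind of Object.keys(REASON_BY_KIND) as ActivityKind[]) {
      if (all.some((r) => r.kind === kind && r.state === "running")) out.push(REASON_BY_KIND[kind]);
    }
    if (all.some((r) => r.state === "queued")) out.push("queued_work");
    if (this.staleAt.length > 0) out.push("stale_work");
    return out;
  }
}
```

A real plugin would probably call `prune` before building each snapshot, for example on every `logIntervalMs` tick.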

The busy-state snapshot should include:

  • busy: whether OpenClaw appears to be doing work.
  • confidence: whether the snapshot is authoritative, best-effort, partial, or unknown.
  • stale: whether some tracked work exceeded its TTL.
  • reason: machine-readable reason codes explaining the busy decision.
  • lease.recommendedAction: the controller-facing lease recommendation.
  • totals: aggregate counts for active, queued, running, recently completed, failed, and stale work.
  • lastObservedAt: timestamp of the latest observed activity.
  • generatedAt: timestamp of the emitted snapshot.
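The snapshot fields listed above could be typed roughly as follows. This sketch mirrors the sample log line. The spellings `authoritative`, `partial`, and `unknown` for `confidence` are assumed by analogy with `best_effort`. `isBusyStateSnapshot` is a hypothetical structural check that a controller or test suite might apply to untrusted input.

```typescript
type Confidence = "authoritative" | "best_effort" | "partial" | "unknown";
type LeaseAction = "extend" | "hold" | "release_after_grace" | "unknown";

interface BusyStateSnapshot {
  marker: "openclaw-busy-state";
  version: 1;
  event: "snapshot";
  generatedAt: string; // ISO-8601 time the snapshot was emitted
  lastObservedAt: string; // ISO-8601 time of the latest observed activity
  busy: boolean;
  confidence: Confidence;
  stale: boolean;
  reason: string[]; // machine-readable reason codes
  lease: { recommendedAction: LeaseAction; extendForMs: number; idleGraceMs: number; wakeRequired: boolean };
  totals: { active: number; queued: number; running: number; completedRecently: number; failures: number; stale: number };
}

const CONFIDENCE: readonly string[] = ["authoritative", "best_effort", "partial", "unknown"];
const ACTIONS: readonly string[] = ["extend", "hold", "release_after_grace", "unknown"];

// Structural check for untrusted input such as a parsed log line or HTTP body.
function isBusyStateSnapshot(v: unknown): v is BusyStateSnapshot {
  if (typeof v !== "object" || v === null) return false;
  const s = v as Record<string, any>;
  return (
    s.marker === "openclaw-busy-state" &&
    s.version === 1 &&
    typeof s.generatedAt === "string" &&
    typeof s.busy === "boolean" &&
    typeof s.stale === "boolean" &&
    CONFIDENCE.includes(s.confidence) &&
    Array.isArray(s.reason) &&
    s.reason.every((r: unknown) => typeof r === "string") &&
    typeof s.lease === "object" &&
    s.lease !== null &&
    ACTIONS.includes(s.lease.recommendedAction) &&
    typeof s.totals === "object" &&
    s.totals !== null
  );
}
```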

The snapshot should be conservative. If state is stale or incomplete, the plugin should recommend holding the container briefly instead of releasing immediately.

Structured logs should be the baseline controller integration contract. Each busy-state log line should be a single line of JSON (shown pretty-printed here for readability):

{
  "marker": "openclaw-busy-state",
  "version": 1,
  "event": "snapshot",
  "generatedAt": "2026-05-26T08:00:00.000Z",
  "lastObservedAt": "2026-05-26T07:59:58.000Z",
  "busy": true,
  "confidence": "best_effort",
  "stale": false,
  "reason": ["active_agent_runs", "active_tool_calls"],
  "lease": {
    "recommendedAction": "extend",
    "extendForMs": 300000,
    "idleGraceMs": 60000,
    "wakeRequired": false
  },
  "totals": {
    "active": 2,
    "queued": 0,
    "running": 2,
    "completedRecently": 0,
    "failures": 0,
    "stale": 0
  }
}
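The sample above is pretty-printed, but each emitted log line must be a single line of JSON. The following is a minimal emitter sketch. It assumes the snapshot is a plain JSON-serializable object and that the output sink is injected, because the proposal leaves the choice between the plugin logger and `console.log` open. `formatBusyStateLine` and `startSnapshotLogging` are hypothetical names, and the 15 s default interval is taken from the suggested `logIntervalMs`.

```typescript
type LogSink = (line: string) => void;

// Serialize one snapshot as exactly one line of JSON. JSON.stringify without an
// indent argument never emits raw newlines: newlines inside strings become "\n" escapes.
function formatBusyStateLine(snapshot: Record<string, unknown>): string {
  return JSON.stringify({ ...snapshot, marker: "openclaw-busy-state", version: 1 });
}

// Emit one snapshot immediately, then one every intervalMs. Returns a function
// that stops the timer, for example on Gateway shutdown.
function startSnapshotLogging(
  getSnapshot: () => Record<string, unknown>,
  log: LogSink,
  intervalMs = 15_000,
): () => void {
  const emit = () => log(formatBusyStateLine(getSnapshot()));
  emit();
  const timer = setInterval(emit, intervalMs);
  return () => clearInterval(timer);
}
```

Forcing `marker` and `version` in the formatter means a snapshot builder cannot accidentally emit a line that controllers would silently skip.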

External controllers should parse only log lines that contain both of these fields:

{ "marker": "openclaw-busy-state", "version": 1 }
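A controller-side filter for that rule might look like the following sketch. It assumes the controller sees raw container log lines as strings, possibly prefixed by the container runtime and interleaved with ordinary output. `parseBusyStateLine` is hypothetical. Lines that are not JSON, or that lack the exact marker and version, are ignored rather than treated as errors, so a future `version: 2` line is never misread as v1.

```typescript
interface LeaseFields {
  recommendedAction: string;
  extendForMs: number;
  idleGraceMs: number;
  wakeRequired: boolean;
}

interface ParsedBusyState {
  busy: boolean;
  generatedAt: string;
  lease: LeaseFields;
}

// Returns null for anything that is not a v1 busy-state line, so ordinary log
// output, malformed JSON, and future versions are skipped instead of misread.
function parseBusyStateLine(line: string): ParsedBusyState | null {
  const start = line.indexOf("{"); // tolerate runtime prefixes such as timestamps
  if (start < 0) return null;
  let value: unknown;
  try {
    value = JSON.parse(line.slice(start));
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, any>;
  if (v.marker !== "openclaw-busy-state" || v.version !== 1) return null;
  if (typeof v.lease !== "object" || v.lease === null) return null;
  return { busy: v.busy === true, generatedAt: String(v.generatedAt), lease: v.lease as LeaseFields };
}
```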

Lease recommendations:

  • extend: OpenClaw appears busy; extend the container lease.
  • hold: state is stale, partial, or uncertain; short-extend and wait for another signal.
  • release_after_grace: OpenClaw appears idle; enter idle grace before stopping.
  • unknown: the plugin cannot provide a useful recommendation; controller policy applies.
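The mapping from observed state to a recommendation can be sketched as a pure function. This also captures the earlier conservative rule that stale or incomplete state holds rather than releases. The input shape, the `decideLease` name, and the precedence order are assumptions. The order is: observed work first, then unknown confidence, then stale or partial state, then idle. The lease durations come from the suggested defaults (`activeLeaseMs`, `cautiousLeaseMs`, `idleGraceMs`).

```typescript
type LeaseAction = "extend" | "hold" | "release_after_grace" | "unknown";
type Confidence = "authoritative" | "best_effort" | "partial" | "unknown";

interface LeaseInput {
  busy: boolean;
  stale: boolean;
  confidence: Confidence;
}

interface LeaseDefaults {
  activeLeaseMs: number;
  cautiousLeaseMs: number;
  idleGraceMs: number;
}

function decideLease(input: LeaseInput, cfg: LeaseDefaults) {
  let recommendedAction: LeaseAction;
  let extendForMs = 0;
  if (input.busy) {
    recommendedAction = "extend"; // observed work always wins
    extendForMs = cfg.activeLeaseMs;
  } else if (input.confidence === "unknown") {
    recommendedAction = "unknown"; // no useful signal: controller policy applies
  } else if (input.stale || input.confidence === "partial") {
    recommendedAction = "hold"; // never release on doubtful state; short-extend instead
    extendForMs = cfg.cautiousLeaseMs;
  } else {
    recommendedAction = "release_after_grace";
  }
  // wakeRequired is always false here: the plugin only runs inside an already-running container.
  return { recommendedAction, extendForMs, idleGraceMs: cfg.idleGraceMs, wakeRequired: false };
}
```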

Suggested default configuration:

  • activeLeaseMs: 300000
  • cautiousLeaseMs: 120000
  • idleGraceMs: 60000
  • logIntervalMs: 15000
  • activityTtlMs: 600000
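The following sketch shows how the plugin might merge user configuration with these defaults. The key names and values come from the list above. The merge behavior (unset keys keep their defaults) and the validation rules are assumptions and are not part of the proposal. The rules are: every value must be a positive integer number of milliseconds, and `cautiousLeaseMs` must not exceed `activeLeaseMs`.

```typescript
interface BusyStateConfig {
  activeLeaseMs: number;
  cautiousLeaseMs: number;
  idleGraceMs: number;
  logIntervalMs: number;
  activityTtlMs: number;
}

const DEFAULT_CONFIG: BusyStateConfig = {
  activeLeaseMs: 300_000, // 5 min: lease extension while busy ("extend")
  cautiousLeaseMs: 120_000, // 2 min: short extension for "hold"
  idleGraceMs: 60_000, // 1 min: idle grace before release
  logIntervalMs: 15_000, // 15 s: snapshot log cadence
  activityTtlMs: 600_000, // 10 min: age at which an unfinished record is pruned as stale
};

// Merge user overrides over the defaults; unset or undefined keys keep their default.
function resolveConfig(user: Partial<BusyStateConfig> = {}): BusyStateConfig {
  const cfg: BusyStateConfig = { ...DEFAULT_CONFIG };
  for (const key of Object.keys(DEFAULT_CONFIG) as (keyof BusyStateConfig)[]) {
    const value = user[key] ?? DEFAULT_CONFIG[key];
    if (!Number.isInteger(value) || value <= 0) {
      throw new Error(`busy-state config: ${key} must be a positive integer number of ms`);
    }
    cfg[key] = value;
  }
  // Assumed sanity rule: "hold" should never extend the lease longer than "extend".
  if (cfg.cautiousLeaseMs > cfg.activeLeaseMs) {
    throw new Error("busy-state config: cautiousLeaseMs must not exceed activeLeaseMs");
  }
  return cfg;
}
```

Failing loudly on invalid values seems safer than silently falling back to defaults, because a misconfigured lease could otherwise release a busy container.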

The target public surfaces should include:

  • Log stream: structured JSON snapshots for simple platform integration.
  • HTTP route: GET /plugins/busy-state/snapshot, authenticated and read-only.
  • HTTP route: GET /plugins/busy-state/lease, authenticated and read-only, returning only controller-facing lease fields.
  • Optional diagnostic tool: busy_state, read-only, disabled or gated if needed.
  • Documentation: controller consumption model, safety rules, sample policies, and known limitations.
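The two routes can share one snapshot model, with the lease route returning a projection of it. The sketch below is framework-agnostic and deliberately avoids any real Plugin SDK route-registration API, because that surface is not specified here. Several pieces are assumptions: `handleBusyStateRequest`, the `authorized` flag (a stand-in for whatever Gateway authentication the routes are mounted behind), the 401/404/405 status choices, and the exact set of fields in the lease response.

```typescript
interface Lease {
  recommendedAction: string;
  extendForMs: number;
  idleGraceMs: number;
  wakeRequired: boolean;
}

interface Snapshot {
  marker: "openclaw-busy-state";
  version: 1;
  generatedAt: string;
  busy: boolean;
  lease: Lease;
  [extra: string]: unknown; // remaining snapshot fields (totals, reason, ...)
}

interface RouteRequest { method: string; path: string; authorized: boolean }
interface RouteResponse { status: number; body: unknown }

const SNAPSHOT_PATH = "/plugins/busy-state/snapshot";
const LEASE_PATH = "/plugins/busy-state/lease";

function handleBusyStateRequest(req: RouteRequest, getSnapshot: () => Snapshot): RouteResponse {
  // Check auth before anything else so unauthenticated callers learn nothing.
  if (!req.authorized) return { status: 401, body: { error: "unauthorized" } };
  if (req.path !== SNAPSHOT_PATH && req.path !== LEASE_PATH) return { status: 404, body: { error: "not_found" } };
  if (req.method !== "GET") return { status: 405, body: { error: "method_not_allowed" } }; // read-only
  const snap = getSnapshot(); // one snapshot model shared by both routes and the log stream
  if (req.path === SNAPSHOT_PATH) return { status: 200, body: snap };
  // Lease route: version info plus controller-facing lease fields only.
  return {
    status: 200,
    body: { marker: snap.marker, version: snap.version, generatedAt: snap.generatedAt, lease: snap.lease },
  };
}
```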

Suggested implementation sequence:

  1. Add the native plugin with in-memory state, TTL cleanup, and structured log snapshots.
  2. Add Plugin SDK event coverage for any important runtime activity that cannot be observed through existing public hooks.
  3. Add authenticated read-only HTTP routes that return the same snapshot and lease recommendation.
  4. Add documentation and controller integration examples.
  5. Add optional diagnostic tool or Lobster workflow integration after the core signal is stable.

Each step should remain useful on its own and avoid private core imports.

Alternatives considered

Use OpenClaw core internals directly

Rejected. OpenClaw already has some internal activity and restart-deferral signals, but a plugin should not import private src/** internals. This would create a fragile extension and would not match OpenClaw's plugin boundary direction.

Make the external controller infer activity from process liveness only

Rejected. A live Gateway process does not mean OpenClaw is actively working, and an idle process may still need a short grace period. Process liveness alone cannot distinguish active agent runs, tool calls, subagents, background tasks, stale work, or recently completed work.

Persist plugin state to disk

Not required for the initial design. The lifecycle controller needs current in-process state while the container is running. In-memory state plus TTL is simpler, avoids persistence format churn, and reduces sensitive-data risk.

Let the plugin control the container directly

Rejected. The plugin runs inside the container and should not terminate, suspend, restart, or extend the container by itself. Container startup, stop, and lease extension should remain external controller responsibilities.

Expose only an HTTP endpoint

Rejected as the only integration path. Some platforms can watch logs more easily than they can call into a running Gateway, especially during bootstrap or under restricted networking. Structured logs should be the baseline, with authenticated HTTP routes added as a richer read path.

Impact

This would make OpenClaw easier and safer to run in on-demand container environments.

Expected benefits:

  • External lifecycle controllers can avoid stopping OpenClaw while work is still active.
  • Deployments can reduce idle resource cost without requiring OpenClaw to stay resident forever.
  • Container scale-to-zero or lease-based platforms get a stable OpenClaw-native signal instead of scraping ad hoc logs.
  • Operators get a consistent way to reason about active agent runs, tool calls, subagents, background tasks, scheduled tasks, stale work, and idle grace periods.
  • Plugin authors and deployment integrators get a public contract instead of relying on private runtime internals.

The proposal should not add container-platform-specific behavior to core. It should keep OpenClaw extension-agnostic and let external controllers decide the actual lifecycle policy.

Security and privacy impact:

  • Busy-state logs and API responses must not include prompts, transcript content, tool arguments, recipients, credentials, or full session identifiers.
  • HTTP routes must be authenticated and read-only.
  • The plugin should not mutate runtime state, cancel tasks, or control the container directly.
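One way to make the privacy rule enforceable, rather than a convention, is to build every outgoing payload from an explicit allowlist of aggregate fields. That way free-text or identifier fields can never leak in by accident. This is a sketch under that assumption. The allowlist mirrors the sample snapshot, and `redactToAllowlist` and the reason-code pattern are hypothetical.

```typescript
// Keys that may leave the process; everything else is dropped.
const ALLOWED_TOP = ["marker", "version", "event", "generatedAt", "lastObservedAt", "busy", "confidence", "stale", "reason", "lease", "totals"];
const ALLOWED_NESTED: Record<string, string[]> = {
  lease: ["recommendedAction", "extendForMs", "idleGraceMs", "wakeRequired"],
  totals: ["active", "queued", "running", "completedRecently", "failures", "stale"],
};

function pick(obj: Record<string, unknown>, keys: string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const k of keys) if (k in obj) out[k] = obj[k];
  return out;
}

function redactToAllowlist(raw: Record<string, unknown>): Record<string, unknown> {
  const out = pick(raw, ALLOWED_TOP);
  for (const [key, keys] of Object.entries(ALLOWED_NESTED)) {
    const nested = out[key];
    if (typeof nested === "object" && nested !== null) out[key] = pick(nested as Record<string, unknown>, keys);
  }
  // Reason codes must be short machine-readable tokens, never free text.
  if (Array.isArray(out.reason)) {
    out.reason = out.reason.filter((r) => typeof r === "string" && /^[a-z_]{1,64}$/.test(r));
  }
  return out;
}
```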

Acceptance criteria:

  • The plugin loads through the official OpenClaw native plugin mechanism.
  • The plugin uses only public Plugin SDK surfaces or newly documented additive SDK events.
  • The plugin reports active OpenClaw work through a versioned busy-state snapshot.
  • The plugin tracks active, queued, stale, and recently completed work where public events allow it.
  • TTL cleanup prevents missed completion events from causing permanent busy state.
  • Structured JSON logs include marker: "openclaw-busy-state" and version: 1.
  • A busy state recommends lease.recommendedAction="extend".
  • An idle state recommends lease.recommendedAction="release_after_grace".
  • A stale, partial, or uncertain state recommends lease.recommendedAction="hold".
  • HTTP snapshot and lease routes are authenticated, read-only, and versioned.
  • All public surfaces share one snapshot model.
  • Logs and API responses contain no sensitive user, provider, credential, or transcript content.
  • Documentation explains controller responsibilities and the boundary between OpenClaw and the external lifecycle manager.

Evidence/examples

This proposal is based on OpenClaw's existing plugin architecture and container/runtime behavior:

  • OpenClaw already supports native plugins and public Plugin SDK surfaces.
  • Runtime activity exists across several categories that matter to container lifecycle decisions: interactive agent runs, tool calls, subagents, background tasks, scheduled tasks, and Gateway lifecycle.
  • Some activity is visible through public plugin hooks today, while other activity may require small additive SDK events to avoid private core imports.
  • OpenClaw can run in Docker or other containerized deployments where keeping the Gateway resident forever is not always desirable.
  • External lifecycle systems commonly need a simple busy/idle or lease-extension signal to make safe scale-down decisions.

Example controller behavior:

  1. Incoming external event starts or wakes the OpenClaw container.
  2. openclaw-busy-state begins emitting structured snapshots while the Gateway is running.
  3. The controller watches log lines with marker: "openclaw-busy-state" and version: 1.
  4. If lease.recommendedAction is extend, the controller extends the lease.
  5. If the recommendation is hold, the controller short-extends and waits for the next signal.
  6. If the recommendation is release_after_grace, the controller starts an idle grace timer before stopping the container.
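The controller steps above can be sketched as a small state machine that turns each parsed recommendation into a lease deadline. Everything here is controller-side and hypothetical, since the proposal leaves lifecycle policy to the controller. The assumptions include `LeaseController`, the `fallbackMs` used before the first snapshot and for `unknown`, and the choice to stop only after both the lease and the idle grace timer have expired.

```typescript
type Action = "extend" | "hold" | "release_after_grace" | "unknown";
interface LeaseSignal { recommendedAction: Action; extendForMs: number; idleGraceMs: number }

class LeaseController {
  leaseUntil: number; // the container must not be stopped before this time
  graceUntil: number | null = null; // set while an idle grace timer is running
  fallbackMs: number;

  constructor(now: number, fallbackMs: number) {
    this.fallbackMs = fallbackMs;
    this.leaseUntil = now + fallbackMs; // startup lease until the first snapshot arrives
  }

  onSignal(sig: LeaseSignal, now: number): void {
    switch (sig.recommendedAction) {
      case "extend":
      case "hold":
        // Any sign of work or uncertainty cancels a pending grace timer and pushes the lease out.
        this.graceUntil = null;
        this.leaseUntil = Math.max(this.leaseUntil, now + sig.extendForMs);
        break;
      case "release_after_grace":
        // Start the grace timer once; repeated idle signals do not restart it.
        if (this.graceUntil === null) this.graceUntil = now + sig.idleGraceMs;
        break;
      default:
        // "unknown": controller policy applies; here, a short fallback extension.
        this.leaseUntil = Math.max(this.leaseUntil, now + this.fallbackMs);
    }
  }

  shouldStop(now: number): boolean {
    return this.graceUntil !== null && now >= this.graceUntil && now >= this.leaseUntil;
  }
}
```

A production controller would also need a policy for a container that stops emitting snapshots altogether. This sketch does not cover that case.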

Related open issues:

  • #14051: Activity-based heartbeat with idle timeout. Related to activity and idle detection, but this proposal targets external container lifecycle lease signals through a plugin-provided busy-state snapshot.
  • #12429: Secure pairing for dynamically autoscaled worker nodes in K8s. Related to autoscaled runtime environments, but this proposal focuses on whether a running OpenClaw Gateway should be kept alive or released.
  • #85768: Intermittent workspace-sandbox prep hangs on cold-start. Related to cold-start behavior, but this proposal does not address sandbox preparation latency directly.
  • #86199: TUI Esc abort can leave stale optimistic busy state. Related terminology, but this proposal concerns runtime-wide busy/idle state for external lifecycle controllers rather than local TUI state.
  • #74684: sessions_spawn does not expose child process PID for running ACP sessions. Related to spawned-session observability, but this proposal focuses on aggregate busy/lease state rather than process-level debugging.
  • #80219: Plugin SDK surface consolidation. Relevant background for deciding whether new Plugin SDK runtime activity events should be added, if current public hooks are insufficient.

Additional information

Non-goals:

  • No core special cases for a specific container platform.
  • No private imports from OpenClaw core internals.
  • No persisted plugin state as a requirement.
  • No transcript, prompt, tool argument, message body, recipient, credential, or full session key in logs or API responses.
  • No concrete container controller implementation inside OpenClaw.
  • No direct container lifecycle mutation from the plugin.
  • No task cancellation or runtime management behavior inside the plugin.

Container boundary:

  • If the container is stopped, the plugin cannot emit state.
  • Container startup must be triggered by an external event source or controller.
  • Container lease extension and release are controller responsibilities.
  • The plugin should never terminate, suspend, or restart the container by itself.
  • The plugin should not cancel OpenClaw tasks or mutate runtime activity.

Open questions:

  • Which current Plugin SDK hooks are sufficient to observe normal interactive agent runs across all Gateway paths?
  • Which background task, subagent, and scheduled-task events need new additive SDK coverage?
  • Should confidence include both best_effort and authoritative, or should the first version only expose conservative best-effort state?
  • What default TTL and idle grace values fit common container platforms?
  • Should the HTTP routes be enabled by default, or opt-in through plugin configuration?
  • Should lease snapshots be emitted through the plugin logger, console.log, or both?
