openclaw (Solved) Bug: benign SIGTERM async completions surface into chat after restart [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#72252 · Fetched 2026-04-27 05:32:31
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

Root Cause

  • Telegram DM received: "The async command failed because it was terminated with SIGTERM."
  • The attached output was only OpenClaw CLI help text.
  • The command had been stopped during restart cleanup, not because the user's workflow failed.

Fix Action

Fixed

PR fix notes

PR #72253: fix(heartbeat): keep benign exec completions internal

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: benign async exec completions, including restart-cleanup signal SIGTERM, can be relayed into user chat as noisy failure summaries.
  • Why it matters: routine gateway/session restarts can produce Telegram/WhatsApp messages that look like actionable command failures even when there is no user-facing problem.
  • What changed: successful structured exec completions and structured SIGTERM cleanup completions are classified as internal-only, suppress visible HEARTBEAT_OK, and use normal ack skipping.
  • What did NOT change (scope boundary): real non-SIGTERM failures such as Exec failed (..., code 1) :: build failed still relay to the user.
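
The classification described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's code: the type names, the regular expression, and the functions parseExecCompletion and classifyExecCompletion are hypothetical. Only the "Exec completed/failed (..., code N | signal NAME) :: output" line shape is taken from the PR text.

```typescript
// Hypothetical names throughout; only the structured line shape comes from the PR description.
type ExecAudience = "internal" | "relay";

interface ParsedExecCompletion {
  status: "completed" | "failed";
  id: string;
  exitCode?: number;
  signal?: string;
  output: string;
}

// Matches e.g. "Exec failed (abc12345, signal SIGTERM) :: some output".
const EXEC_LINE =
  /^Exec (completed|failed) \(([^,]+), (?:code (-?\d+)|signal ([A-Z0-9]+))\) :: ([\s\S]*)$/;

function parseExecCompletion(line: string): ParsedExecCompletion | null {
  const m = EXEC_LINE.exec(line);
  if (!m) return null;
  return {
    status: m[1] as "completed" | "failed",
    id: m[2],
    exitCode: m[3] !== undefined ? Number(m[3]) : undefined,
    signal: m[4],
    output: m[5],
  };
}

function classifyExecCompletion(line: string): ExecAudience {
  const parsed = parseExecCompletion(line);
  // Anything that is not a recognized structured completion keeps relaying.
  if (!parsed) return "relay";
  // Successful completion: nothing for the user to act on.
  if (parsed.status === "completed" && parsed.exitCode === 0) return "internal";
  // Structured SIGTERM termination: treated as cleanup noise.
  if (parsed.status === "failed" && parsed.signal === "SIGTERM") return "internal";
  // Everything else, such as code 1, is a real failure.
  return "relay";
}
```

Defaulting unrecognized lines to "relay" is a deliberate choice in this sketch: it keeps the scope boundary from the last bullet, so a failure is only hidden when it matches one of the two benign shapes exactly.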

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72252
  • Related #69492
  • Related #67273
  • Related #72217
  • Related #72218
  • Related #71213
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: heartbeat treated every exec completion event as user-relayable, and exec completion handling intentionally bypassed normal HEARTBEAT_OK ack skipping so results would not be lost.
  • Missing detection / guardrail: there was no distinction between actionable exec failures and non-actionable structured completions/cleanup terminations.
  • Contributing context (if known): this is a narrower tactical fix for one noisy path while the broader per-event audience classification work remains tracked separately.
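
To make the root cause concrete, here is a simplified before/after sketch of the ack-skipping decision. It assumes a reply is "skippable" when it is exactly HEARTBEAT_OK and that each queued event carries an internalOnly flag set by an upstream classifier. The QueuedEvent shape and both function names are hypothetical and do not reflect the actual code in src/infra/heartbeat-runner.ts.

```typescript
// Hypothetical shapes; only HEARTBEAT_OK and the idea of "ack skipping"
// come from the PR description.
const HEARTBEAT_OK = "HEARTBEAT_OK";

interface QueuedEvent {
  kind: "exec-completion" | "other";
  internalOnly: boolean; // assumed flag set by an upstream classifier
}

function isAck(reply: string): boolean {
  return reply.trim() === HEARTBEAT_OK;
}

// Old behavior as described: any exec completion defeats ack skipping,
// so even a benign completion forces a visible message.
function shouldSendBefore(reply: string, events: QueuedEvent[]): boolean {
  const hasExec = events.some((e) => e.kind === "exec-completion");
  return hasExec || !isAck(reply);
}

// New behavior as described: only relayable exec completions defeat ack
// skipping; internal-only ones fall back to the normal skip rule.
function shouldSendAfter(reply: string, events: QueuedEvent[]): boolean {
  const hasRelayableExec = events.some(
    (e) => e.kind === "exec-completion" && !e.internalOnly,
  );
  return hasRelayableExec || !isAck(reply);
}
```

With a single internal-only exec completion queued and a HEARTBEAT_OK reply, shouldSendBefore returns true and shouldSendAfter returns false, which is the difference the fix is after.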

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/infra/heartbeat-events-filter.test.ts, src/infra/heartbeat-runner.returns-default-unset.test.ts
  • Scenario the test should lock in: code 0 and signal SIGTERM structured exec completions stay internal and do not send chat output; code 1 failures still relay.
  • Why this is the smallest reliable guardrail: the bug is in heartbeat event classification and delivery suppression, so direct prompt and runner tests cover the decision seam without depending on full gateway e2e timing.
  • Existing test that already covers this (if any): existing exec completion tests covered relay paths but not benign/internal-only structured completions.
  • If no new test is added, why not: N/A
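
The scenario in this test plan could be locked in with a table-driven check along these lines. It is a sketch only: deliverExecEvents, isInternalOnly, and the Scenario table are hypothetical stand-ins for the decision seam, and the real tests live in the files named above and may look quite different. A recording callback plays the role of the chat sender.

```typescript
type Send = (text: string) => void;

// Hypothetical stand-in for the classification decision.
function isInternalOnly(event: string): boolean {
  return (
    /^Exec completed \([^)]*, code 0\) :: /.test(event) ||
    /^Exec failed \([^)]*, signal SIGTERM\) :: /.test(event)
  );
}

// Hypothetical delivery seam: send only events that are not internal-only.
function deliverExecEvents(events: string[], send: Send): void {
  for (const event of events) {
    if (!isInternalOnly(event)) send(event);
  }
}

interface Scenario {
  name: string;
  event: string;
  expectedSends: number;
}

const scenarios: Scenario[] = [
  { name: "code 0 stays internal", event: "Exec completed (abc12345, code 0) :: backup complete", expectedSends: 0 },
  { name: "SIGTERM stays internal", event: "Exec failed (abc12345, signal SIGTERM) :: openclaw gateway help text", expectedSends: 0 },
  { name: "code 1 still relays", event: "Exec failed (abc12345, code 1) :: build failed", expectedSends: 1 },
];

// Runs every scenario against a recording sender and returns the names of failures.
function runScenarios(): string[] {
  const failures: string[] = [];
  for (const s of scenarios) {
    const sent: string[] = [];
    deliverExecEvents([s.event], (text) => {
      sent.push(text);
    });
    if (sent.length !== s.expectedSends) failures.push(s.name);
  }
  return failures;
}
```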

User-visible / Behavior Changes

Benign async command completions and restart-cleanup SIGTERM events no longer produce visible Telegram/WhatsApp/chat messages. Real async command failures continue to notify users.

Diagram (if applicable)

Before:
[restart cleanup SIGTERM] -> [exec completion heartbeat] -> [visible chat failure summary]

After:
[restart cleanup SIGTERM] -> [internal heartbeat handling] -> [no visible chat noise]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node v24.15.0, pnpm
  • Model/provider: N/A
  • Integration/channel (if any): heartbeat async exec delivery to chat channels
  • Relevant config (redacted): heartbeat delivery enabled; async exec completion events queued

Steps

  1. Queue Exec completed (abc12345, code 0) :: backup complete for a heartbeat session.
  2. Queue Exec failed (abc12345, signal SIGTERM) :: openclaw gateway help text for a heartbeat session.
  3. Queue Exec failed (abc12345, code 1) :: build failed for a heartbeat session.

Expected

  • code 0 and signal SIGTERM structured completions are handled internally and do not send visible chat messages.
  • code 1 structured failures still relay to the configured chat target.

Actual

  • Before this change, all exec completions were treated as user-relay completions, so benign completions could surface as visible chat noise.
  • After this change, only actionable exec failures remain relayable.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • node scripts/test-projects.mjs src/infra/heartbeat-events-filter.test.ts src/infra/heartbeat-runner.returns-default-unset.test.ts passed.
    • pnpm exec oxfmt --check --threads=1 src/infra/heartbeat-events-filter.ts src/infra/heartbeat-events-filter.test.ts src/infra/heartbeat-runner.ts src/infra/heartbeat-runner.returns-default-unset.test.ts passed.
    • git diff --check passed.
    • pnpm build passed.
  • Edge cases checked:
    • Successful structured exec completion stays internal.
    • signal SIGTERM structured exec completion stays internal.
    • Non-SIGTERM code 1 structured failure still relays.
    • Mixed internal success plus real failure filters the success output and relays the real failure only.
  • What you did not verify:
    • pnpm check:changed did not complete cleanly locally because test/gateway.multi.e2e.test.ts repeatedly timed out waiting for an unrelated final chat event after the required dist/index.js build prerequisite was fixed. The focused heartbeat tests, formatter, typecheck/lint stages shown before the e2e lane, and build passed locally.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: a meaningful SIGTERM could be hidden from the user.
    • Mitigation: this only internalizes the structured Exec failed (..., signal SIGTERM) completion shape; non-SIGTERM failures such as code 1 continue to relay and are covered by tests.
  • Risk: mixed event batches could drop useful output.
    • Mitigation: internal-only completion output is filtered, while any real failure in the same batch remains relayable and covered by tests.
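
The mixed-batch mitigation can be illustrated with a small filter over already-parsed events. The ExecEvent shape and the functions below are assumptions made for this sketch, not the PR's data model.

```typescript
// Hypothetical parsed-event shape.
interface ExecEvent {
  id: string;
  code?: number;
  signal?: string;
  output: string;
}

// Benign per the PR description: exit code 0 or a SIGTERM termination.
function isInternalOnlyEvent(e: ExecEvent): boolean {
  return e.code === 0 || e.signal === "SIGTERM";
}

function formatFailure(e: ExecEvent): string {
  const cause = e.signal !== undefined ? `signal ${e.signal}` : `code ${e.code}`;
  return `Exec failed (${e.id}, ${cause}) :: ${e.output}`;
}

// Returns the text to relay, or null when the whole batch is internal-only.
function buildRelayText(batch: ExecEvent[]): string | null {
  const failures = batch.filter((e) => !isInternalOnlyEvent(e));
  return failures.length === 0 ? null : failures.map(formatFailure).join("\n");
}
```

For a batch holding one code 0 success and one code 1 failure, buildRelayText returns only the failure line, so the benign output is dropped without hiding the real problem.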

Changed files

  • src/agents/bash-tools.exec-runtime.ts (modified, +2/-0)
  • src/agents/bash-tools.test.ts (modified, +7/-1)
  • src/infra/heartbeat-events-filter.test.ts (modified, +54/-0)
  • src/infra/heartbeat-events-filter.ts (modified, +40/-9)
  • src/infra/heartbeat-runner.returns-default-unset.test.ts (modified, +231/-0)
  • src/infra/heartbeat-runner.ts (modified, +64/-22)
  • src/infra/system-events.ts (modified, +4/-0)

Problem

Benign async exec completions can still surface as user-visible chat messages when the command was terminated during normal cleanup, such as a gateway/session restart.

Observed user-facing example after a manual gateway restart:

  • Telegram DM received: "The async command failed because it was terminated with SIGTERM."
  • The attached output was only OpenClaw CLI help text.
  • The command had been stopped during restart cleanup, not because the user's workflow failed.

Expected

Structured async exec completions that are routine/non-actionable should be handled internally by heartbeat and not relayed into Telegram/WhatsApp/chat:

  • Exec completed (..., code 0) :: ...
  • Exec failed (..., signal SIGTERM) :: ... when it represents cleanup/restart termination

Real failures should still be relayed:

  • Exec failed (..., code 1) :: build failed
  • Other non-SIGTERM failures that require user attention

Actual

Heartbeat sees all exec completion events as user-relayable. Even when the assistant would otherwise return HEARTBEAT_OK, exec completion handling prevents normal ack skipping, so benign completion/cleanup details can become visible chat noise.

Related context

  • Related broad design issue: #69492
  • Related stale/broad PR: #67273
  • Related HEARTBEAT_OK leak issue/PR: #72217 / #72218
  • Related missing async payload fix: #71213

This issue is intentionally narrower than the broad event-audience redesign. It covers the concrete restart-cleanup/SIGTERM noise path while preserving relay behavior for real command failures.

extent analysis

TL;DR

Keep benign async exec completions (code 0 and restart-cleanup SIGTERM) internal to heartbeat so they do not surface as user-visible chat messages; real failures such as code 1 still relay.

Guidance

  • Identify and handle Exec failed (..., signal SIGTERM) messages differently based on their origin, distinguishing between cleanup/restart terminations and actual failures.
  • Modify the heartbeat event handling to skip relaying Exec completed and Exec failed (..., signal SIGTERM) messages when they represent routine cleanup or restarts.
  • Preserve the current relay behavior for non-SIGTERM failures and other error conditions that require user attention.
  • Review related issues (#69492, #67273, #72217, #72218, #71213) for broader design and implementation context.

Example

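The PR description does not include a code excerpt. Per its summary and changed-file list, the fix is classification logic in the heartbeat path (src/infra/heartbeat-events-filter.ts and src/infra/heartbeat-runner.ts) that keeps code 0 and signal SIGTERM structured completions internal.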
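
The first guidance point, distinguishing SIGTERM by origin, could look like the sketch below. Note that this is a stricter variant than what the PR text describes: the PR says it internalizes the structured SIGTERM completion shape as a whole. The reason field, the TerminationReason values, and isRoutineExit are all hypothetical and assume the component that sends the signal records why it did so.

```typescript
// Hypothetical: the sender of the signal tags the reason at termination time.
type TerminationReason = "restart-cleanup" | "timeout" | "unknown";

interface ExecExit {
  code?: number;
  signal?: string;
  reason?: TerminationReason; // assumed tag, not present in the PR description
}

// Routine exits are safe to keep internal; everything else should reach the user.
function isRoutineExit(exit: ExecExit): boolean {
  if (exit.code === 0) return true;
  return exit.signal === "SIGTERM" && exit.reason === "restart-cleanup";
}
```

Under this variant a SIGTERM caused by something other than restart cleanup would still relay, which addresses the "meaningful SIGTERM could be hidden" risk at the cost of plumbing an extra field through the event.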

Notes

This fix focuses on the specific path of restart-cleanup/SIGTERM noise and does not address the broader event-audience redesign discussed in related issues.

Recommendation

Adopt the narrow fix from PR #72253, which keeps code 0 and restart-cleanup SIGTERM completions out of user-visible chat while real failures continue to relay. A more comprehensive solution depends on the broader event-audience design work tracked in the related issues.
