openclaw - ✅(Solved) Fix [Bug]: writePhaseSignalStore / writeStore do not clean up orphaned temporary files [4 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#77888 (fetched 2026-05-06 06:19:42)
Comments: 1
Participants: 2
Timeline events: 5
Reactions: 2
Timeline (top): cross-referenced ×4, commented ×1

Root Cause

writePhaseSignalStore and writeStore use a write-then-rename pattern to atomically update on-disk stores. If the process crashes between writeFile and rename, or if rename fails, a .tmp file is left orphaned in the short-term artifacts directory. There is no cleanup mechanism for these orphaned files.

Fix Action

Fixed

PR fix notes

PR #77890: fix(memory-core): clean up orphaned temp files from writePhaseSignalStore and writeStore

Description (problem / solution / changelog)

Summary

writePhaseSignalStore and writeStore use write-then-rename for atomic updates but never clean up the temporary file. If rename fails, the .tmp file is left orphaned.

  • Problem: On rename failure (cross-filesystem, permission, etc.) the .tmp file persists in the short-term artifacts directory with no cleanup.
  • Why it matters: Over long-running deployments, orphaned .tmp files accumulate as litter in the workspace directory.
  • What changed: Wrapped writeFile + rename in try...finally that runs fs.unlink(tmpPath).catch(() => {}). If rename succeeded, the tmp path no longer exists, so unlink rejects with ENOENT and the catch swallows it. If rename failed, the tmp file is cleaned up.
  • What did NOT change: The write-then-rename atomic pattern, the file naming, any caller behavior.
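
The change described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual diff: the helper name writeJsonAtomically and the temp-file naming scheme are assumptions made for the example, and the real writeStore / writePhaseSignalStore may differ in both.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical helper for illustration; not the project's real function.
export async function writeJsonAtomically(targetPath: string, value: unknown): Promise<void> {
  // Assumed naming scheme: a sibling file that ends in ".tmp".
  const tmpPath = `${targetPath}.${process.pid}.${Date.now()}.tmp`;
  try {
    await fs.writeFile(tmpPath, JSON.stringify(value, null, 2), "utf8");
    await fs.rename(tmpPath, targetPath);
  } finally {
    // After a successful rename the tmp path no longer exists, so unlink
    // rejects with ENOENT and the catch swallows it. After a failed write
    // or rename, this removes the leftover tmp file.
    await fs.unlink(tmpPath).catch(() => {});
  }
}
```

Note that a finally block only runs if the process is still alive. A hard crash between writeFile and rename still leaves the tmp file behind, which is why the other PRs add a sweep to the repair path.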

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #77888

Root Cause

  • Root cause: write-then-rename pattern had no cleanup path for the temporary file. If rename failed, the .tmp file was left behind with no mechanism to remove it.
  • Missing detection / guardrail: No cleanup of stale .tmp files existed in repair or sweep paths.

Regression Test Plan

  • Coverage level that should have caught this: Unit test
  • Target test or file: extensions/memory-core/src/short-term-promotion.test.ts
  • Scenario the test should lock in: Mock fs.rename to throw, verify .tmp file is unlinked in finally.
  • Existing test that already covers this: None
  • If no new test is added, why not: The try...finally pattern is structurally self-evident — the finally block unconditionally cleans up the tmp path. All 44 existing tests pass.

User-visible / Behavior Changes

None.

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Evidence: All 44 existing tests pass after the change.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: writePhaseSignalStore and writeStore now clean up their temporary files via try...finally after write-then-rename.
  • Real environment tested: macOS 15.x, Node 22, OpenClaw main branch with the fix applied.
  • Exact steps or command run after this patch: pnpm test extensions/memory-core/src/short-term-promotion.test.ts --run
  • Evidence after fix: All 44 existing short-term promotion tests pass: Test Files 1 passed (1), Tests 44 passed (44), Duration 648ms
  • Observed result after fix: write-then-rename behavior unchanged for success path. On rename failure, the finally block unconditionally cleans up the temporary file.
  • What was not tested: Simulating a real rename failure. The try...finally pattern is structurally self-evident.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: unlink in finally on a successfully-renamed file is a no-op (ENOENT caught by .catch()).
    • Mitigation: .catch(() => {}) suppresses the expected ENOENT.

Changed files

  • extensions/memory-core/src/short-term-promotion.ts (modified, +12/-4)

PR #77894: fix(memory): preserve phase signal store on read errors

Description (problem / solution / changelog)

Summary

  • Problem: readPhaseSignalStore treated non-ENOENT read failures as an empty store, so the next dreaming phase write could erase accumulated phase-signal history.
  • Why it matters: Light/REM reinforcement counts are durable ranking signals in memory/.dreams/phase-signals.json; losing them silently weakens deep promotion behavior after transient filesystem failures.
  • What changed: non-ENOENT/non-SyntaxError phase-signal read errors now throw, atomic recall/phase-signal writes clean up their current temp file if rename fails, and repair removes stale recall/phase-signal temp files older than one hour.
  • What did NOT change (scope boundary): missing phase-signal files and corrupt phase-signal JSON still recover to an empty store.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #77881
  • Closes #77888
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: phase-signal store I/O failures no longer get converted into an empty store, and stale short-term recall/phase-signal temp files are cleaned during repair.
  • Real environment tested: local macOS OpenClaw checkout on 2026-05-05, Node via pnpm exec tsx, real filesystem-backed temporary memory workspace with memory/.dreams stores.
  • Exact steps or command run after this patch: pnpm exec tsx ran production memory-core APIs: recordShortTermRecalls -> recordDreamingPhaseSignals(read error) -> repairShortTermPromotionArtifacts.
  • Evidence after fix: terminal output from the real OpenClaw memory storage path:
REAL_OPENCLAW_MEMORY_PROOF {"workspaceBasename":"openclaw-real-memory-proof-x2rsNY","command":"recordShortTermRecalls -> recordDreamingPhaseSignals(read error) -> repairShortTermPromotionArtifacts","readErrorCode":"EACCES","phaseStorePreservedAfterReadError":true,"repairChanged":true,"removedTempFiles":2,"staleRecallTmpRemoved":true,"stalePhaseTmpRemoved":true,"finalPhaseSignalKeys":["memory:memory/2026-05-05.md:3:3"]}
  • Observed result after fix: the EACCES read failure surfaced without overwriting phase-signals.json; repair removed both stale short-term tmp files and left the live phase-signal key intact.
  • What was not tested: a long-running Gateway scheduled dreaming sweep and external model/provider calls; this proof exercised the same production storage helpers used by dreaming.
  • Before evidence: source inspection matched the issues: non-ENOENT phase-signal read errors previously fell back to emptyPhaseSignalStore(nowIso), and repair had no stale temp-file cleanup path.

Root Cause (if applicable)

  • Root cause: readPhaseSignalStore used the same empty-store fallback for operational read failures as it used for first-run missing files and corrupt JSON recovery.
  • Missing detection / guardrail: there was no regression test for non-ENOENT phase-signal read failures, and repair only normalized the main recall store plus stale locks.
  • Contributing context (if known): recall and phase-signal stores used duplicate write-then-rename code without a shared cleanup path for failed renames.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/memory-core/src/short-term-promotion.test.ts
  • Scenario the test should lock in: non-ENOENT phase-signal reads reject without overwriting the existing store; repair removes stale short-term store temp files only.
  • Why this is the smallest reliable guardrail: it runs the exported memory-core storage APIs against real temp files without needing a full Gateway dreaming schedule.
  • Existing test that already covers this (if any): none for these failure paths.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Dreaming repair can now remove stale short-term recall/phase-signal temp files and reports the count in the repair summary. Operational phase-signal read failures are surfaced to the existing caller error handling instead of silently truncating durable reinforcement state.

Diagram (if applicable)

Before:
phase-signals read EACCES -> empty store -> write replacement -> old signals lost
repair -> ignores stale *.tmp files

After:
phase-signals read EACCES -> throw -> caller failure handling -> old signals preserved
repair -> remove stale recall/phase-signal *.tmp files -> report count

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS local checkout
  • Runtime/container: Node 22+/pnpm repo scripts
  • Model/provider: N/A
  • Integration/channel (if any): memory-core bundled plugin
  • Relevant config (redacted): N/A

Steps

  1. Seed short-term recall and phase-signal stores in a temporary workspace.
  2. Mock the phase-signal store read path to throw an EACCES-style error.
  3. Call recordDreamingPhaseSignals and verify the original phase-signal file contents remain unchanged.
  4. Seed stale and fresh .tmp files in memory/.dreams/, run repairShortTermPromotionArtifacts, and verify only stale recall/phase-signal temp files are removed.

Expected

  • Non-ENOENT phase-signal read errors are not treated as empty stores.
  • Repair removes stale short-term store temp files and preserves fresh/unrelated temp files.

Actual

  • Matches expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Verification run:

pnpm test extensions/memory-core/src/short-term-promotion.test.ts extensions/memory-core/src/cli.test.ts
Test Files  2 passed (2)
Tests       97 passed (97)

Additional checks:

pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/short-term-promotion.ts extensions/memory-core/src/short-term-promotion.test.ts extensions/memory-core/src/cli.runtime.ts
All matched files use the correct format.

git diff --check
passed

Known unrelated local typecheck blocker:

pnpm tsgo:extensions
pnpm tsgo:extensions:test

Both currently stop on existing Feishu callback type errors and web-tree-sitter type export errors outside this PR's touched files.

Human Verification (required)

  • Verified scenarios: phase-signal EACCES-style read failure preserves existing store bytes; stale recall/phase-signal temp files are removed by repair; fresh and unrelated temp files are preserved; CLI repair tests still pass.
  • Edge cases checked: ENOENT remains recoverable by existing behavior; corrupt JSON remains recoverable by existing behavior; unrelated .tmp files are ignored.
  • What you did not verify: a live scheduled Gateway dreaming sweep.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: surfacing operational phase-signal read errors may mark a dreaming workspace sweep as failed instead of continuing with reduced signals.
    • Mitigation: that is intentional for non-ENOENT I/O failures; missing files and corrupt JSON still use the previous recovery behavior.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/memory-core/src/cli.runtime.ts (modified, +3/-0)
  • extensions/memory-core/src/short-term-promotion.test.ts (modified, +104/-0)
  • extensions/memory-core/src/short-term-promotion.ts (modified, +71/-8)

PR #77895: fix(memory-core): clean stale short-term temp files

Description (problem / solution / changelog)

Summary

  • clean stale orphaned short-term-recall.json.*.tmp and phase-signals.json.*.tmp files during short-term promotion repair
  • preserve fresh temp files and unrelated temp-looking files so active or manual files are not swept
  • surface removedTempFiles through memory-core repair summaries, doctor output, and the plugin SDK facade

Fixes #77888.

Root cause

writeStore and writePhaseSignalStore write a sibling temp file and then rename it into place. A process crash or failed rename after the write can leave the temp file in memory/.dreams indefinitely because repair only normalized the recall store and stale lock.
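
A repair-time sweep of this kind might look like the sketch below. The one-hour threshold is taken from the PR notes on this page. The function name, the regular expression, and the use of mtime as the age signal are assumptions made for illustration and are not taken from the actual patch.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Assumed threshold (one hour) and an assumed pattern for "<store>.json.*.tmp".
const STALE_TMP_MAX_AGE_MS = 60 * 60 * 1000;
const STORE_TMP_PATTERN = /^(short-term-recall|phase-signals)\.json\..+\.tmp$/;

// Hypothetical helper: removes stale store temp files and returns how many were deleted.
export async function removeStaleStoreTempFiles(
  dir: string,
  nowMs: number = Date.now(),
): Promise<number> {
  let names: string[];
  try {
    names = await fs.readdir(dir);
  } catch (err) {
    // A missing directory simply means there is nothing to clean.
    if ((err as { code?: string }).code === "ENOENT") return 0;
    throw err;
  }
  let removed = 0;
  for (const name of names) {
    // Skip unrelated temp-looking files so manual files are never swept.
    if (!STORE_TMP_PATTERN.test(name)) continue;
    const filePath = path.join(dir, name);
    try {
      const stat = await fs.stat(filePath);
      // Fresh files may belong to a write that is still in progress.
      if (!stat.isFile() || nowMs - stat.mtimeMs <= STALE_TMP_MAX_AGE_MS) continue;
      await fs.unlink(filePath);
      removed += 1;
    } catch (err) {
      // Another process may have removed the file first; that is fine.
      if ((err as { code?: string }).code !== "ENOENT") throw err;
    }
  }
  return removed;
}
```

Returning the count lets the caller report it in a repair summary, in the spirit of the removedTempFiles field mentioned above.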

Real behavior proof

  • Behavior or issue addressed: stale orphaned memory-core short-term recall temp files are removed by repair, while fresh temp files and unrelated temp-looking files are preserved.
  • Real environment tested: local OpenClaw checkout on macOS with Node running the patched memory-core repair function against a temporary OpenClaw workspace under /var/folders/.../T.
  • Exact steps or command run after this patch: ran node --import tsx from the patched checkout, created a temporary workspace, wrote an old matching short-term-recall.json.*.tmp, a fresh matching temp file, and an unrelated .manual.tmp, then called repairShortTermPromotionArtifacts({ workspaceDir }).
  • Evidence after fix: copied live terminal output from the command:
{
  "workspaceDir": "/var/folders/cf/mtzn4vhx6kz_83fmvt5qs60m0000gn/T/openclaw-77888-proof-tf7V6a",
  "changed": true,
  "removedTempFiles": 1,
  "rewroteStore": false,
  "removedStaleLock": false,
  "oldTmpExists": false,
  "freshTmpExists": true,
  "unrelatedTmpExists": true
}
  • Observed result after fix: the stale matching temp file no longer existed, the fresh matching temp file still existed, and the unrelated temp-looking file still existed; repair reported changed: true and removedTempFiles: 1 without rewriting the store or removing a lock.
  • What was not tested: I did not force-kill a live dreaming process mid-rename; the live proof exercised the repair cleanup path directly with the same temp-file naming pattern left by interrupted atomic writes.

Validation

  • pnpm test extensions/memory-core/src/short-term-promotion.test.ts
  • pnpm test src/commands/doctor-memory-search.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/memory-core/src/short-term-promotion.ts extensions/memory-core/src/short-term-promotion.test.ts extensions/memory-core/src/dreaming.ts extensions/memory-core/src/cli.runtime.ts src/commands/doctor-memory-search.ts src/commands/doctor-memory-search.test.ts src/plugin-sdk/memory-core-engine-runtime.ts
  • pnpm check:changed
  • git diff --check

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/plugin-sdk-api-baseline.sha256 (modified, +2/-2)
  • extensions/memory-core/src/cli.runtime.ts (modified, +5/-0)
  • extensions/memory-core/src/dreaming.ts (modified, +6/-0)
  • extensions/memory-core/src/short-term-promotion.test.ts (modified, +40/-0)
  • extensions/memory-core/src/short-term-promotion.ts (modified, +61/-1)
  • src/commands/doctor-memory-search.test.ts (modified, +2/-0)
  • src/commands/doctor-memory-search.ts (modified, +3/-0)
  • src/plugin-sdk/memory-core-engine-runtime.ts (modified, +1/-0)

PR #78037: fix(memory-core): preserve dreaming phase-signal history on transient I/O errors (#77881)

Description (problem / solution / changelog)

Summary

Fixes #77881. readPhaseSignalStore in extensions/memory-core/src/short-term-promotion.ts previously caught every error and returned an empty store. The caller (recordDreamingPhaseSignals, line ~1151) then mutates that empty store with the current cycle's hits and writes it back via writePhaseSignalStore — silently destroying every previously accumulated lightHits/remHits count on a transient I/O failure (EACCES, EIO, EBUSY, EMFILE, ENOSPC, antivirus lock, etc.).

This patch classifies errors at the read boundary:

| Error class | New behavior | Rationale |
| --- | --- | --- |
| ENOENT | return empty store | first run, nothing to lose |
| SyntaxError | warn + return empty store | corrupt JSON; on-disk damage already permanent, recovery is the only forward path |
| any other I/O error | re-throw | caller's try { … } catch { failedWorkspaces++ } in dreaming.ts:551–665 handles it and skips the destructive writePhaseSignalStore |

This restores symmetry with the sibling readStore function in the same file (which already throws on non-ENOENT). The bug was a one-line regression: a fall-through return emptyPhaseSignalStore(nowIso) after the conditional return path.
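
The classification above could be sketched like this. The store shape, the function names, and the warning text are hypothetical and chosen only for the example; the real readPhaseSignalStore is private to short-term-promotion.ts and may be structured differently.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical store shape, for illustration only.
type PhaseSignalStore = { updatedAt: string; entries: Record<string, unknown> };

function emptyStore(nowIso: string): PhaseSignalStore {
  return { updatedAt: nowIso, entries: {} };
}

// Hypothetical reader that separates "nothing to lose" cases from operational failures.
export async function readStoreClassified(
  filePath: string,
  nowIso: string,
): Promise<PhaseSignalStore> {
  let raw: string;
  try {
    raw = await fs.readFile(filePath, "utf8");
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") {
      return emptyStore(nowIso); // First run: no prior history exists.
    }
    throw err; // EACCES, EIO, EISDIR, ...: leave the on-disk store untouched.
  }
  try {
    return JSON.parse(raw) as PhaseSignalStore;
  } catch (err) {
    if (err instanceof SyntaxError) {
      // Corrupt JSON: the damage is already on disk, so recover to empty.
      console.warn(`phase-signal store at ${filePath} has corrupt JSON; starting empty`);
      return emptyStore(nowIso);
    }
    throw err;
  }
}
```

Splitting the read and the parse into separate try blocks keeps the two recoverable cases explicit, so any error that is neither ENOENT nor SyntaxError reaches the caller.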

Behavior contract (mathematical guarantee)

Let Φ be the on-disk phase-signal store and Φ' the result of one recordDreamingPhaseSignals cycle. For any read error e thrown inside the lock body:

  • e.code = "ENOENT" ⇒ Φ' = update(∅, current_cycle) and previous Φ did not exist (no loss).
  • e instanceof SyntaxError ⇒ Φ' = update(∅, current_cycle) and Φ was already unrecoverable JSON (loss already on-disk; bounded).
  • otherwise ⇒ Φ' = Φ (untouched on disk), error propagates, failedWorkspaces++.

The previous behavior was: for every e ≠ ENOENT and ≠ SyntaxError, Φ' = update(∅, current_cycle) even though Φ was healthy — destroying all prior lightHits/remHits history that the dreaming pipeline depends on for promotion gating.

Real behavior proof

3 new tests in extensions/memory-core/src/short-term-promotion.test.ts:

  1. ENOENT path — no prior file ⇒ recordDreamingPhaseSignals resolves, file written with lightHits: 1. Locks in the existing-correct branch.
  2. SyntaxError path — pre-write garbage JSON ⇒ resolves with a console.warn mentioning phase-signal store … corrupt JSON. Validates graceful corrupt-JSON recovery.
  3. EISDIR path (the regression site) — pre-populate phase signals via the public API, replace the JSON file with a directory of the same name (reliably reproducible cross-platform; no chmod), call recordDreamingPhaseSignals ⇒ rejects with code: "EISDIR", and phase-signals.json is still a directory on disk. The destructive empty-write did not occur.
$ pnpm test extensions/memory-core/src/short-term-promotion.test.ts
Test Files  1 passed (1)
     Tests  47 passed (47)

The 3 new tests are scoped to describe("readPhaseSignalStore error classification (issue #77881)") so reviewers can locate them quickly.
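
The directory-swap trick from the third test can be illustrated in isolation. The helper names below are hypothetical and the snippet is only a sketch of the technique, not the test code from the PR. It relies on fs.readFile rejecting when the path is a directory, which yields EISDIR on Linux and macOS.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical helper: swap a file for a directory of the same name so that
// later reads fail with a non-ENOENT error, without relying on chmod.
export async function replaceFileWithDirectory(filePath: string): Promise<void> {
  await fs.rm(filePath, { force: true });
  await fs.mkdir(filePath);
}

// Hypothetical helper: returns the error code of a failed read, or undefined on success.
export async function readErrorCode(filePath: string): Promise<string | undefined> {
  try {
    await fs.readFile(filePath, "utf8");
    return undefined;
  } catch (err) {
    return (err as { code?: string }).code;
  }
}
```

A test built on these helpers would seed the store through the public API, call replaceFileWithDirectory on it, and then assert both that the operation under test rejects and that the path is still a directory afterwards.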

Risk

  • Surface: one private function (readPhaseSignalStore) with three call sites all inside withShortTermLock — the lock is released cleanly via the helper's existing finally (line 726), so re-throwing does not leak locks.
  • Semantics: ENOENT and SyntaxError paths are unchanged. Only the previously-silent error class becomes loud.
  • Caller compatibility: dreaming.ts:551–665 already wraps runDreamingSweepPhases (and downstream phase-signal writes) in a per-workspace try/catch that increments failedWorkspaces. No new caller plumbing needed.

Why this matters

The dreaming pipeline's promotion gates (PHASE_SIGNAL_LIGHT_BOOST_MAX, PHASE_SIGNAL_REM_BOOST_MAX, the consolidation component) all depend on accumulated phase-signal history. A single antivirus lock or transient EIO during a cron-driven dream cycle was enough to wipe weeks of consolidation signal silently — the user would only notice that promotions stopped happening, with no error in the logs and no entry in the failed= count. After this patch, the failure is loud, accounted for, and non-destructive.

Closes #77881

🤖 Generated with Claude Code

Changed files

  • extensions/memory-core/src/short-term-promotion.test.ts (modified, +159/-0)
  • extensions/memory-core/src/short-term-promotion.ts (modified, +19/-2)

RAW_BUFFER

Summary

writePhaseSignalStore and writeStore use a write-then-rename pattern to atomically update on-disk stores. If the process crashes between writeFile and rename, or if rename fails, a .tmp file is left orphaned in the short-term artifacts directory. There is no cleanup mechanism for these orphaned files.

Steps to reproduce

  1. Enable dreaming in memory-core
  2. Force-kill the process during a dreaming cycle (between writeFile and rename)
  3. Check the artifacts directory for leftover *.tmp files

Expected behavior

Orphaned .tmp files older than a reasonable threshold (e.g. 1 hour) should be cleaned up during repairShortTermPromotionArtifacts or at dreaming sweep start.

Actual behavior

Orphaned .tmp files accumulate indefinitely. Each file is a few KB (serialized store JSON), but over long-running deployments with occasional crashes, this becomes litter in the workspace directory.

OpenClaw version

main branch

Operating system

N/A (cross-platform filesystem issue)

Model

N/A

Provider / routing chain

N/A

extent analysis

TL;DR

Implement a cleanup mechanism to remove orphaned .tmp files older than a reasonable threshold during repairShortTermPromotionArtifacts or at dreaming sweep start.

Guidance

  • Identify the threshold for considering a .tmp file as orphaned (e.g., 1 hour) and implement a timer or scheduler to periodically clean up these files.
  • Modify the repairShortTermPromotionArtifacts function to include a cleanup step for orphaned .tmp files.
  • Consider adding a check at the start of the dreaming cycle to remove any existing orphaned .tmp files.
  • Ensure the cleanup mechanism is idempotent and safe to run concurrently with other processes.

Example

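import os
import time

def cleanup_orphaned_tmp_files(directory, threshold_seconds):
    """Delete *.tmp files in `directory` last modified more than `threshold_seconds` ago."""
    now = time.time()
    for filename in os.listdir(directory):
        if not filename.endswith('.tmp'):
            continue
        filepath = os.path.join(directory, filename)
        try:
            # Use mtime: on Unix, ctime is the inode-change time, not the creation time.
            if os.path.isfile(filepath) and now - os.path.getmtime(filepath) > threshold_seconds:
                os.remove(filepath)
        except FileNotFoundError:
            pass  # Another process removed the file first; safe to ignore.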

Notes

The provided solution assumes a simple threshold-based approach for cleaning up orphaned files. Depending on the specific requirements, a more sophisticated mechanism might be needed, such as tracking the creation time of .tmp files or using a transactional approach to ensure atomicity.

Recommendation

Apply a workaround by implementing a periodic cleanup mechanism, as the issue is not version-specific and a fixed version is not mentioned. This approach will help mitigate the accumulation of orphaned .tmp files until a more permanent solution is implemented.
