openclaw - Defaults cause unbounded transcript growth + nightly session death (coding-agent workloads)

Summary

OpenClaw's out-of-the-box defaults make a long-running coding-agent / ops-assistant deployment unsafe to leave alone: sessions are killed nightly (memory-loss), and transcripts grow without bound until they degrade the runtime. Every operator has to discover the same five hardening knobs the hard way and re-apply them on every host.

We've now applied the same defensive config on three hosts (.221 production with 36+ agents, .237 reinstall, .239 ops). On each host we had to repeat the same five-layer fix. That's a clear sign the defaults are wrong, not our hosts.

Reproduction (default install)

  1. npm install -g openclaw@2026.5.19
  2. Pair a Telegram / Slack / WhatsApp account, start an agent (coding agent, ops assistant — anything that runs for days).
  3. Use it normally for a week.

Observed:

  • Every night at the default atHour: 4 (04:00 local time), the direct-chat session is reset. Long-term context is gone. The agent does not remember yesterday.
  • The active .jsonl transcript for the session keeps growing — tens of MB after a few days of heavy use. Eventually it harms responsiveness and eats provider context budget.
  • ~/.openclaw/agents/*/sessions/ accumulates files indefinitely; nothing prunes them.

Source confirmation

Behavior comes from the compiled OpenClaw source (verified against installed 2026.5.19, file dist/reset-CMlTzEqB.js):

const DEFAULT_RESET_MODE = "daily";
// ...
const mode = typeReset?.mode
          ?? baseReset?.mode
          ?? (!hasExplicitReset && legacyIdleMinutes != null ? "idle" : "daily");

So if the user does not explicitly configure session.reset / session.resetByType, the resolver falls back to "daily" with the default atHour. For a chat-bot use case that's fine. For a coding agent / ops assistant it is destructive.
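To make the fallback easier to see, here is a minimal sketch that wraps the quoted expression in a function. Only the three-way `??` chain comes from the compiled source; the function name `resolveResetMode` and the shape of its argument are assumptions made for illustration, not OpenClaw's actual API.

```javascript
// Hypothetical wrapper around the fallback expression quoted above.
// Only the ?? chain is from the issue; the name and argument shape are assumed.
function resolveResetMode({ typeReset, baseReset, hasExplicitReset, legacyIdleMinutes } = {}) {
  return typeReset?.mode
    ?? baseReset?.mode
    ?? (!hasExplicitReset && legacyIdleMinutes != null ? "idle" : "daily");
}

// Nothing configured at all: the chain falls through to "daily".
console.log(resolveResetMode({ hasExplicitReset: false })); // daily

// A per-type override wins over everything else.
console.log(resolveResetMode({ typeReset: { mode: "idle" }, hasExplicitReset: true })); // idle

// A legacy idle setting only counts when no explicit reset block exists.
console.log(resolveResetMode({ hasExplicitReset: false, legacyIdleMinutes: 240 })); // idle
console.log(resolveResetMode({ hasExplicitReset: true, legacyIdleMinutes: 240 }));  // daily
```

The first call is the default-install case: with no `session.reset` or `session.resetByType` present, every branch yields `undefined` until the final `"daily"`.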

Equally, agents.defaults.compaction.maxActiveTranscriptBytes is optional and unset by default, so truncateAfterCompaction never fires and the local transcript grows unbounded.

session.maintenance.mode defaults to warn instead of enforce, so the runtime just logs about old session files but never cleans them up.

What we currently have to apply on every host

This is the minimum to keep a long-running coding-agent deployment healthy:

{
  "agents": {
    "defaults": {
      "compaction": {
        "model": "anthropic/claude-sonnet-4-20250514",
        "maxActiveTranscriptBytes": "500kb",
        "truncateAfterCompaction": true,
        "memoryFlush": {
          "forceFlushTranscriptBytes": "500kb"
        }
      },
      "contextPruning": {
        "mode": "cache-ttl",
        "ttl": "5m"
      }
    }
  },
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxDiskBytes": "1gb"
    },
    "resetByType": {
      "direct": { "mode": "idle", "idleMinutes": 10080 }
    }
  }
}
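Since this has to be repeated on every host, a small audit helper can flag a config that still relies on the defaults. The sketch below is hypothetical: `auditHardening` is not part of OpenClaw, and it only checks the keys this issue singles out, using the key names from the JSON above.

```javascript
// Hypothetical audit helper (not an OpenClaw API). It inspects a parsed config
// object and lists the settings from the workaround that are still missing.
function auditHardening(config) {
  const findings = [];
  const compaction = config?.agents?.defaults?.compaction;
  const maintenance = config?.session?.maintenance;
  const direct = config?.session?.resetByType?.direct;

  if (compaction?.maxActiveTranscriptBytes == null) {
    findings.push("agents.defaults.compaction.maxActiveTranscriptBytes is unset");
  }
  if (compaction?.truncateAfterCompaction !== true) {
    findings.push("agents.defaults.compaction.truncateAfterCompaction is not true");
  }
  if (maintenance?.mode !== "enforce") {
    findings.push('session.maintenance.mode is not "enforce"');
  }
  if (direct?.mode !== "idle") {
    findings.push('session.resetByType.direct.mode is not "idle"');
  }
  return findings;
}

// An empty config (the default install) trips every check.
for (const finding of auditHardening({})) {
  console.log("WARN:", finding);
}
```

Feeding it the result of `JSON.parse` on the host's config file would give a quick pass/fail before leaving a deployment unattended; where that file lives is not stated in this issue, so the path is left to the operator.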

Plus, because layers 1–4 can still be defeated by very fast trajectory growth, we also ship an external cron watchdog that parks idle-and-large transcripts:

# System crontab format (user field present). Cron needs the job on a single line, and each % must be escaped as \%.
*/15 * * * * <user> find <home>/.openclaw/agents/*/sessions/ -maxdepth 1 -name '*.jsonl' -size +1M -mmin +60 -exec sh -c 'mkdir -p "$(dirname "$1")/parked" && mv "$1" "$(dirname "$1")/parked/$(basename "$1").parked-$(date +\%Y\%m\%d-\%H\%M\%S)"' _ {} \;
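For readers who want to reason about what the watchdog selects, the sketch below restates its two filters and its rename rule as plain functions. It is an illustration only: `shouldPark` and `parkedPath` are hypothetical names, the file records are plain objects rather than real filesystem entries, and the exact boundary behaviour of `-mmin` can differ between `find` implementations.

```javascript
const MIB = 1024 * 1024;

// Mirrors `-name '*.jsonl' -size +1M -mmin +60`.
// `-size +1M` rounds the size up to whole MiB, so it matches files strictly
// larger than 1 MiB. The age check here is "more than 60 minutes"; how find
// rounds at the exact boundary is implementation-specific.
function shouldPark(file, nowMs) {
  const largeEnough = file.sizeBytes > MIB;
  const idleEnough = nowMs - file.mtimeMs > 60 * 60 * 1000;
  return file.name.endsWith(".jsonl") && largeEnough && idleEnough;
}

// Builds the mv destination: <dir>/parked/<name>.parked-YYYYMMDD-HHMMSS,
// using local time like `date +%Y%m%d-%H%M%S`.
function parkedPath(dir, name, now) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}`;
  const time = `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  return `${dir}/parked/${name}.parked-${day}-${time}`;
}

const now = Date.now();
const candidate = { name: "session.jsonl", sizeBytes: 3 * MIB, mtimeMs: now - 2 * 60 * 60 * 1000 };
console.log(shouldPark(candidate, now)); // true: 3 MiB and idle for two hours
```

Both conditions must hold, so a large transcript that is still being written (modified within the last hour) is left alone.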

Proposed defaults

We're not asking for a behavior change for chat-bot installs. But these defaults should at minimum not silently destroy memory and fill the disk for non-chat-bot installs.

Concrete suggestions:

  1. session.resetByType.direct: ship a default of { mode: "idle", idleMinutes: 10080 } (7 days idle) instead of falling back to daily atHour:4. Daily reset is fine for group and thread, not for direct.
  2. agents.defaults.compaction.maxActiveTranscriptBytes: ship a sensible default (e.g. "5mb" or "10mb") with truncateAfterCompaction: true, so preflight compaction actually kicks in.
  3. session.maintenance.mode: default to enforce with a conservative pruneAfter (e.g. 90d) instead of warn. Logging-only that never cleans up is just disk leakage.
  4. Documentation: the docs at docs/concepts/compaction.md and docs/gateway/config-agents.md describe these knobs accurately, but a section "Hardening for long-running agents" listing all five layers together would have saved us months of repeated debugging.
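One way to picture suggestions 1 to 3 is as a defaults object that is merged underneath whatever the operator has written, so explicit settings always win. The sketch below is an assumption about how that could look, not a description of how OpenClaw resolves config today; `PROPOSED_DEFAULTS` and `withDefaults` are hypothetical names, and `"5mb"` is just one of the two example sizes from item 2.

```javascript
// Proposed defaults from the list above, expressed as a config fragment.
const PROPOSED_DEFAULTS = {
  agents: {
    defaults: {
      compaction: { maxActiveTranscriptBytes: "5mb", truncateAfterCompaction: true },
    },
  },
  session: {
    maintenance: { mode: "enforce", pruneAfter: "90d" },
    resetByType: { direct: { mode: "idle", idleMinutes: 10080 } },
  },
};

const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// Recursively fills in keys the user left out; any value the user set wins.
function withDefaults(user, defaults) {
  const out = { ...defaults };
  for (const [key, value] of Object.entries(user ?? {})) {
    out[key] = isPlainObject(value) && isPlainObject(defaults[key])
      ? withDefaults(value, defaults[key])
      : value;
  }
  return out;
}

// A chat-bot install that explicitly wants daily resets for direct chats keeps them.
const chatBot = { session: { resetByType: { direct: { mode: "daily" } } } };
console.log(withDefaults(chatBot, PROPOSED_DEFAULTS).session.resetByType.direct.mode); // daily
```

The point of the merge direction is that nobody who has already configured a knob sees a change; only installs that never touched these keys pick up the safer values.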

Schema friction (separate but related)

While applying the workaround we lost about half an hour because:

  • The docs example uses resetByType: { direct: { mode: "idle", idleMinutes: 240 } }.
  • An older / legacy variant in the codebase appears to accept (or used to accept) kind: "idle".
  • Writing kind instead of mode makes the gateway silently refuse to apply that block (no clear error message about which field is wrong).

If kind is officially deprecated, a parse-time warning pointing at the correct field would help. If it's still accepted, the docs should mention both. Right now: silent fail.
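The kind of parse-time warning we have in mind could be as small as the sketch below. It is hypothetical: `findLegacyKindFields` is not an existing OpenClaw function, and it assumes only what is stated above, namely that `mode` is the documented field and `kind` is the legacy spelling.

```javascript
// Hypothetical parse-time check: flags resetByType blocks that use the legacy
// `kind` field without the documented `mode` field.
function findLegacyKindFields(config) {
  const warnings = [];
  const resetByType = config?.session?.resetByType ?? {};
  for (const [type, block] of Object.entries(resetByType)) {
    const isObject = block !== null && typeof block === "object";
    if (isObject && "kind" in block && !("mode" in block)) {
      warnings.push(`session.resetByType.${type}: found "kind", expected "mode"`);
    }
  }
  return warnings;
}

// The mistake that cost us half an hour:
const misconfigured = { session: { resetByType: { direct: { kind: "idle", idleMinutes: 10080 } } } };
console.log(findLegacyKindFields(misconfigured));
```

A message that names the exact path and the expected field would have turned the silent failure into a one-line fix.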

Why this matters

OpenClaw is fantastic for the coding-agent / personal-assistant use case. But that use case is exactly the one the defaults punish. The first weeks with a new deployment look like a series of mystery memory-losses and unbounded disk growth, until someone debugs deeply enough to find the five separate knobs scattered across agents.defaults, session, and an external cron.

Happy to send a PR if there's interest in any of the four default changes above.

Environment

  • OpenClaw 2026.5.19 (a185ca2)
  • Ubuntu 24.04 LTS, Node 22.x
  • Three independent hosts, same fix needed on each.
