openclaw - [Bug]: gateway model-run sessions accumulate until session maxEntries cap

Bug type

Behavior bug (incorrect persisted state / maintenance gap)

Beta release blocker

No

Summary

After #82861, openclaw infer model run --gateway correctly uses isolated explicit sessions such as agent:main:explicit:model-run-<uuid> instead of the default agent lane. However, those one-shot model probe sessions are persisted and retained like normal conversation sessions.

On a long-lived install this caused sessions.json to fill with hundreds of agent:main:explicit:model-run-* rows until the default session.maintenance.maxEntries=500 cap was effectively saturated. Native openclaw sessions cleanup --enforce only capped the store back to 500, leaving almost no healthy headroom because the model-run rows were still younger than the default pruneAfter=30d.

This looks like a lifecycle gap for ephemeral model-run sessions, not a failure of the #82861 isolation fix itself.

Environment

  • OpenClaw: 2026.5.26 (installed stable)
  • npm latest observed locally: 2026.5.28
  • OS: macOS 15.7.4 arm64
  • Node: 25.8.1
  • Gateway: LaunchAgent local gateway
  • Affected session keys: agent:main:explicit:model-run-*

Evidence from production install

Before cleanup:

openclaw status
Sessions: 549 active
Tasks: 0 active · 0 queued · 0 running

Session store composition, counted from ~/.openclaw/agents/main/sessions/sessions.json:

{
  "total": 549,
  "buckets": {
    "other": 13,
    "group": 8,
    "cron": 21,
    "model-run": 507
  },
  "older7d": 70,
  "older1d": 513,
  "active4h": 11
}
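
A composition summary like the one above can be produced with a short script. The following is a minimal sketch, not the reporter's actual script. It assumes sessions.json is a JSON object mapping each session key to an entry with an updatedAt timestamp in epoch milliseconds. Only the model-run key pattern comes from this issue; the ":cron:" and ":group:" markers are hypothetical placeholders for whatever the real key formats are.

```python
import fnmatch

HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS

# Key shape quoted in this issue; "*" stands in for the agent id and the uuid.
MODEL_RUN_PATTERN = "agent:*:explicit:model-run-*"


def bucket_for(key):
    """Classify a session key into one of the buckets used in the report."""
    if fnmatch.fnmatchcase(key, MODEL_RUN_PATTERN):
        return "model-run"
    # Hypothetical markers: the issue does not show real cron/group key formats.
    if ":cron:" in key:
        return "cron"
    if ":group:" in key:
        return "group"
    return "other"


def summarize(store, now_ms):
    """Count sessions per bucket and per age band (older1d includes older7d)."""
    summary = {"total": len(store), "buckets": {},
               "older7d": 0, "older1d": 0, "active4h": 0}
    for key, entry in store.items():
        bucket = bucket_for(key)
        summary["buckets"][bucket] = summary["buckets"].get(bucket, 0) + 1
        age_ms = now_ms - entry["updatedAt"]  # assumed: epoch milliseconds
        if age_ms > 7 * DAY_MS:
            summary["older7d"] += 1
        if age_ms > DAY_MS:
            summary["older1d"] += 1
        if age_ms <= 4 * HOUR_MS:
            summary["active4h"] += 1
    return summary
```

To run it against a real store, load the file with json.load and pass the current time in milliseconds, for example summarize(store, int(time.time() * 1000)).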

Native cleanup result:

{
  "beforeCount": 549,
  "afterCount": 500,
  "missing": 0,
  "dmScopeRetired": 0,
  "pruned": 0,
  "capped": 49,
  "applied": true,
  "appliedCount": 500
}
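
The pruned/capped split in this result follows from simple arithmetic: no row is older than 30d, so nothing is age-pruned, and the cap removes only the 549 - 500 = 49 overflow rows. The sketch below simulates that behavior under two assumptions that are inferred from the numbers in this report rather than taken from the OpenClaw source: age pruning runs before the cap, and the cap evicts the least recently updated rows first. The function name simulate_cleanup is hypothetical.

```python
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def simulate_cleanup(store, now_ms, prune_after_ms=30 * DAY_MS, max_entries=500):
    """Age-prune first, then cap by evicting the least recently updated rows."""
    fresh = {k: v for k, v in store.items()
             if now_ms - v["updatedAt"] <= prune_after_ms}
    pruned = len(store) - len(fresh)
    # Newest first, so slicing keeps the most recently updated max_entries rows.
    newest_first = sorted(fresh.items(),
                          key=lambda kv: kv[1]["updatedAt"], reverse=True)
    kept = dict(newest_first[:max_entries])
    report = {
        "beforeCount": len(store),
        "afterCount": len(kept),
        "pruned": pruned,
        "capped": len(fresh) - len(kept),
    }
    return kept, report


# 549 synthetic rows, each one hour older than the last (max age about 23 days),
# so none of them crosses the 30d prune threshold.
now = 1_000 * DAY_MS
store = {f"agent:main:explicit:model-run-{i}": {"updatedAt": now - i * HOUR_MS}
         for i in range(549)}
kept, report = simulate_cleanup(store, now)
print(report)
```

With these inputs the simulated report has pruned 0 and capped 49, leaving the store exactly at the cap, which matches the shape of the native cleanup result.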

After native cleanup, the store was still dominated by model-run rows:

{
  "total": 500,
  "buckets": {
    "other": 9,
    "group": 8,
    "cron": 20,
    "model-run": 463
  },
  "older7d": 21,
  "older1d": 464,
  "active4h": 10
}

Manual TTL cleanup of only stale model-run entries restored healthy headroom:

  • Removed: 439 rows matching agent:main:explicit:model-run-* whose updatedAt was more than 24h old
  • Kept: 24 recent model-run rows
  • Preserved: group/direct/cron sessions

After manual model-run TTL cleanup:

openclaw status
Sessions: 61 active

{
  "total": 61,
  "buckets": {
    "other": 9,
    "group": 8,
    "cron": 20,
    "model-run": 24
  },
  "older7d": 1,
  "older1d": 25,
  "active4h": 10
}

Gateway health after cleanup:

{
  "ok": true,
  "eventLoop": {
    "degraded": false,
    "reasons": []
  },
  "sessionCount": 61
}

Expected behavior

One-shot gateway model-run sessions should not accumulate like durable human conversation sessions.

Possible acceptable designs:

  1. Do not persist modelRun: true sessions in the main conversation session store, or persist them only as lightweight probe history elsewhere.
  2. Give agent:*:explicit:model-run-* sessions a short default TTL, e.g. 24h or 48h.
  3. Add a session maintenance policy for model-run/probe sessions, e.g. session.maintenance.modelRunPruneAfter or perKindRetention.
  4. Extend openclaw sessions cleanup with a safe audited option to prune stale model-run sessions by prefix/kind.
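
Designs 2 and 3 both amount to resolving a retention window per session kind instead of using one global value. The sketch below illustrates that idea only. The policy shape, the perKindRetention key as used here, and the function names are hypothetical and do not describe an existing OpenClaw setting; the 30d default and the 24h probe TTL are the values mentioned in this issue.

```python
import fnmatch

DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical policy shape; not an existing OpenClaw configuration schema.
POLICY = {
    "pruneAfterMs": 30 * DAY_MS,  # global default quoted in the issue (30d)
    "perKindRetention": {
        # Short TTL for one-shot probe sessions (24h, as in design 2).
        "agent:*:explicit:model-run-*": 1 * DAY_MS,
    },
}


def retention_for(key, policy):
    """Return the retention window for a key: first matching pattern wins."""
    for pattern, ttl_ms in policy["perKindRetention"].items():
        if fnmatch.fnmatchcase(key, pattern):
            return ttl_ms
    return policy["pruneAfterMs"]


def is_expired(key, updated_at_ms, now_ms, policy):
    """A session expires once its age exceeds the window for its kind."""
    return now_ms - updated_at_ms > retention_for(key, policy)
```

Under such a policy a model-run row that is two days old would expire, while a durable session of the same age would stay under the normal 30d rule.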

Actual behavior

model-run-<uuid> rows are retained under the same global pruneAfter=30d and maxEntries=500 policy as durable conversation sessions.

When there are many model probes, the global cap becomes dominated by probe sessions. Native cleanup caps overflow but still leaves the store near 500, so the install can quickly hit the same pressure again.

Why this matters

  • The current behavior converts one-shot provider/model probes into long-lived session-store entries.
  • The default global cap can be consumed almost entirely by probe sessions.
  • Operators have to write ad hoc direct sessions.json scripts to remove stale model-run rows, which is risky and not discoverable.
  • This is adjacent to but distinct from orphan transcript cleanup (#77941), per-label retention (#76827), and cleanup cap/stale enforcement (#83124).

Related

  • #82861 introduced the explicit model-run-<uuid> session isolation for gateway model runs. This issue is about lifecycle/retention for those isolated one-shot sessions after they are created.
  • #77941 asks for audited orphan/unindexed transcript archive/prune.
  • #76827 asks for per-label retention.
  • #83124 covers a cleanup enforce regression for cap-overflow/prune-stale, but in this production case native cleanup did apply; it just could not create healthy headroom because model-run rows were still within the default age retention window.

Workaround

Back up the session store, then remove only old agent:main:explicit:model-run-* entries by TTL. This is effective but should not be the long-term operator workflow.
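
The workaround can be sketched as follows. This is a minimal illustration, not a supported tool and not the exact script used on the affected install. It assumes sessions.json is a JSON object keyed by session key, that each entry carries updatedAt in epoch milliseconds, and that nothing else (such as a running gateway) is writing the file at the same time. The function name and the backup file naming are hypothetical.

```python
import fnmatch
import json
import os
import shutil
import tempfile
import time

DAY_MS = 24 * 60 * 60 * 1000
MODEL_RUN_PATTERN = "agent:main:explicit:model-run-*"  # key shape from this issue


def prune_stale_model_runs(store_path, ttl_ms=DAY_MS, now_ms=None):
    """Back up the store, then drop model-run rows older than ttl_ms."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    backup_path = f"{store_path}.bak-{now_ms}"
    shutil.copy2(store_path, backup_path)  # keep an untouched copy first

    with open(store_path, encoding="utf-8") as fh:
        store = json.load(fh)

    def is_stale(key, entry):
        # Only model-run keys are candidates; all other sessions are preserved.
        return (fnmatch.fnmatchcase(key, MODEL_RUN_PATTERN)
                and now_ms - entry["updatedAt"] > ttl_ms)

    kept = {k: v for k, v in store.items() if not is_stale(k, v)}

    # Write to a temp file in the same directory, then swap it in atomically.
    store_dir = os.path.dirname(os.path.abspath(store_path))
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(kept, fh, indent=2)
    os.replace(tmp_path, store_path)

    return {"removed": len(store) - len(kept), "kept": len(kept),
            "backup": backup_path}
```

Restoring is a matter of copying the backup file over sessions.json. Because the real store may contain fields or invariants not visible in this issue, treat this as a starting point to verify on a copy before touching a production install.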

Acceptance criteria

  • Repeated openclaw infer model run --gateway invocations do not cause agent:main:explicit:model-run-* rows to dominate sessions.json under default maintenance settings.
  • openclaw sessions cleanup --dry-run --json or another supported command can show and safely prune stale model-run/probe sessions without direct store editing.
  • Durable direct/group/cron sessions remain protected by the normal retention rules.
