openclaw - 💡(How to fix) Fix Feishu DM sessions not recovering after gateway restart (timing issue - session store loaded after lock cleanup)



Issue: Feishu DM sessions not recovering after gateway restart

Summary

When the OpenClaw gateway restarts while an agent is actively processing a task (e.g., during a model call or tool execution), the interrupted session is not recovered on the Feishu channel. The user must send a new message to reactivate the session.

However, this works correctly on the Telegram channel — interrupted sessions are automatically recovered after restart.

Environment

  • OpenClaw version: 2026.5.12
  • Platform: Linux x64, Node.js v22.22.2
  • Channels affected: Feishu (WebSocket mode)
  • Channels not affected: Telegram (polling mode)
  • Model: primary ollama-lan/qwen3.5:9b, agents use bailian/deepseek-v4-flash etc.

Expected Behavior

After a gateway restart, any main session that was in status: "running" at the time of restart should be automatically recovered — the agent should read the session transcript, receive a system message like "Your previous turn was interrupted by a gateway restart...", and continue processing the interrupted task.

Actual Behavior

The interrupted Feishu session is never recovered. The agent does not continue, and no response is ever delivered to the user. The session remains in status: "running" in sessions.json with a stale .jsonl.lock file on disk, but the recovery mechanism either does not trigger or cannot deliver the response.

Root Cause Analysis

After investigating the behavior and the source code, I identified two potential issues:

Issue 1: Post-restart lock cleanup and session marking timing

The gateway startup sequence (simplified):

Gateway starts
  ├── HTTP server listens
  ├── Gateway startup log
  ├── Post-ready maintenance (lock cleanup + markRestartAbortedMainSessionsFromLocks)
  ├── 5s delay ──→ scheduleRestartAbortedMainSessionRecovery
  └── Feishu WebSocket connects

The markRestartAbortedMainSessionsFromLocks function scans for stale .jsonl.lock files and marks the corresponding sessions as abortedLastRun: true in the in-memory session store. However, the session store (sessions.json) may not yet be loaded into memory when this maintenance step runs. If so, the lock pass completes without marking any session as abortedLastRun, and the subsequent scheduleRestartAbortedMainSessionRecovery finds no sessions to recover.
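
The ordering constraint described above can be sketched as follows. This is a minimal illustration, not OpenClaw's actual implementation: the SessionEntry shape (in particular the sessionId field linking a store entry to its transcript file) and the helper names loadSessionStore, markAbortedFromLocks and runPostReadyMaintenance are all hypothetical. The point is only that marking can match nothing if it runs against an empty, not-yet-loaded store.

```typescript
// Hypothetical shapes: the real session store may use different fields.
interface SessionEntry {
  sessionId: string; // assumed link between a store entry and <sessionId>.jsonl
  status: string;
  abortedLastRun: boolean;
}
type SessionStore = Record<string, SessionEntry>;

// Stand-in for reading sessions.json; takes raw JSON so the sketch stays I/O-free.
function loadSessionStore(rawJson: string): SessionStore {
  return JSON.parse(rawJson) as SessionStore;
}

// "/agents/main/sessions/<id>.jsonl.lock" -> "<id>"
function sessionIdFromLockPath(lockPath: string): string {
  const fileName = lockPath.split("/").pop() || "";
  return fileName.replace(/\.jsonl\.lock$/, "");
}

// Marks every running session whose transcript has a stale lock; returns the marked keys.
function markAbortedFromLocks(store: SessionStore, staleLockPaths: string[]): string[] {
  const lockedIds = staleLockPaths.map(sessionIdFromLockPath);
  const marked: string[] = [];
  for (const key of Object.keys(store)) {
    const entry = store[key];
    if (entry.status === "running" && lockedIds.indexOf(entry.sessionId) !== -1) {
      entry.abortedLastRun = true;
      marked.push(key);
    }
  }
  return marked;
}

function runPostReadyMaintenance(
  rawStoreJson: string,
  staleLockPaths: string[],
): { store: SessionStore; marked: string[] } {
  const store = loadSessionStore(rawStoreJson); // step 1: populate the store
  const marked = markAbortedFromLocks(store, staleLockPaths); // step 2: mark against the loaded store
  // step 3 (not shown): remove the stale lock files only after marking has run
  return { store, marked };
}
```

Calling markAbortedFromLocks with an empty store returns an empty list regardless of how many lock files exist, which would match the symptom reported here.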

For comparison, Telegram's polling mechanism makes recovery less critical: after restart, getUpdates with the stored offset re-fetches any unacknowledged updates, so the agent simply starts processing the same message again. Feishu's WebSocket push appears to have no equivalent re-delivery mechanism here: an event that was being processed when the gateway stopped is not pushed again.
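
The difference can be seen in a small in-memory simulation of offset-based polling. FakeUpdateQueue and pollOnce are illustrative names, not Telegram or OpenClaw code, and the sketch assumes the consumer only advances its stored offset after a handler returns. Under that assumption, an update whose handler was interrupted is simply returned again on the next poll.

```typescript
interface Update {
  update_id: number;
  text: string;
}

// In-memory stand-in for the server side of an offset-based polling API.
class FakeUpdateQueue {
  private pending: Update[] = [];
  private nextId = 1;

  push(text: string): void {
    this.pending.push({ update_id: this.nextId++, text });
  }

  // Updates with an id below `offset` count as confirmed and are dropped;
  // everything else is returned again on every call.
  getUpdates(offset: number): Update[] {
    this.pending = this.pending.filter((u) => u.update_id >= offset);
    return this.pending.slice();
  }
}

// Processes one batch and returns the new offset to persist.
// If `handle` throws, nothing is returned, so the caller keeps its old offset
// and the same update is fetched again on the next poll.
function pollOnce(
  queue: FakeUpdateQueue,
  storedOffset: number,
  handle: (update: Update) => void,
): number {
  let offset = storedOffset;
  for (const update of queue.getUpdates(offset)) {
    handle(update);
    offset = update.update_id + 1;
  }
  return offset;
}
```

A push-only transport has no stored offset to replay from, so the same guarantee has to come from the receiver, for example by recovering the interrupted session or by persisting inbound events.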

Issue 2: Feishu WebSocket not ready when recovery fires

Even if the recovery mechanism correctly marks and resumes the session, the recovered response needs to be delivered through the Feishu channel. The startup timing shows:

14:26:34 — Gateway HTTP listening
14:26:35 — Channels + sidecars starting  
14:26:40 — Recovery sidecar fires (5s after startup)
14:37:04 — Feishu WebSocket client finally connected (~11 min later)

The recovery completes before the Feishu WebSocket is ready, so any response generated by the recovery cannot be delivered. On subsequent startup attempts, the Feishu WebSocket connected within 0.1s, which suggests the ~11-minute delay was caused by channel restarts and config reloads during the investigation. The core timing mismatch remains, though: the hardcoded 5-second recovery delay (DEFAULT_RECOVERY_DELAY_MS = 5e3) is not tied to channel readiness and may be insufficient for Feishu in general.
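
One way to remove the dependency on a fixed delay is to gate recovery on a per-channel readiness signal. The sketch below is a hedged illustration only: ChannelReadinessGate, channelOfSessionKey and scheduleRecovery are hypothetical names, and reading the channel from the third segment of the session key is an assumption based on the single example key in this report (agent:main:feishu:direct:...).

```typescript
type Listener = () => void;

// Hypothetical helper: remembers which channels are ready and runs queued
// callbacks as soon as a channel reports ready.
class ChannelReadinessGate {
  private ready: Record<string, boolean> = {};
  private waiting: Record<string, Listener[]> = {};

  markReady(channel: string): void {
    this.ready[channel] = true;
    const listeners = this.waiting[channel] || [];
    delete this.waiting[channel];
    for (const listener of listeners) {
      listener();
    }
  }

  markDisconnected(channel: string): void {
    this.ready[channel] = false;
  }

  whenReady(channel: string, listener: Listener): void {
    if (this.ready[channel]) {
      listener();
      return;
    }
    const queue = this.waiting[channel] || [];
    queue.push(listener);
    this.waiting[channel] = queue;
  }
}

// Assumption: session keys look like "agent:<agent>:<channel>:<kind>:<peer>".
function channelOfSessionKey(sessionKey: string): string {
  return sessionKey.split(":")[2] || "unknown";
}

// Each aborted session is recovered only once its own channel can deliver replies.
function scheduleRecovery(
  gate: ChannelReadinessGate,
  abortedSessionKeys: string[],
  recover: (sessionKey: string) => void,
): void {
  for (const sessionKey of abortedSessionKeys) {
    gate.whenReady(channelOfSessionKey(sessionKey), () => recover(sessionKey));
  }
}
```

With this shape, a channel that connects in 0.1s is recovered almost immediately and one that takes minutes is recovered when it connects. A production version would probably also want an upper time limit and logging for channels that never become ready.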

Session State Evidence

After a clean restart, the stale session data persists:

{
  "agent:main:feishu:direct:ou_63c1a75f0ed65b60facae9fa9db7c73a": {
    "status": "running",
    "abortedLastRun": false,
    "updatedAt": 1779000083850
  }
}

And the lock file remains on disk:

/agents/main/sessions/b8061c94-7586-4002-ab01-467dcd17e433.jsonl.lock

The session transcript (b8061c94-7586-4002-ab01-467dcd17e433.jsonl, 1.3MB) is intact and contains the full conversation context. The recovery mechanism should be able to read it and continue.
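
For anyone checking whether they are affected, the stuck state shown above can be detected mechanically. The following sketch uses only the three fields visible in the excerpt (status, abortedLastRun, updatedAt). The function name findStuckSessions and the heuristic itself are assumptions: a session still marked "running" whose last update predates the current gateway process start cannot belong to a live run.

```typescript
// Only the fields visible in the sessions.json excerpt; real entries likely have more.
interface SessionRecord {
  status?: string;
  abortedLastRun?: boolean;
  updatedAt?: number; // epoch milliseconds
}

// Returns the keys of sessions that still claim to be running, were never
// flagged as aborted, and were last updated before the gateway started.
function findStuckSessions(
  store: Record<string, SessionRecord>,
  gatewayStartedAtMs: number,
): string[] {
  return Object.keys(store).filter((key) => {
    const session = store[key];
    return (
      session.status === "running" &&
      session.abortedLastRun !== true &&
      (session.updatedAt ?? 0) < gatewayStartedAtMs
    );
  });
}
```

To use it, parse sessions.json, pass the result together with the gateway's start time, and compare the returned keys against the .jsonl.lock files left on disk.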

Key Code References

  1. Main session restart recovery: dist/main-session-restart-recovery-D1yxkDUR.js

    • DEFAULT_RECOVERY_DELAY_MS = 5e3 (hardcoded 5s delay)
    • markRestartAbortedMainSessionsFromLocks() — marks sessions from stale locks
    • scheduleRestartAbortedMainSessionRecovery() — resumes sessions after delay
  2. Gateway startup post-attach: dist/server-startup-post-attach-Cd490zZC.js

    • Post-ready maintenance → lock cleanup → mark sessions
    • Sidecars: sidecars.main-session-recovery
  3. Active-memory: dist/extensions/active-memory/index.js

    • Uses runEmbeddedPiAgent with bootstrapContextMode: "lightweight" for sub-agents
  4. Session store: dist/store-3qAZ3Zl6.js

    • Persists to /agents/*/sessions/sessions.json

Suggested Fix

  1. Ensure session store is loaded from disk before lock cleanup: The markRestartAbortedMainSessionsFromLocks function needs the in-memory session store to be populated from sessions.json before it can mark sessions as abortedLastRun. If loading is async, await it.

  2. Make recovery delay configurable or dependent on channel readiness: The 5-second hardcoded delay is fragile for channels with longer startup times (Feishu WebSocket). A configurable recoveryDelayMs or a channel-readiness check before attempting delivery would be more robust.

  3. Consider adding Feishu event persistence: Unlike Telegram's polling (which naturally re-delivers events via getUpdates offset), Feishu's WebSocket push has no retry mechanism. Persisting inbound Feishu events to disk before processing would allow re-delivery after restart.
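
The third suggestion could look roughly like the journal below. It is a sketch under stated assumptions rather than a proposal for specific OpenClaw APIs: InboundEvent, JournalStorage, MemoryStorage and InboundEventJournal are hypothetical names, and the storage is abstracted so the example stays free of file I/O. A real version would back it with a file next to sessions.json and write atomically.

```typescript
interface InboundEvent {
  eventId: string; // assumed unique id supplied by the channel
  sessionKey: string;
  payload: string;
}

// Minimal storage abstraction so the journal can be file-backed in practice.
interface JournalStorage {
  read(): string;
  write(contents: string): void;
}

class MemoryStorage implements JournalStorage {
  private contents = "";
  read(): string {
    return this.contents;
  }
  write(contents: string): void {
    this.contents = contents;
  }
}

class InboundEventJournal {
  private storage: JournalStorage;

  constructor(storage: JournalStorage) {
    this.storage = storage;
  }

  private load(): InboundEvent[] {
    const raw = this.storage.read();
    return raw ? (JSON.parse(raw) as InboundEvent[]) : [];
  }

  // Call before handing the event to the agent. Duplicate ids are ignored.
  record(event: InboundEvent): void {
    const events = this.load();
    if (!events.some((e) => e.eventId === event.eventId)) {
      events.push(event);
      this.storage.write(JSON.stringify(events));
    }
  }

  // Call after the reply has been delivered.
  acknowledge(eventId: string): void {
    const remaining = this.load().filter((e) => e.eventId !== eventId);
    this.storage.write(JSON.stringify(remaining));
  }

  // After a restart, anything still listed here was never fully processed.
  pending(): InboundEvent[] {
    return this.load();
  }
}
```

On startup, the gateway would replay pending() once the Feishu channel is ready, which gives a push-based channel roughly the same re-delivery behavior that offset-based polling provides.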

Workaround

Currently, users must send a new message to Feishu after a gateway restart to reactivate the session. The previous message's context is preserved in the session transcript, so the agent can continue the conversation naturally.
