openclaw - ✅(Solved) Fix Feishu channel can hot-loop gateway on stale or corrupt channel session state [1 pull requests, 2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#74237Fetched 2026-04-30 06:26:58
View on GitHub
Comments
2
Participants
2
Timeline
7
Reactions
2
Timeline (top)
cross-referenced ×3commented ×2mentioned ×1subscribed ×1

openclaw gateway can enter a sustained high-CPU state when the Feishu channel is enabled and the local Feishu channel/session state contains stale or corrupt data.

In this case the Feishu app credentials were valid and the bundled Feishu plugin itself was present. The gateway became stable only after rebuilding Feishu-local runtime/dedup state and removing the stale Feishu agent session entry/files. Re-enabling Feishu with the same App ID and secret then worked normally.

Root Cause

I did not attach the old session/state files because they may contain private Feishu message metadata, but I can provide sanitized counts, file sizes, or additional log snippets if useful.

Fix Action

Fix / Workaround

Workaround that fixed it

After the workaround, with the same Feishu App ID/secret:

Possible mitigations:

PR fix notes

PR #74397: fix(feishu): repair stale channel state in doctor

Description (problem / solution / changelog)

Summary

  • Problem: Feishu can keep reprocessing stale/corrupt local channel state or direct channel session entries, which can leave the gateway hot-looping after startup.
  • Why it matters: Reinstalling the plugin or re-entering Feishu App ID/secret does not clear these local runtime/session files, so users can remain stuck with high CPU even when credentials are valid.
  • What changed: Added a Feishu-owned doctor sequence that detects corrupt Feishu JSON state and suspicious Feishu direct session transcripts, then openclaw doctor --fix archives/rebuilds only Feishu local state and Feishu-scoped direct session entries.
  • What did NOT change (scope boundary): This does not change Feishu credentials, App ID/secret config, network behavior, message handling, or ACP binding session entries.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #74237
  • Related #74237
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Feishu had persistent local runtime/session state that could become stale or corrupt independently from plugin installation and credentials. The channel did not have a doctor-owned repair path to rebuild that channel-local state while preserving configured App ID/secret values.
  • Missing detection / guardrail: openclaw doctor did not inspect Feishu local state JSON or Feishu direct channel session transcript integrity, so the recovery step was manual and easy to over-delete.
  • Contributing context (if known): A fresh plugin reinstall can still reuse existing ~/.openclaw/feishu files and agent session store entries, so the problem is not solved by reinstalling the Feishu plugin alone.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/feishu/src/doctor.test.ts
  • Scenario the test should lock in: Feishu doctor stays quiet for healthy state, warns before repair for corrupt Feishu local JSON, and --fix archives Feishu local state plus direct Feishu session entries while preserving non-Feishu sessions and ACP binding session entries.
  • Why this is the smallest reliable guardrail: The bug is in channel-local state repair behavior, so the test can cover the doctor sequence directly without requiring a live Feishu tenant.
  • Existing test that already covers this (if any): None.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

openclaw doctor can now warn when Feishu local channel state appears repairable. openclaw doctor --fix can archive/rebuild Feishu local runtime state and Feishu-scoped direct session entries while preserving Feishu App ID/secret config.

Diagram (if applicable)

Before:
[stale Feishu local state] -> [gateway restart] -> [same bad state reused]

After:
[openclaw doctor --fix] -> [archive Feishu state/session entries] -> [gateway rebuilds clean Feishu local state]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) Yes
  • If any Yes, explain risk + mitigation: Feishu doctor now reads Feishu local state JSON and configured/discovered session stores under the OpenClaw state directory, and repair mode renames/removes only Feishu-scoped direct channel state. It creates timestamped backups and intentionally preserves Feishu App ID/secret config, non-Feishu sessions, and ACP binding session entries.

Repro + Verification

Environment

  • OS: Linux VM observed during debugging; local tests run on macOS
  • Runtime/container: OpenClaw CLI/gateway local mode
  • Model/provider: N/A
  • Integration/channel (if any): Feishu
  • Relevant config (redacted): Feishu channel configured with App ID/secret; local Feishu state/session data present

Steps

  1. Configure Feishu and start the gateway.
  2. Leave stale/corrupt Feishu local state or Feishu direct session entries on disk.
  3. Restart the gateway.
  4. Run openclaw doctor and then openclaw doctor --fix.

Expected

  • Doctor identifies Feishu-local repairable state.
  • Repair archives Feishu-local state and Feishu-scoped direct session entries.
  • Feishu App ID/secret config remains intact.
  • Non-Feishu sessions and ACP binding session entries remain intact.

Actual

  • Before this PR, users had to manually delete Feishu local state/session files to recover.
  • After this PR, doctor provides a scoped repair path.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Local verification:

node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.extension-feishu.config.ts extensions/feishu/src/doctor.test.ts
Test Files  1 passed (1)
Tests       4 passed (4)
node_modules/.bin/oxfmt --check extensions/feishu/src/doctor.ts extensions/feishu/src/doctor.test.ts extensions/feishu/src/channel.ts
All matched files use the correct format.
node_modules/.bin/oxlint extensions/feishu/src/doctor.ts extensions/feishu/src/doctor.test.ts extensions/feishu/src/channel.ts
Found 0 warnings and 0 errors.

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: Healthy Feishu state is quiet; corrupt Feishu local JSON warns before repair; repair archives Feishu local state and direct Feishu sessions while preserving App ID/secret config, non-Feishu sessions, and ACP binding session entries.
  • Edge cases checked: Legacy feishu:* session keys, agent:<id>:feishu:* session keys, ACP binding keys containing feishu, missing transcript paths, blank user message transcripts, local JSON parse failures.
  • What you did not verify: Full pnpm check / pnpm test; this local checkout does not currently have pnpm/corepack available, and the broader Feishu channel test import path is missing typebox in node_modules.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A. Users affected by stale Feishu local state can run openclaw doctor --fix.

Risks and Mitigations

  • Risk: Repair could remove more Feishu session state than intended.
    • Mitigation: The repair only targets direct Feishu channel session keys (feishu:* and agent:<id>:feishu:*), explicitly preserves ACP binding Feishu session keys, backs up session stores, and archives transcript artifacts by renaming them instead of deleting them.

Changed files

  • extensions/feishu/src/channel.ts (modified, +2/-0)
  • extensions/feishu/src/doctor.test.ts (added, +237/-0)
  • extensions/feishu/src/doctor.ts (added, +721/-0)

Code Example

openclaw-gateway.service active (running)
plugins: acpx, feishu, memory-core
CPU: ~95-120% of one core after startup
RSS: commonly ~800-1100 MB during the bad state
openclaw gateway health --json: sometimes timed out after 10000ms

---

proc_cpu_sec_delta=30.280 wall_sec=30 approx_one_core_pct=100.9
systemd_cpu_sec_delta=30.275 wall_sec=30 approx_one_core_pct=100.9

---

[feishu] starting feishu[default] (mode: websocket)
[feishu] feishu[default]: bot info probe timed out after 30000ms; continuing startup
[feishu] feishu[default]: bot open_id resolved: unknown
[feishu] feishu[default]: bot open_id unknown; starting background retry (...)
[feishu] feishu[default]: starting WebSocket connection...
[feishu] feishu[default]: WebSocket client started
[feishu] feishu[default]: bot open_id recovered via background retry: <redacted-open-id>
[agent/embedded] session file repaired: dropped 26 blank user message(s) (...jsonl)

---

~/.openclaw/agents/main/sessions/sessions.json

---

agent:main:feishu:direct:<redacted-open-id>

---

~/.openclaw/agents/main/sessions/<session-id>.jsonl
~/.openclaw/agents/main/sessions/<session-id>.trajectory.jsonl
~/.openclaw/agents/main/sessions/<session-id>.trajectory-path.json

---

~/.openclaw/feishu/

---

systemctl --user stop openclaw-gateway.service

# Back up before changing anything.
mkdir -p ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS
cp -a ~/.openclaw/openclaw.json ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/
cp -a ~/.openclaw/agents/main/sessions/sessions.json ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/
cp -a ~/.openclaw/feishu ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/feishu

# Rebuild Feishu runtime/dedup state.
mv ~/.openclaw/feishu ~/.openclaw/feishu.disabled-YYYYMMDD-HHMMSS
mkdir -p ~/.openclaw/feishu

# Remove only sessions.json entries whose key contains ':feishu:' and move their
# corresponding .jsonl / .trajectory.jsonl / .trajectory-path.json aside.
# (I did this with a small JSON script rather than manual text editing.)

openclaw plugins registry --refresh
openclaw plugins enable feishu
openclaw config set channels.feishu.enabled true --strict-json
openclaw config set channels.feishu.accounts.default.enabled true --strict-json
openclaw config set channels.feishu.connectionMode '"websocket"' --strict-json

systemctl --user restart openclaw-gateway.service

---

gateway health: ok
loaded plugins: acpx, feishu, memory-core
Feishu channel: running=true, configured=true, reconnectAttempts=0
Feishu probe: ok=true, botName resolved, botOpenId resolved
openai-codex/gpt-5.5 gateway inference: returned OK
60s idle CPU delta with Feishu enabled: 0.1% of one core
30s CPU delta after inference: 0.0% of one core
RAW_BUFFERClick to expand / collapse

Summary

openclaw gateway can enter a sustained high-CPU state when the Feishu channel is enabled and the local Feishu channel/session state contains stale or corrupt data.

In this case the Feishu app credentials were valid and the bundled Feishu plugin itself was present. The gateway became stable only after rebuilding Feishu-local runtime/dedup state and removing the stale Feishu agent session entry/files. Re-enabling Feishu with the same App ID and secret then worked normally.

Environment

  • OpenClaw: 2026.4.26 (be8c246)
  • Install method: npm global install
  • Runtime: Node v24.14.1, npm 11.13.0
  • OS: Arch Linux, systemd user service
  • Service: openclaw-gateway.service
  • Gateway: local loopback on port 18789
  • Model configured: openai-codex/gpt-5.5
  • Channel: Feishu, connectionMode: "websocket"
  • Feishu plugin origin: bundled @openclaw/feishu

Actual behavior

With Feishu enabled, the gateway could start and sometimes report HTTP /health as live, but the process stayed near a full CPU core and gateway CLI calls could time out.

Representative bad samples:

openclaw-gateway.service active (running)
plugins: acpx, feishu, memory-core
CPU: ~95-120% of one core after startup
RSS: commonly ~800-1100 MB during the bad state
openclaw gateway health --json: sometimes timed out after 10000ms

A 30-second CPU delta while idle showed the process was truly burning CPU, not just a lifetime-average ps artifact:

proc_cpu_sec_delta=30.280 wall_sec=30 approx_one_core_pct=100.9
systemd_cpu_sec_delta=30.275 wall_sec=30 approx_one_core_pct=100.9

Logs around the problematic state included normal Feishu startup plus a session repair warning:

[feishu] starting feishu[default] (mode: websocket)
[feishu] feishu[default]: bot info probe timed out after 30000ms; continuing startup
[feishu] feishu[default]: bot open_id resolved: unknown
[feishu] feishu[default]: bot open_id unknown; starting background retry (...)
[feishu] feishu[default]: starting WebSocket connection...
[feishu] feishu[default]: WebSocket client started
[feishu] feishu[default]: bot open_id recovered via background retry: <redacted-open-id>
[agent/embedded] session file repaired: dropped 26 blank user message(s) (...jsonl)

Things I ruled out

I tested these independently:

  • Feishu App ID / secret: keeping the exact same credentials worked after state cleanup.
  • Feishu plugin installation: plugins inspect feishu showed bundled plugin @openclaw/feishu, version 2026.4.26; plugins doctor reported no plugin issues.
  • acpx alone: stable after startup, 30-second idle CPU delta around 0.2%.
  • acpx + memory-core: stable after startup, 30-second idle CPU delta around 0.0%.
  • Task registry DB: rebuilding ~/.openclaw/tasks/runs.sqlite* did not fix this Feishu-specific high CPU.
  • Model provider / Codex credentials: gateway inference worked normally once Feishu state was cleaned.

State that appeared relevant

The affected instance had a stale Feishu DM session entry in:

~/.openclaw/agents/main/sessions/sessions.json

The key shape was:

agent:main:feishu:direct:<redacted-open-id>

It pointed to Feishu session artifacts under:

~/.openclaw/agents/main/sessions/<session-id>.jsonl
~/.openclaw/agents/main/sessions/<session-id>.trajectory.jsonl
~/.openclaw/agents/main/sessions/<session-id>.trajectory-path.json

There was also Feishu runtime/dedup state under:

~/.openclaw/feishu/

Workaround that fixed it

I stopped the gateway, backed up state, rebuilt Feishu-local state, removed only Feishu channel session entries/files, refreshed the plugin registry, and re-enabled Feishu.

Approximate steps:

systemctl --user stop openclaw-gateway.service

# Back up before changing anything.
mkdir -p ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS
cp -a ~/.openclaw/openclaw.json ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/
cp -a ~/.openclaw/agents/main/sessions/sessions.json ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/
cp -a ~/.openclaw/feishu ~/.openclaw/backups/feishu-session-clean-YYYYMMDD-HHMMSS/feishu

# Rebuild Feishu runtime/dedup state.
mv ~/.openclaw/feishu ~/.openclaw/feishu.disabled-YYYYMMDD-HHMMSS
mkdir -p ~/.openclaw/feishu

# Remove only sessions.json entries whose key contains ':feishu:' and move their
# corresponding .jsonl / .trajectory.jsonl / .trajectory-path.json aside.
# (I did this with a small JSON script rather than manual text editing.)

openclaw plugins registry --refresh
openclaw plugins enable feishu
openclaw config set channels.feishu.enabled true --strict-json
openclaw config set channels.feishu.accounts.default.enabled true --strict-json
openclaw config set channels.feishu.connectionMode '"websocket"' --strict-json

systemctl --user restart openclaw-gateway.service

After the workaround, with the same Feishu App ID/secret:

gateway health: ok
loaded plugins: acpx, feishu, memory-core
Feishu channel: running=true, configured=true, reconnectAttempts=0
Feishu probe: ok=true, botName resolved, botOpenId resolved
openai-codex/gpt-5.5 gateway inference: returned OK
60s idle CPU delta with Feishu enabled: 0.1% of one core
30s CPU delta after inference: 0.0% of one core

Expected behavior

A stale/corrupt Feishu channel session or runtime state file should not be able to put the whole gateway into a sustained hot loop.

Possible mitigations:

  • validate/quarantine Feishu channel session state before using it
  • isolate bad agent:...:feishu:... sessions instead of repeatedly trying to repair or process them
  • avoid unbounded retry/repair loops around Feishu session files
  • log a clear warning that identifies the problematic Feishu session key/file
  • provide a doctor repair that can safely rebuild Feishu-local state while preserving App ID/secret

I did not attach the old session/state files because they may contain private Feishu message metadata, but I can provide sanitized counts, file sizes, or additional log snippets if useful.

extent analysis

TL;DR

Rebuilding Feishu-local runtime/dedup state and removing stale Feishu agent session entries/files may resolve the high-CPU issue with the openclaw gateway when the Feishu channel is enabled.

Guidance

  1. Stop the gateway service: Use systemctl --user stop openclaw-gateway.service to prevent further CPU usage.
  2. Backup current state: Save the current state files, such as ~/.openclaw/openclaw.json, ~/.openclaw/agents/main/sessions/sessions.json, and ~/.openclaw/feishu/, to a backup directory.
  3. Rebuild Feishu runtime/dedup state: Remove the existing ~/.openclaw/feishu/ directory and recreate it using mkdir -p ~/.openclaw/feishu.
  4. Remove stale Feishu session entries: Identify and remove session entries in ~/.openclaw/agents/main/sessions/sessions.json that contain the key :feishu: and move their corresponding .jsonl, .trajectory.jsonl, and .trajectory-path.json files aside.

Example

No specific code snippet is provided, but the commands listed in the guidance section can be used to rebuild the Feishu runtime/dedup state and remove stale session entries.

Notes

The provided workaround assumes that the issue is caused by stale or corrupt Feishu channel session or runtime state files. If the issue persists after trying the workaround, further investigation may be necessary to identify the root cause.

Recommendation

Apply the workaround by rebuilding Feishu-local runtime/dedup state and removing stale Feishu agent session entries/files, as this has been shown to resolve the high-CPU issue in the given scenario.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

A stale/corrupt Feishu channel session or runtime state file should not be able to put the whole gateway into a sustained hot loop.

Possible mitigations:

  • validate/quarantine Feishu channel session state before using it
  • isolate bad agent:...:feishu:... sessions instead of repeatedly trying to repair or process them
  • avoid unbounded retry/repair loops around Feishu session files
  • log a clear warning that identifies the problematic Feishu session key/file
  • provide a doctor repair that can safely rebuild Feishu-local state while preserving App ID/secret

I did not attach the old session/state files because they may contain private Feishu message metadata, but I can provide sanitized counts, file sizes, or additional log snippets if useful.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix Feishu channel can hot-loop gateway on stale or corrupt channel session state [1 pull requests, 2 comments, 2 participants]