openclaw [Bug]: Telegram ingress spool deadlocks indefinitely when slash-command sub-session has no registered harness



Summary

A single Telegram message that targets the slash: sub-session (e.g. /compact) can deadlock the entire ingress spool when its harness fails to register. The spool worker claims the file, the harness errors, the file is "kept for retry" instead of being dead-lettered, and the same update is replayed roughly twice a second forever. Every subsequent inbound message piles up behind it. The channel goes silent from the user's perspective.

Symptoms

  • Telegram stops responding. Other channels (webchat) are healthy. openclaw channel status shows the channel itself as fine: token OK, accounts 1/1.
  • ~/.openclaw/telegram/ingress-spool-default/ accumulates messages, all blocked behind a head-of-queue file with a claim block bound to a single PID.
  • Gateway log emits the following lines at ~2 Hz, indefinitely, with the same update_id:
[telegram][diag] spooled update <update_id> failed; keeping for retry: MissingAgentHarnessError in middleware: Requested agent harness "claude-cli" is not registered.
message processed: channel=telegram chatId=slash:<chat_id> messageId=<n> sessionId=unknown sessionKey=agent:main:telegram:slash:<chat_id> outcome=error duration=37ms error="MissingAgentHarnessError: ...
lane task error: lane=session:agent:main:telegram:direct:<chat_id> durationMs=7 error="MissingAgentHarnessError: Requested agent harness \"claude-cli\" is not registered."
lane task error: lane=main durationMs=6 error="MissingAgentHarnessError: Requested agent harness \"claude-cli\" is not registered."
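
The spool-directory symptom can be confirmed without grepping logs. The following is a minimal sketch, assuming that spool filenames sort lexically into queue order (suggested by the zero-padded update_id in the path) and that records carry the fields shown in the scrubbed head file; `spool_status` is a hypothetical helper, not an existing OpenClaw function.

```python
import json
import time
from pathlib import Path


def spool_status(spool_dir, now_ms=None):
    """Summarize a spool directory: queue depth plus the head file's claim."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    # Assumption: zero-padded filenames make lexical order equal queue order.
    files = sorted(Path(spool_dir).glob("*.json"))
    if not files:
        return {"depth": 0, "head": None}
    record = json.loads(files[0].read_text())
    claim = record.get("claim") or {}
    claimed_at = claim.get("claimedAt")
    return {
        "depth": len(files),
        "head": files[0].name,
        "head_text": record.get("update", {}).get("message", {}).get("text"),
        "claim_pid": claim.get("processPid"),
        # Seconds since the head was last claimed, and since it was received.
        "claim_age_s": None if claimed_at is None else (now_ms - claimed_at) / 1000,
        "waiting_s": (now_ms - record["receivedAt"]) / 1000,
    }
```

A growing `depth` together with a large `waiting_s` on a head whose text is a slash command would match the wedge described here.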

Repro (observed, 2026-05-22)

  1. User sends /compact to the Telegram bot while the gateway/CLI harness has a registration gap for slash-session routes.
  2. Spool file is created at ~/.openclaw/telegram/ingress-spool-default/<padded_update_id>.json. Its update.message.text is /compact, entities[0].type = bot_command. Spool worker claims it.
  3. Middleware tries to dispatch to agent:main:telegram:slash:<chat_id> — note the :slash: segment, distinct from the normal :direct: route used for ordinary text.
  4. Harness lookup fails with MissingAgentHarnessError. Spool worker treats this as a transient error → "keeping for retry."
  5. Loop runs at ~2 retries/sec. In this incident, 6,153 retries over 52m 58s before manual intervention (10:01:58 → 10:54:56 PT). 15 subsequent Telegram messages stacked behind it.
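
The retry count and rate in step 5 can be recomputed from the gateway log. This is a rough sketch that assumes each retry produces one "keeping for retry" line in the format quoted in this report; `retry_stats` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the "[telegram][diag] ... keeping for retry" line quoted in this report.
RETRY = re.compile(
    r"^(\S+) \[telegram\]\[diag\] spooled update (\S+) failed; keeping for retry"
)


def retry_stats(lines):
    """Return {update_id: (retry_count, seconds_between_first_and_last_retry)}."""
    counts, first, last = Counter(), {}, {}
    for line in lines:
        match = RETRY.match(line)
        if not match:
            continue
        stamp = datetime.fromisoformat(match.group(1))
        update_id = match.group(2)
        counts[update_id] += 1
        first.setdefault(update_id, stamp)
        last[update_id] = stamp
    return {u: (n, (last[u] - first[u]).total_seconds()) for u, n in counts.items()}


tail = 'failed; keeping for retry: MissingAgentHarnessError in middleware: ...'
sample = [
    f"2026-05-22T10:01:58.065-07:00 [telegram][diag] spooled update <update_id> {tail}",
    f"2026-05-22T10:54:56.569-07:00 [telegram][diag] spooled update <update_id> {tail}",
]
print(retry_stats(sample))  # {'<update_id>': (2, 3178.504)}
```

With the incident's 6,153 retries over that 3178.504 s span, the rate works out to roughly 1.94 retries per second, which agrees with the ~2 Hz figure.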

Workaround (manual, destructive-ish)

Move the head-of-queue file (and any wedged followers) out of the spool dir. The retry loop dies on next iteration because the claimed file no longer exists, and a fresh direct: session is created on the next inbound update.

# Stash every queued update so the retry loop has nothing left to replay.
STAMP=$(date +%Y-%m-%d-%H%M)
STASH="$HOME/.openclaw/telegram/ingress-spool-stash-$STAMP"
mkdir -p "$STASH"
mv "$HOME"/.openclaw/telegram/ingress-spool-default/*.json "$STASH"/

This loses the queued messages (in this incident, 16 user messages including the poison head). Stashing rather than deleting preserves them for forensic inspection but does not replay them.
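
For the forensic inspection mentioned above, a small script can list what was stashed. This is a sketch that assumes stashed files keep the record layout of the scrubbed head file in this report (`updateId`, `update.message.date`, `update.message.text`); `list_stash` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def list_stash(stash_dir):
    """Return (update_id, UTC send time, text) for each stashed spool file."""
    rows = []
    for path in sorted(Path(stash_dir).glob("*.json")):
        record = json.loads(path.read_text())
        message = record.get("update", {}).get("message", {})
        date = message.get("date")  # Telegram sends this as epoch seconds
        sent = None
        if date is not None:
            sent = datetime.fromtimestamp(date, tz=timezone.utc).isoformat()
        rows.append((record.get("updateId"), sent, message.get("text")))
    return rows
```

The output only shows which messages were lost. It does not replay them.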

Root cause (best guess)

Two failures compound:

  1. Slash-command sub-sessions have no harness fallback. When a user issues a /foo slash command, routing creates/uses a separate sub-session agent:main:telegram:slash:<chat_id>. That sub-session needs its own harness registration. The direct: sub-session inherits the gateway's default harness; slash: apparently does not, or registers it lazily under conditions that can fail. So an otherwise-healthy gateway can have working direct: routing and broken slash: routing simultaneously, and a user has no way to discover this except by sending a slash command.

  2. MissingAgentHarnessError is treated as retryable. The spool worker classifies it the same as a transient network/DB error and keeps the file for retry. There is no apparent deadline, no exponential backoff, no max-retry cap, and no dead-letter quarantine. Hence the indefinite head-of-queue block.
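
To illustrate what a bounded retry policy could look like for the second failure, here is a minimal sketch of capped exponential backoff with a retry budget. Every parameter value is an assumption chosen for illustration, not an OpenClaw default.

```python
def backoff_schedule(base_s=0.5, factor=2.0, cap_s=60.0, max_retries=8):
    """Return the delay before each retry; after the last one, give up."""
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(max_retries)]


delays = backoff_schedule()
print(delays)       # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
print(sum(delays))  # 123.5
```

Under these assumed values a failing update would be retried eight times over about two minutes and then handed to a quarantine step, instead of 6,153 times over 53 minutes.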

Either fix alone unblocks the channel:

  • (a) Ensure slash sub-sessions always have a working harness (or fall back to direct: when their harness isn't registered, with a one-line "slash command not available" reply).
  • (b) Classify MissingAgentHarnessError (and any other "configuration error" class) as non-retryable. Move the file to a dead-letter dir, log loudly, and continue processing the rest of the queue.
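
Option (a) could take roughly the following shape. This is a sketch only: the real harness registry is not shown in this report, so `harnesses` (a mapping from session key to harness), `send_reply`, and `route_command` are hypothetical stand-ins.

```python
def route_command(session_key, harnesses, send_reply):
    """Pick a route for a slash command, degrading instead of raising."""
    if session_key in harnesses:
        return ("dispatch", session_key)
    # Fall back to the chat's direct: sub-session when slash: has no harness.
    direct_key = session_key.replace(":slash:", ":direct:", 1)
    if direct_key in harnesses:
        return ("dispatch", direct_key)
    # Nothing can handle it: tell the user and drop the update.
    send_reply("Slash commands aren't available right now.")
    return ("drop", None)
```

In both outcomes the update leaves the queue, so it can never become a poison head.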

Suggested fix shape

Belt-and-suspenders, in order of payoff:

  • Spool worker: introduce error classification. Configuration errors → quarantine, not retry. Transient errors keep their current behavior.
  • Slash router: if agent:main:telegram:slash:<chat_id> has no registered harness, either (i) fall back to the chat's direct: sub-session, or (ii) reply with a polite "slash commands aren't available right now" using the channel's lightweight outbound path (no harness needed), then drop the message.
  • Observability: emit a structured spool.poison_head warning when the same update_id retries more than N times. Currently the only signal is a flat line of identical error messages, which is easy to miss until the user notices silence.
  • Optional: an openclaw telegram spool-status CLI subcommand that prints (a) queue depth, (b) head file's claim PID and age, (c) retry count for the head, so an operator can spot a wedge without grepping logs.
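
The error-classification bullet is the core of the proposal, so a sketch may help. It assumes a worker that processes spool files in lexical order and a caller-supplied `handle` callback. `drain`, `NON_RETRYABLE`, the `spool.dead_letter` log key, and the local `MissingAgentHarnessError` class are illustrative stand-ins rather than OpenClaw's actual code.

```python
import shutil
from pathlib import Path


class MissingAgentHarnessError(Exception):
    """Stand-in for the configuration error seen in the logs."""


# Configuration errors: retrying cannot help until an operator intervenes.
NON_RETRYABLE = (MissingAgentHarnessError,)


def drain(spool_dir, dead_letter_dir, handle):
    """Process files in order; quarantine config failures, pause on transient ones."""
    quarantined = []
    for path in sorted(Path(spool_dir).glob("*.json")):
        try:
            handle(path)
        except NON_RETRYABLE as err:
            Path(dead_letter_dir).mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(Path(dead_letter_dir) / path.name))
            print(f"spool.dead_letter file={path.name} error={type(err).__name__}")
            quarantined.append(path.name)
            continue  # keep processing the rest of the queue
        except Exception:
            break  # transient: leave the file in place and retry later
        path.unlink()  # processed successfully
    return quarantined
```

The important property is the `continue`: a quarantined head no longer blocks the messages queued behind it.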

Environment

  • OpenClaw 2026.5.20 (build e510042).
  • Channel: Telegram (telegram plugin, default account).
  • Harness: claude-cli.
  • Host: macOS arm64 (Mac mini), Node v25.9.0.

Sample logs (scrubbed)

First two retries, last two retries, and a sample lane error, same update_id throughout:

2026-05-22T10:01:58.065-07:00 [telegram][diag] spooled update <update_id> failed; keeping for retry: MissingAgentHarnessError in middleware: Requested agent harness "claude-cli" is not registered.
2026-05-22T10:01:58.484-07:00 [telegram][diag] spooled update <update_id> failed; keeping for retry: MissingAgentHarnessError in middleware: Requested agent harness "claude-cli" is not registered.
... (6,149 more lines, ~2 Hz) ...
2026-05-22T10:54:56.070-07:00 [telegram][diag] spooled update <update_id> failed; keeping for retry: MissingAgentHarnessError in middleware: Requested agent harness "claude-cli" is not registered.
2026-05-22T10:54:56.569-07:00 [telegram][diag] spooled update <update_id> failed; keeping for retry: MissingAgentHarnessError in middleware: Requested agent harness "claude-cli" is not registered.
{"time":"2026-05-22T09:24:58.336-07:00","level":"ERROR","subsystem":"diagnostic",
 "message":"lane task error: lane=session:agent:main:telegram:direct:<chat_id> durationMs=9 error=\"MissingAgentHarnessError: Requested agent harness \\\"claude-cli\\\" is not registered.\""}

Poison head spool file (scrubbed):

{
  "version": 1,
  "updateId": <update_id>,
  "receivedAt": 1779469317607,
  "update": {
    "update_id": <update_id>,
    "message": {
      "message_id": 857,
      "from": {"id": <chat_id>, "is_bot": false, "first_name": "<first_name>",
               "last_name": "<last_name>", "username": "<username>", "language_code": "en"},
      "chat": {"id": <chat_id>, "first_name": "<first_name>", "last_name": "<last_name>",
               "username": "<username>", "type": "private"},
      "date": 1779469317,
      "text": "/compact",
      "entities": [{"offset": 0, "length": 8, "type": "bot_command"}]
    }
  },
  "claim": {
    "processId": "<gateway_pid>:<run_id>",
    "processPid": <gateway_pid>,
    "claimedAt": 1779472496523
  }
}
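
The two epoch-millisecond fields in the spool file can be lined up against the log timestamps. The sketch below converts them with integer arithmetic. The fixed -07:00 offset is taken from the gateway log lines, and `from_epoch_ms` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

PDT = timezone(timedelta(hours=-7))  # offset used in the gateway log lines
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms, tz=PDT):
    """Convert epoch milliseconds to an aware datetime without float rounding."""
    return (EPOCH + timedelta(milliseconds=ms)).astimezone(tz)


received = from_epoch_ms(1779469317607)  # "receivedAt"
claimed = from_epoch_ms(1779472496523)   # "claim.claimedAt"
print(received.isoformat())  # 2026-05-22T10:01:57.607000-07:00
print(claimed.isoformat())   # 2026-05-22T10:54:56.523000-07:00
print(claimed - received)    # 0:52:58.916000
```

`receivedAt` lands just before the first logged retry and `claimedAt` falls between the last two. That would be consistent with the claim being rewritten on every retry, though this is an inference from the timestamps and not something confirmed in the code.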

Notes

  • This is a different failure mode from the context-overflow "harness not registered" crash that happens when a session exceeds the model's 1M context limit. The error string is the same but the mechanism differs: that crash is a single failed turn, while this one is an infinite retry of one queued message that blocks the whole channel.
  • Stashed spool files retained locally at ~/.openclaw/telegram/ingress-spool-stash-2026-05-22-1054/ (16 files) in case live repro is useful.
