openclaw - ✅(Solved) Fix Telegram isolated ingress timeout recovery misses lone active spooled handler without backlog [1 pull request, 1 comment, 2 participants]

openclaw · 2026-05-19 14:02:40


GitHub stats

openclaw/openclaw#84158 • Fetched 2026-05-20 03:43:23


Author

crash2kx

Participants

clawsweeper[bot]

crash2kx

Timeline (top)

commented ×1, cross-referenced ×1

With the #83505 fix present, I still observed a Telegram isolated-ingress .json.processing marker remain stuck for a single active topic update when no later same-lane update was queued behind it.

This looks like a remaining edge case in the timeout recovery trigger, not a duplicate of the original #83272 failure mode.

Error Message

During Telegram topic testing, a topic message caused prolonged main-thread CPU pressure and delayed health/Telegram behavior. After the system settled, the ingress spool still contained a .json.processing file for the topic update.

Root Cause

Without this edge-case recovery, a lone stuck .json.processing marker can make the account appear mostly recovered while leaving stale spool state behind. On small VPS installs this also correlates with user-visible Telegram delays and event-loop/CPU pressure during the stuck turn.
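A quick way to check for this kind of leftover state is to list .json.processing markers that are older than the handler timeout. The following is a minimal sketch, not OpenClaw code: the helper names, the example file names, and the use of file mtime as an age signal are all assumptions. If a claim is made by renaming the spooled file, mtime may reflect when the update was spooled rather than when it was claimed, so treat the result as a hint only.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

type SpoolEntry = { name: string; mtimeMs: number };

// Pure selection step: keep only `.json.processing` markers older than maxAgeMs.
function selectStaleMarkers(entries: SpoolEntry[], maxAgeMs: number, nowMs: number): string[] {
  return entries
    .filter((entry) => entry.name.endsWith(".json.processing"))
    .filter((entry) => nowMs - entry.mtimeMs > maxAgeMs)
    .map((entry) => entry.name)
    .sort();
}

// Thin filesystem wrapper (hypothetical helper, not part of OpenClaw).
// The caller supplies the spool directory; no path layout is assumed here.
function findStaleProcessingMarkers(spoolDir: string, maxAgeMs: number): string[] {
  const entries = readdirSync(spoolDir).map((name) => ({
    name,
    mtimeMs: statSync(join(spoolDir, name)).mtimeMs,
  }));
  return selectStaleMarkers(entries, maxAgeMs, Date.now());
}

// Example with made-up file names: only the old processing marker is reported.
const staleExample = selectStaleMarkers(
  [
    { name: "42.json.processing", mtimeMs: 0 },
    { name: "43.json", mtimeMs: 0 },
    { name: "44.json.processing", mtimeMs: 9_000 },
  ],
  5_000,
  10_000,
);
```

A non-empty result while the gateway otherwise looks healthy would match the symptom described here.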

Fix Action

Fix / Workaround

Current recovery appears to call timeout recovery using drain.blockedByLane as the candidate set. That catches the important case fixed by #83505, where a stuck handler blocks later same-lane updates.

I prepared a small local patch sketch against extensions/telegram/src/polling-session.ts and extensions/telegram/src/polling-session.test.ts; git diff --check passes. I have not deployed that patch to the running gateway.

PR fix notes

PR #84194: fix(telegram): recover lone timed-out spool handlers

Repository: openclaw/openclaw
Author: TurboTheTurtle
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/84194

Description (problem / solution / changelog)

Summary

Include active Telegram isolated-ingress handlers in timeout recovery candidates, even when no later same-lane update is queued.
Keep timed-out active handlers marked as stalled until they settle or are recovered.
Add regression coverage for a lone .json.processing topic update timing out without backlog.

Fixes #84158.

Real behavior proof

Behavior or issue addressed: A lone active Telegram isolated-ingress .json.processing claim could be missed by timeout recovery when no later same-lane update existed to populate blockedByLane.

Real environment tested: Local macOS checkout at commit 43c2a0c7b1, using the real TelegramPollingSession, Telegram ingress spool, reply fence, and isolated-ingress drain/recovery code with a local mock Telegram bot runtime and no network calls.

Exact steps or command run after this patch: PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node --import tsx /private/tmp/proof-84158.mjs

Evidence after fix: Terminal output from the live isolated-ingress spool proof:

openclaw-telegram-lone-spooled-timeout-proof=ok
events=bot0:42
bot_instances=2
pending_update_ids=none
failed_update_ids=42
timeout_log_present=true

Observed result after fix: The single claimed topic update 42 timed out without any later same-lane backlog, was written as a failed tombstone, left no pending updates behind, logged the timeout, and restarted isolated ingress.

What was not tested: I did not connect to the live Telegram Bot API. The proof avoids external network calls and exercises the local spool/recovery behavior directly.

Validation

NODE_OPTIONS=--max-old-space-size=8192 OPENCLAW_VITEST_MAX_WORKERS=1 PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node scripts/run-vitest.mjs extensions/telegram/src/polling-session.test.ts --pool forks --maxWorkers 1 --vmMemoryLimit 8192MB - 43 passed
PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node_modules/.bin/oxfmt --check --threads=1 extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts
PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node_modules/.bin/oxlint extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts
git diff --check
git log --format='%h %an <%ae> %s' upstream/main..HEAD: 43c2a0c7b1 Andy Ye <[email protected]> fix(telegram): recover lone timed-out spool handlers

Changed files

extensions/telegram/src/polling-session.test.ts (modified, +91/-0)
extensions/telegram/src/polling-session.ts (modified, +21/-4)

Code Example

const timeoutCandidateHandlerKeys = this.#activeSpooledUpdateHandlerKeysForSpool(spoolDir);
for (const handlerKey of drain.blockedByLane) {
  timeoutCandidateHandlerKeys.add(handlerKey);
}
const timedOutRecovery = await this.#recoverTimedOutSpooledHandler(timeoutCandidateHandlerKeys);


Summary

This looks like a remaining edge case in the timeout recovery trigger, not a duplicate of the original #83272 failure mode.

Related

#83272
#83505
Fix commit present locally: b7735f88fa2772b3103ed55eb1294ca4685f122a

Environment

OpenClaw: 2026.5.18
Local commit: 50a2481652
Install type: Docker
Channel: Telegram supergroup forum topics
Runtime: Codex app-server / embedded agent
Gateway state at inspection time: running and healthy after the turn eventually cleared

Observed behavior

Important detail: there was not necessarily a later same-lane update behind that processing marker. The update could therefore remain a lone active handler rather than appearing in drain.blockedByLane.

Source-level concern

Current recovery appears to use drain.blockedByLane as the timeout candidate set. However, a single active stuck handler with no later same-lane update behind it may never appear in drain.blockedByLane, so #recoverTimedOutSpooledHandler(...) may not evaluate it for timeout recovery even after the handler timeout has elapsed.
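The gap can be illustrated with a toy model of a drain pass. This is a sketch under assumptions, not the real drain logic: it assumes blockedByLane is populated only when a pending update finds its lane occupied by an active handler, and the type, function, lane key, and handler key names are hypothetical.

```typescript
type SpooledUpdate = { updateId: number; laneKey: string };

// Hypothetical model of one drain pass: a pending update whose lane already has
// an active handler is skipped, and that handler's key is recorded as blocking.
function drainOnce(
  pending: SpooledUpdate[],
  activeHandlerKeyByLane: Map<string, string>,
): { blockedByLane: Set<string> } {
  const blockedByLane = new Set<string>();
  for (const update of pending) {
    const activeKey = activeHandlerKeyByLane.get(update.laneKey);
    if (activeKey !== undefined) {
      blockedByLane.add(activeKey);
    }
  }
  return { blockedByLane };
}

// One stuck handler holds the topic lane.
const activeByLane = new Map<string, string>([["topic:7", "handler-42"]]);

// Lone case: nothing is queued behind the stuck handler, so nothing is recorded.
const loneCase = drainOnce([], activeByLane);

// Backlog case: a later same-lane update makes the stuck handler visible.
const backlogCase = drainOnce([{ updateId: 43, laneKey: "topic:7" }], activeByLane);
```

In this model the lone case yields an empty blockedByLane, so a recovery step that only looks at that set has no candidate to evaluate.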

Suggested narrow fix

Build the timeout candidate set from all active spooled handlers for the same spool, then union in drain.blockedByLane for compatibility:

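// Start from every handler that is actively processing an update in this spool.
const timeoutCandidateHandlerKeys = this.#activeSpooledUpdateHandlerKeysForSpool(spoolDir);
// Union in the handlers that block later same-lane updates (the #83505 case).
for (const handlerKey of drain.blockedByLane) {
  timeoutCandidateHandlerKeys.add(handlerKey);
}
// Evaluate the combined set for timeout recovery.
const timedOutRecovery = await this.#recoverTimedOutSpooledHandler(timeoutCandidateHandlerKeys);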

This preserves same-lane ordering and #83505's tombstone/restart behavior, but also lets a lone active processing claim time out.
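For readers without the class internals at hand, the same idea can be sketched as two standalone functions. Everything here is hypothetical (the ActiveHandler shape, the function names, the spool path, the handler keys); it assumes that being a candidate only makes a handler eligible, and that recovery still requires the handler timeout to have elapsed.

```typescript
type ActiveHandler = { handlerKey: string; spoolDir: string; startedAtMs: number };

// Union of the active handlers for this spool and the lane-blocking handlers.
function buildTimeoutCandidates(
  active: ActiveHandler[],
  spoolDir: string,
  blockedByLane: Set<string>,
): Set<string> {
  const keys = new Set<string>();
  for (const handler of active) {
    if (handler.spoolDir === spoolDir) {
      keys.add(handler.handlerKey);
    }
  }
  for (const handlerKey of blockedByLane) {
    keys.add(handlerKey);
  }
  return keys;
}

// A candidate is selected for recovery only once its timeout has elapsed.
function selectTimedOut(
  active: ActiveHandler[],
  candidates: Set<string>,
  timeoutMs: number,
  nowMs: number,
): string[] {
  return active
    .filter((handler) => candidates.has(handler.handlerKey))
    .filter((handler) => nowMs - handler.startedAtMs >= timeoutMs)
    .map((handler) => handler.handlerKey);
}

const handlers: ActiveHandler[] = [
  { handlerKey: "handler-42", spoolDir: "/spool/a", startedAtMs: 0 },
  { handlerKey: "handler-50", spoolDir: "/spool/a", startedAtMs: 9_000 },
];

// No backlog exists, so blockedByLane is empty, yet both handlers are candidates.
const candidates = buildTimeoutCandidates(handlers, "/spool/a", new Set<string>());

// With a 5 s timeout at t = 10 s, only the long-running handler is selected.
const timedOut = selectTimedOut(handlers, candidates, 5_000, 10_000);
```

The younger handler stays a candidate but is left alone, which is the property that keeps healthy in-flight work from being failed early.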

Regression coverage idea

Add a polling-session test where:

A single spooled topic update is claimed and handleUpdate never settles.
No later same-lane update exists.
spooledUpdateHandlerTimeoutMs elapses.
The update is failed into a tombstone and isolated ingress restart is requested.
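The steps above can be sketched against a small in-memory stand-in with an injected clock. This is not the real TelegramPollingSession or its test harness; the FakeIngress class, its methods, and the 5 s timeout value are hypothetical and only mirror the expected outcome (failed tombstone, no pending updates, restart requested).

```typescript
type IngressSnapshot = {
  pendingUpdateIds: number[];
  failedUpdateIds: number[];
  restartRequested: boolean;
};

// Hypothetical in-memory stand-in for the spool and recovery loop.
class FakeIngress {
  private readonly timeoutMs: number;
  private readonly claimedAtMs = new Map<number, number>();
  private readonly failed: number[] = [];
  private restartRequested = false;

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Claim an update whose handler never settles.
  claim(updateId: number, nowMs: number): void {
    this.claimedAtMs.set(updateId, nowMs);
  }

  // Recovery pass: every active claim is a candidate, with or without backlog.
  recover(nowMs: number): void {
    for (const [updateId, claimedAt] of Array.from(this.claimedAtMs.entries())) {
      if (nowMs - claimedAt >= this.timeoutMs) {
        this.claimedAtMs.delete(updateId);
        this.failed.push(updateId); // stands in for writing a failed tombstone
        this.restartRequested = true; // stands in for the ingress restart request
      }
    }
  }

  snapshot(): IngressSnapshot {
    return {
      pendingUpdateIds: Array.from(this.claimedAtMs.keys()),
      failedUpdateIds: [...this.failed],
      restartRequested: this.restartRequested,
    };
  }
}

const ingress = new FakeIngress(5_000);
ingress.claim(42, 0); // single topic update, no later same-lane update

ingress.recover(4_999); // just before the timeout: the claim must stay active
const beforeTimeout = ingress.snapshot();

ingress.recover(5_000); // timeout elapsed: the lone claim is failed
const afterTimeout = ingress.snapshot();
```

In the real test file the clock would more likely be driven by the test runner's fake timers, and the assertions would inspect the actual spool directory rather than an in-memory snapshot.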

Why this matters
