openclaw - ✅(Solved) Fix Telegram isolated ingress timeout recovery misses lone active spooled handler without backlog [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#84158 · fetched 2026-05-20 03:43:23
Comments: 1 · Participants: 2 · Reactions: 1
Timeline: 2 events (commented ×1, cross-referenced ×1)

PR fix notes

PR #84194: fix(telegram): recover lone timed-out spool handlers

Description (problem / solution / changelog)

Summary

  • Include active Telegram isolated-ingress handlers in timeout recovery candidates, even when no later same-lane update is queued.
  • Keep timed-out active handlers marked as stalled until they settle or are recovered.
  • Add regression coverage for a lone .json.processing topic update timing out without backlog.
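As a rough illustration of the second bullet, the sketch below models the "stalled until settled or recovered" rule as a small registry. It is a hypothetical model and not the code in polling-session.ts: `ActiveHandlerRegistry`, `markTimedOut`, `settle`, and `recover` are illustrative names. Time is passed in explicitly so the behavior is easy to test.

```typescript
// Hypothetical model of active spooled handlers; names are illustrative only.
type HandlerState = { startedAtMs: number; stalled: boolean };

class ActiveHandlerRegistry {
  private readonly handlers = new Map<string, HandlerState>();

  start(key: string, nowMs: number): void {
    this.handlers.set(key, { startedAtMs: nowMs, stalled: false });
  }

  // Flag handlers whose elapsed time reaches the timeout as stalled.
  // Flagging does not remove them: only settle() or recover() does.
  markTimedOut(nowMs: number, timeoutMs: number): string[] {
    const newlyStalled: string[] = [];
    for (const [key, state] of this.handlers) {
      if (!state.stalled && nowMs - state.startedAtMs >= timeoutMs) {
        state.stalled = true;
        newlyStalled.push(key);
      }
    }
    return newlyStalled;
  }

  // The handler finished on its own, whether or not it was flagged.
  settle(key: string): void {
    this.handlers.delete(key);
  }

  // Recovery (for example, tombstoning the update) takes over a stalled handler.
  recover(key: string): boolean {
    const state = this.handlers.get(key);
    if (!state?.stalled) return false;
    this.handlers.delete(key);
    return true;
  }

  isStalled(key: string): boolean {
    return this.handlers.get(key)?.stalled ?? false;
  }

  activeKeys(): Set<string> {
    return new Set(this.handlers.keys());
  }
}
```

Keeping a timed-out handler registered (rather than forgetting it) is what lets a later recovery pass still find it as a candidate.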

Fixes #84158.

Real behavior proof

Behavior or issue addressed: A lone active Telegram isolated-ingress .json.processing claim could be missed by timeout recovery when no later same-lane update existed to populate blockedByLane.

Real environment tested: Local macOS checkout at commit 43c2a0c7b1, using the real TelegramPollingSession, Telegram ingress spool, reply fence, and isolated-ingress drain/recovery code with a local mock Telegram bot runtime and no network calls.

Exact steps or command run after this patch: PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node --import tsx /private/tmp/proof-84158.mjs

Evidence after fix: Terminal output from the live isolated-ingress spool proof:

openclaw-telegram-lone-spooled-timeout-proof=ok
events=bot0:42
bot_instances=2
pending_update_ids=none
failed_update_ids=42
timeout_log_present=true

Observed result after fix: The single claimed topic update 42 timed out without any later same-lane backlog, was written as a failed tombstone, left no pending updates behind, logged the timeout, and restarted isolated ingress.

What was not tested: I did not connect to the live Telegram Bot API. The proof avoids external network calls and exercises the local spool/recovery behavior directly.

Validation

  • NODE_OPTIONS=--max-old-space-size=8192 OPENCLAW_VITEST_MAX_WORKERS=1 PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node scripts/run-vitest.mjs extensions/telegram/src/polling-session.test.ts --pool forks --maxWorkers 1 --vmMemoryLimit 8192MB (43 passed)
  • PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node_modules/.bin/oxfmt --check --threads=1 extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts
  • PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH node_modules/.bin/oxlint extensions/telegram/src/polling-session.ts extensions/telegram/src/polling-session.test.ts
  • git diff --check
  • git log --format='%h %an <%ae> %s' upstream/main..HEAD: 43c2a0c7b1 Andy Ye <[email protected]> fix(telegram): recover lone timed-out spool handlers

Changed files

  • extensions/telegram/src/polling-session.test.ts (modified, +91/-0)
  • extensions/telegram/src/polling-session.ts (modified, +21/-4)


Summary

After the #83505 fix is present, I still observed a Telegram isolated-ingress .json.processing marker remain stuck for a single active topic update when no later same-lane update was queued behind it.

This looks like a remaining edge case in the timeout recovery trigger, not a duplicate of the original #83272 failure mode.

Related

  • #83272
  • #83505
  • Fix commit present locally: b7735f88fa2772b3103ed55eb1294ca4685f122a

Environment

  • OpenClaw: 2026.5.18
  • Local commit: 50a2481652
  • Install type: Docker
  • Channel: Telegram supergroup forum topics
  • Runtime: Codex app-server / embedded agent
  • Gateway state at inspection time: running and healthy after the turn eventually cleared

Observed behavior

During Telegram topic testing, a topic message caused prolonged main-thread CPU pressure and delayed health/Telegram behavior. After the system settled, the ingress spool still contained a .json.processing file for the topic update.

Important detail: there was not necessarily a later same-lane update behind that processing marker. The update could therefore remain a lone active handler rather than appearing in drain.blockedByLane.

Source-level concern

Current recovery appears to call timeout recovery using drain.blockedByLane as the candidate set. That catches the important case fixed by #83505, where a stuck handler blocks later same-lane updates.

But a single active stuck handler without a later same-lane update may not be included in blockedByLane, so #recoverTimedOutSpooledHandler(...) may not evaluate it for timeout recovery even after the handler timeout has elapsed.
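To make the gap concrete, here is a simplified, hypothetical model of one drain pass. `SpoolSnapshot`, `blockedByLane`, and `candidatesBeforeFix` are illustrative names, not the real polling-session.ts API. The only behavior carried over from the issue is that an active handler appears in `blockedByLane` only when a later same-lane update is queued behind it.

```typescript
// Hypothetical, simplified snapshot of one spool during a drain pass.
interface SpoolSnapshot {
  // lane id -> key of the handler currently processing that lane
  activeByLane: Map<string, string>;
  // lane id -> number of queued (unclaimed) updates waiting in that lane
  queuedByLane: Map<string, number>;
}

// Models drain.blockedByLane: an active handler is listed only when
// at least one later same-lane update is queued behind it.
function blockedByLane(snapshot: SpoolSnapshot): Set<string> {
  const blocked = new Set<string>();
  for (const [lane, handlerKey] of snapshot.activeByLane) {
    if ((snapshot.queuedByLane.get(lane) ?? 0) > 0) blocked.add(handlerKey);
  }
  return blocked;
}

// Pre-fix behavior as described in the issue: only blocked handlers
// are handed to timeout recovery.
function candidatesBeforeFix(snapshot: SpoolSnapshot): Set<string> {
  return blockedByLane(snapshot);
}
```

With a lone topic update and an empty queue for its lane, the pre-fix candidate set is empty, so timeout recovery never evaluates the stuck claim no matter how long it has been running.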

Suggested narrow fix

Build the timeout candidate set from all active spooled handlers for the same spool, then union in drain.blockedByLane for compatibility:

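```typescript
// Start from every active spooled handler for this spool, including a lone one with no backlog.
const timeoutCandidateHandlerKeys = this.#activeSpooledUpdateHandlerKeysForSpool(spoolDir);
// Union in drain.blockedByLane so the blocked-lane path from #83505 still applies.
for (const handlerKey of drain.blockedByLane) {
  timeoutCandidateHandlerKeys.add(handlerKey);
}
const timedOutRecovery = await this.#recoverTimedOutSpooledHandler(timeoutCandidateHandlerKeys);
```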

This preserves same-lane ordering and #83505's tombstone/restart behavior, but also lets a lone active processing claim time out.
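The suggested union can be modeled outside the class as two pure functions. This is a sketch under the assumption that each active handler records when it started; `ActiveHandler`, `timeoutCandidates`, and `timedOut` are hypothetical names, and the real `#recoverTimedOutSpooledHandler` may apply further checks before tombstoning.

```typescript
// Hypothetical record of an active spooled handler.
type ActiveHandler = { key: string; startedAtMs: number };

// Candidate set = all active handlers for the spool, plus blockedByLane keys.
function timeoutCandidates(
  active: ActiveHandler[],
  blockedByLane: Iterable<string>,
): Set<string> {
  const candidates = new Set(active.map((h) => h.key));
  for (const key of blockedByLane) candidates.add(key);
  return candidates;
}

// Among the candidates, return those whose elapsed time reached the timeout.
function timedOut(
  active: ActiveHandler[],
  candidates: Set<string>,
  nowMs: number,
  timeoutMs: number,
): string[] {
  return active
    .filter((h) => candidates.has(h.key) && nowMs - h.startedAtMs >= timeoutMs)
    .map((h) => h.key);
}
```

Because the set starts from every active handler, a lone claim becomes eligible even with an empty `blockedByLane`; blocked keys are still unioned in, so the backlog case behaves as before.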

Regression coverage idea

Add a polling-session test where:

  1. A single spooled topic update is claimed and handleUpdate never settles.
  2. No later same-lane update exists.
  3. spooledUpdateHandlerTimeoutMs elapses.
  4. The update is failed into a tombstone and isolated ingress restart is requested.
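The four steps above can be exercised without Telegram using an in-memory stand-in for the spool and a manual clock. Everything here is hypothetical (`FakeSpool`, the `pending`/`processing`/`failed` states, and the log text); it mirrors the shape of the scenario, not the actual test added to polling-session.test.ts. `HANDLER_TIMEOUT_MS` stands in for `spooledUpdateHandlerTimeoutMs`.

```typescript
// Hypothetical in-memory spool: "processing" plays the role of a .json.processing
// claim and "failed" the role of a failed tombstone.
type SpoolEntryState = "pending" | "processing" | "failed";

class FakeSpool {
  readonly entries = new Map<number, SpoolEntryState>();
  private readonly claimedAtMs = new Map<number, number>();
  restartRequested = false;
  readonly logs: string[] = [];

  enqueue(updateId: number): void {
    this.entries.set(updateId, "pending");
  }

  // Claim an update; in this scenario its handler never settles.
  claim(updateId: number, nowMs: number): void {
    this.entries.set(updateId, "processing");
    this.claimedAtMs.set(updateId, nowMs);
  }

  // Timeout recovery over every processing claim, with no backlog requirement.
  recoverTimedOut(nowMs: number, timeoutMs: number): void {
    for (const [updateId, state] of this.entries) {
      const claimedAt = this.claimedAtMs.get(updateId);
      if (state !== "processing" || claimedAt === undefined) continue;
      if (nowMs - claimedAt < timeoutMs) continue;
      this.entries.set(updateId, "failed");
      this.claimedAtMs.delete(updateId);
      this.logs.push(`spooled handler timed out: update ${updateId}`);
      this.restartRequested = true;
    }
  }

  idsIn(state: SpoolEntryState): number[] {
    return [...this.entries].filter(([, s]) => s === state).map(([id]) => id);
  }
}

const HANDLER_TIMEOUT_MS = 1000; // stand-in for spooledUpdateHandlerTimeoutMs
const spool = new FakeSpool();
spool.enqueue(42);
spool.claim(42, 0); // 1. claim update 42; the handler never settles
// 2. no later same-lane update is enqueued
spool.recoverTimedOut(500, HANDLER_TIMEOUT_MS); // before the timeout: still "processing"
spool.recoverTimedOut(1000, HANDLER_TIMEOUT_MS); // 3. the timeout has elapsed
// 4. update 42 is now a "failed" tombstone and a restart was requested
```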

I prepared a small local patch sketch against extensions/telegram/src/polling-session.ts and extensions/telegram/src/polling-session.test.ts; git diff --check passes. I have not deployed that patch to the running gateway.

Why this matters

Without this edge-case recovery, a lone stuck .json.processing marker can make the account appear mostly recovered while leaving stale spool state behind. On small VPS installs this also correlates with user-visible Telegram delays and event-loop/CPU pressure during the stuck turn.
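An operator could look for leftover markers with a check like the following. It is a hypothetical sketch: it assumes claimed updates appear as files whose names end in `.json.processing` and that file modification time approximates claim time. In Node the entries could be built with `fs.readdirSync(spoolDir)` and `fs.statSync(path).mtimeMs`; the spool directory location is not stated in this issue.

```typescript
// Hypothetical directory entry: file name plus modification time in ms.
type SpoolFile = { name: string; mtimeMs: number };

// Return processing markers older than maxAgeMs, sorted by name.
function staleProcessingMarkers(
  files: SpoolFile[],
  nowMs: number,
  maxAgeMs: number,
): string[] {
  return files
    .filter((f) => f.name.endsWith(".json.processing") && nowMs - f.mtimeMs > maxAgeMs)
    .map((f) => f.name)
    .sort();
}
```

Pick `maxAgeMs` comfortably above the handler timeout, since a legitimately slow turn also holds a processing marker while it runs.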
