openclaw: [Bug] Restart-sentinel continuation can get stuck after transient Telegram sendMessage failure [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#76087 · Fetched 2026-05-03 04:42:34
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 2
Timeline (top): commented ×1, mentioned ×1, subscribed ×1, unsubscribed ×1

Summary

Restart-sentinel continuation delivery can become stuck in ~/.openclaw/session-delivery-queue/ after a transient Telegram sendMessage network failure. The continuation entry is persisted and retry metadata is updated, but there appears to be no periodic/session wake recovery after the immediate startup drain fails, so the user never receives the post-restart continuation/report until another gateway startup happens.

This is related to Telegram transport flakiness / delivery durability, but distinct from ordinary outbound message queue recovery: restart continuation uses the session-delivery queue in server-restart-sentinel, and a failed immediate drain remains pending without an automatic follow-up retry.

Environment

  • OpenClaw: 2026.4.29 (a448042)
  • Host: macOS 26.4.1 arm64, Node 25.9.0
  • Channel: Telegram long-poll bot mode
  • Surface: restart-sentinel continuation for agent:system-architect:telegram:direct:<user>

Observed evidence

Gateway log:

2026-05-02T21:16:03.924+08:00 [restart-sentinel] restart continuation: retry failed for entry 5bfb077fa406f9dad97b2a82488f95bef182e9ae4f9672620de895622f578052: Network request for 'sendMessage' failed!

The entry remained pending afterwards:

{
  "kind": "agentTurn",
  "sessionKey": "agent:system-architect:telegram:direct:<user>",
  "message": "Post-restart continuation ...",
  "route": {
    "channel": "telegram",
    "to": "telegram:<user>",
    "accountId": "system-architect",
    "chatType": "direct"
  },
  "id": "5bfb077fa406f9dad97b2a82488f95bef182e9ae4f9672620de895622f578052",
  "enqueuedAt": 1777727345013,
  "retryCount": 1,
  "lastAttemptAt": 1777727763925,
  "lastError": "Network request for 'sendMessage' failed!"
}

The entry was still present at:

~/.openclaw/session-delivery-queue/5bfb077fa406f9dad97b2a82488f95bef182e9ae4f9672620de895622f578052.json
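
For reference, the queue entry quoted above can be described with a small TypeScript type. This is only a sketch inferred from the single JSON sample in this report: the interface name and the msSinceEnqueue helper are hypothetical, and the real OpenClaw type may carry more fields.

```typescript
// Shape inferred from the one queue entry quoted in this report (assumption).
interface QueuedSessionDelivery {
  kind: string;
  sessionKey: string;
  message: string;
  route: { channel: string; to: string; accountId: string; chatType: string };
  id: string;
  enqueuedAt: number; // epoch milliseconds
  retryCount: number;
  lastAttemptAt?: number; // epoch milliseconds, set after a failed attempt
  lastError?: string;
}

// Hypothetical helper: time between enqueue and the last recorded attempt.
function msSinceEnqueue(entry: QueuedSessionDelivery): number | undefined {
  return entry.lastAttemptAt === undefined
    ? undefined
    : entry.lastAttemptAt - entry.enqueuedAt;
}

// The entry from the report, with its placeholders kept as-is.
const observedEntry: QueuedSessionDelivery = {
  kind: "agentTurn",
  sessionKey: "agent:system-architect:telegram:direct:<user>",
  message: "Post-restart continuation ...",
  route: {
    channel: "telegram",
    to: "telegram:<user>",
    accountId: "system-architect",
    chatType: "direct",
  },
  id: "5bfb077fa406f9dad97b2a82488f95bef182e9ae4f9672620de895622f578052",
  enqueuedAt: 1777727345013,
  retryCount: 1,
  lastAttemptAt: 1777727763925,
  lastError: "Network request for 'sendMessage' failed!",
};

console.log(msSinceEnqueue(observedEntry));
console.log(new Date(observedEntry.lastAttemptAt ?? 0).toISOString());
```

Decoding lastAttemptAt this way lets the stored entry be lined up against the 21:16:03 (+08:00) gateway log line.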

On the same day the Telegram channel had intermittent transport failures, but openclaw status --deep later reported Telegram OK, so this is a transient-send/recovery problem rather than a permanent token/config failure.

Examples from the same day:

2026-05-02T13:44:10+08:00 [telegram] ... Network request for 'getUpdates' failed!
2026-05-02T16:02:34+08:00 [telegram] Polling stall detected ... Network request for 'getUpdates' failed!
2026-05-02T18:12:28+08:00 [telegram] Polling stall detected ... Network request for 'getUpdates' failed!
2026-05-02T21:16:03+08:00 [restart-sentinel] restart continuation: retry failed ... Network request for 'sendMessage' failed!

Source-code trace

Installed dist paths on 2026.4.29:

  • dist/server-restart-sentinel-_OFhLVvA.js
  • dist/hook-client-ip-config-BCNYpeHn.js

Relevant flow:

  1. scheduleRestartSentinelWake() loads the restart sentinel at gateway startup.
  2. loadRestartSentinelStartupTask() enqueues a restart continuation via enqueueSessionDelivery(buildQueuedRestartContinuation(...)).
  3. It then calls drainRestartContinuationQueue({ entryId }).
  4. drainRestartContinuationQueue() calls drainPendingSessionDeliveries(... selectEntry: entry.id === entryId, bypassBackoff: true).
  5. If deliverQueuedSessionDelivery() throws (Telegram sendMessage network failure), drainQueuedEntry() calls failSessionDelivery(), increments retryCount, stores lastError, and leaves the JSON in session-delivery-queue/.
  6. recoverPendingRestartContinuationDeliveries() exists, but startup wiring in activateGatewayScheduledServices() only calls recoverPendingSessionDeliveries() once after startup (setTimeout(..., 1250)), not periodically.

So if the immediate startup continuation drain fails after that startup recovery window, the entry remains pending until the next gateway startup (or manual intervention).
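
The gap described above can be modelled in a few lines. The sketch below is a simplified in-memory model, not OpenClaw code: QueueModel, flakySend, and the entry id are made up for illustration, and step 6 (the one-time startup recovery) is left out.

```typescript
type ModelEntry = { id: string; retryCount: number; lastError?: string };

// Simplified in-memory stand-in for the session-delivery queue (assumption).
class QueueModel {
  readonly pending = new Map<string, ModelEntry>();
  private readonly send: (entry: ModelEntry) => void;

  constructor(send: (entry: ModelEntry) => void) {
    this.send = send;
  }

  enqueue(id: string): void {
    this.pending.set(id, { id, retryCount: 0 });
  }

  // Models steps 3-5: deliver one entry; on failure keep it and record the error.
  drain(id: string): boolean {
    const entry = this.pending.get(id);
    if (entry === undefined) return false;
    try {
      this.send(entry);
      this.pending.delete(id);
      return true;
    } catch (err) {
      entry.retryCount += 1;
      entry.lastError = err instanceof Error ? err.message : String(err);
      return false;
    }
  }
}

// Transport that fails on the first call only, i.e. a transient outage.
let sendAttempts = 0;
const flakySend = (_entry: ModelEntry): void => {
  sendAttempts += 1;
  if (sendAttempts === 1) {
    throw new Error("Network request for 'sendMessage' failed!");
  }
};

const model = new QueueModel(flakySend);
model.enqueue("entry-1");
const deliveredAtStartup = model.drain("entry-1");

// Nothing schedules another drain, so the entry stays pending even though
// the next send attempt would succeed.
const stillPending = model.pending.has("entry-1");
console.log({ deliveredAtStartup, stillPending });
```

In this model a second call to drain("entry-1") would deliver the entry, which is the step the report says is never triggered during normal gateway operation.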

Expected behavior

Restart continuation delivery should be durable and eventually delivered after transient Telegram/network failures, without requiring another gateway restart.

At minimum:

  • Pending restart continuation entries should be retried after their backoff expires.
  • Retries should continue until the existing max retry limit is reached (retryCount >= 5).
  • Permanent delivery errors (e.g. 400 chat not found / thread not found) should be moved to failed with clear diagnostics.
  • Transient transport errors should not leave the restart continuation invisible to the user.
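
The expectations above amount to a per-entry decision that can be sketched as follows. Only the retry limit of 5 and the two permanent-error examples come from this report. The exponential backoff values, the regex-based classifier, and all function names are assumptions for illustration, not the behavior of isSessionDeliveryEligibleForRetry().

```typescript
const MAX_RETRIES = 5; // limit quoted in the report

// Assumed exponential backoff; the real schedule is not given in the report.
function backoffMs(retryCount: number, baseMs = 5_000, capMs = 300_000): number {
  return Math.min(capMs, baseMs * 2 ** Math.max(0, retryCount - 1));
}

type RetryDecision = "retry-now" | "wait" | "move-to-failed";

// Hypothetical classifier built from the two examples in the report.
function isPermanentError(lastError: string): boolean {
  return /chat not found|thread not found/i.test(lastError);
}

function decideRetry(
  entry: { retryCount: number; lastAttemptAt?: number; lastError?: string },
  now: number,
): RetryDecision {
  if (entry.lastError !== undefined && isPermanentError(entry.lastError)) {
    return "move-to-failed"; // retrying cannot help
  }
  if (entry.retryCount >= MAX_RETRIES) {
    return "move-to-failed"; // retry budget exhausted
  }
  if (entry.lastAttemptAt === undefined) {
    return "retry-now"; // never attempted
  }
  const elapsed = now - entry.lastAttemptAt;
  return elapsed >= backoffMs(entry.retryCount) ? "retry-now" : "wait";
}
```

With these assumed values, the entry from this report (retryCount 1, transient network error) would become eligible again five seconds after its last attempt.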

Actual behavior

A transient Telegram sendMessage network failure leaves the restart continuation pending in session-delivery-queue/ with retryCount: 1, but no automatic retry occurs during normal gateway operation. The user sees no post-restart continuation/report.

Impact

This weakens the safety of the new gateway restart runner. The runner correctly writes a sentinel and enqueues continuation, but a single transient Telegram send failure can make the continuation appear lost. That pushes operators back toward manual checking and makes protected restart workflows less reliable.

Suggested fix

Add periodic/backoff-based recovery for session-delivery queue entries, similar in spirit to outbound delivery recovery.

Possible implementation options:

  1. Start a lightweight scheduled recovery loop for recoverPendingRestartContinuationDeliveries() after gateway startup, e.g. every 30-60s, respecting isSessionDeliveryEligibleForRetry() and max retries.
  2. Or, after drainRestartContinuationQueue() fails an entry, schedule a one-shot retry timer for that entry's next backoff deadline.
  3. Add status visibility for pending/failed session-delivery queue entries in openclaw status --deep or a diagnostic command.
  4. Consider aligning restart continuation queue behavior with the normal outbound delivery-queue recovery behavior, while preserving idempotency via the existing idempotencyKey.
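
Option 1 could look roughly like the sketch below. It is a minimal illustration under stated assumptions: startRecoveryLoop and its return shape are hypothetical, and the recover callback merely stands in for whatever the gateway would call to run one recovery pass.

```typescript
type RecoverFn = () => Promise<void>;

interface RecoveryLoop {
  tick: () => Promise<void>; // run one pass immediately (also used by the timer)
  stop: () => void; // cancel the interval, e.g. on gateway shutdown
}

// Hypothetical wiring for a periodic recovery pass.
function startRecoveryLoop(recover: RecoverFn, intervalMs = 30_000): RecoveryLoop {
  let running = false;

  const tick = async (): Promise<void> => {
    if (running) return; // skip if the previous pass is still in flight
    running = true;
    try {
      await recover();
    } catch (err) {
      // A failed pass must not kill the loop; the next tick tries again.
      console.warn("session-delivery recovery pass failed:", err);
    } finally {
      running = false;
    }
  };

  const timer = setInterval(() => {
    void tick();
  }, intervalMs);

  return { tick, stop: () => clearInterval(timer) };
}
```

The in-flight guard keeps a slow Telegram request from stacking overlapping passes. Backoff and max-retry checks are assumed to live inside the recover callback, so the loop itself stays a thin scheduler.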

Related

Possibly related but not identical:

  • #71429 — Telegram gateway drops in-flight messages on network failure/hot reload
  • #75539 — Telegram/QQBot plugins IPv6/undici transport connectivity problems

This issue is specifically about restart-sentinel session-delivery retry scheduling after a transient sendMessage failure.

extent analysis

TL;DR

Implement a periodic recovery mechanism for pending restart continuation deliveries to ensure durability after transient Telegram/network failures.

Guidance

  • Introduce a scheduled recovery loop for recoverPendingRestartContinuationDeliveries() to run at regular intervals (e.g., every 30-60 seconds) to retry pending entries.
  • Consider adding a one-shot retry timer for each failed entry's next backoff deadline after drainRestartContinuationQueue() fails.
  • Ensure the recovery mechanism respects isSessionDeliveryEligibleForRetry() and max retries to prevent infinite loops.
  • Add status visibility for pending/failed session-delivery queue entries in openclaw status --deep or a diagnostic command for better monitoring.
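
For the status-visibility point, a summary along the following lines might be enough to surface stuck entries. The function and field names are hypothetical, and the sketch assumes the caller has already loaded the queue entries from disk.

```typescript
interface QueueEntryStatus {
  id: string;
  retryCount: number;
  enqueuedAt: number; // epoch milliseconds
  lastError?: string;
}

interface QueueSummary {
  pending: number; // entries still within the retry budget
  exhausted: number; // entries at or past the retry limit
  oldestPendingId?: string;
  oldestPendingAgeMs?: number;
}

// Hypothetical summary for a status or diagnostic command.
function summarizeSessionDeliveryQueue(
  entries: QueueEntryStatus[],
  now: number,
  maxRetries = 5,
): QueueSummary {
  const pending = entries.filter((e) => e.retryCount < maxRetries);
  const summary: QueueSummary = {
    pending: pending.length,
    exhausted: entries.length - pending.length,
  };

  // Find the longest-waiting pending entry, if any.
  let oldest: QueueEntryStatus | undefined;
  for (const e of pending) {
    if (oldest === undefined || e.enqueuedAt < oldest.enqueuedAt) {
      oldest = e;
    }
  }
  if (oldest !== undefined) {
    summary.oldestPendingId = oldest.id;
    summary.oldestPendingAgeMs = now - oldest.enqueuedAt;
  }
  return summary;
}
```

Reporting the age of the oldest pending entry would make a case like the one in this issue visible without inspecting the queue directory by hand.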

Example

No code example is provided due to the complexity of the issue and the need for a thorough understanding of the existing codebase.

Notes

The suggested fix aims to align restart continuation queue behavior with normal outbound delivery recovery while preserving idempotency. However, the implementation details may vary depending on the specific requirements and constraints of the OpenClaw system.

Recommendation

Implement a periodic recovery mechanism, such as a scheduled loop or one-shot retry timers, so that pending restart continuation deliveries are retried after transient failures. This would improve the durability and reliability of the restart runner.
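
The one-shot timer variant can be sketched in a similar way. scheduleOneShotRetry is a hypothetical helper, and the sketch assumes the caller derives nextEligibleAt from the entry's lastAttemptAt plus whatever backoff the queue already applies.

```typescript
interface ScheduledRetry {
  delayMs: number; // how long the timer will wait before firing
  cancel: () => void; // clear the timer, e.g. if the entry is delivered earlier
}

// Hypothetical helper: fire `retry` once when the backoff deadline passes.
function scheduleOneShotRetry(
  nextEligibleAt: number, // epoch milliseconds
  retry: () => void,
  now: number = Date.now(),
): ScheduledRetry {
  // A deadline already in the past means "retry as soon as possible".
  const delayMs = Math.max(0, nextEligibleAt - now);
  const timer = setTimeout(retry, delayMs);
  return { delayMs, cancel: () => clearTimeout(timer) };
}
```

Compared with a fixed interval loop, this wakes the process only when an entry is actually due, at the cost of tracking one timer per failed entry.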

