openclaw - 💡(How to fix) Fix feat(gateway): persistent outbound message queue across restarts [1 participants]

GitHub stats
openclaw/openclaw#54322 · Fetched 2026-04-08 01:29:02
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Problem

When the gateway restarts (planned or crash), any outbound messages that were queued but not yet delivered are lost. The agent may have composed a reply, but the user never receives it.

This is especially noticeable when an agent triggers a restart itself (e.g., to apply config changes) — the "I am restarting now" message may not reach the user before the process dies.

Proposal

Add a write-ahead log (WAL) for outbound messages so they survive gateway restarts.

How it works

  1. Before attempting delivery, write the outbound message to a persistent queue file (e.g., ~/.openclaw/outbound-queue.jsonl)
  2. On successful delivery → remove from queue (or mark as delivered)
  3. On gateway startup → drain any undelivered messages in order
  4. Apply a TTL (e.g., 10 minutes) — stale messages are dropped, not delivered late
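
The four steps above can be sketched as an append-only log. This is a minimal illustration, not the project's implementation: it takes the "mark as delivered" option from step 2 by appending a second record rather than rewriting the file. The record fields (`op`, `id`, `channel`, `recipient`, `text`, `queued_at`) and all function names are assumptions made for the example.

```python
import json
import os
import time
import uuid

TTL_SECONDS = 600  # 10 minutes, the example TTL from the proposal


def append_record(path, record):
    """Append one JSON line and fsync so it survives a crash."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def enqueue(path, channel, recipient, text, now=None):
    """Step 1: log the message before any delivery attempt."""
    msg_id = uuid.uuid4().hex
    append_record(path, {
        "op": "enqueue", "id": msg_id, "channel": channel,
        "recipient": recipient, "text": text,
        "queued_at": time.time() if now is None else now,
    })
    return msg_id


def mark_delivered(path, msg_id):
    """Step 2: record success as a new line instead of rewriting the file."""
    append_record(path, {"op": "delivered", "id": msg_id})


def pending(path, now=None):
    """Steps 3 and 4: undelivered, unexpired messages in queue order."""
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return []
    queued, delivered = [], set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            if rec["op"] == "delivered":
                delivered.add(rec["id"])
            else:
                queued.append(rec)
    return [m for m in queued
            if m["id"] not in delivered and now - m["queued_at"] < TTL_SECONDS]
```

On startup, the gateway would call `pending(...)` and attempt delivery for each returned entry in order. An append-only file keeps each write small and crash-safe, at the cost of needing occasional compaction so the log does not grow without bound.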

Edge cases to handle

  • Duplicate delivery: message was sent but not marked as delivered before crash → use idempotency keys per message
  • Stale messages: gateway was down for a long time → TTL expiry prevents delivering outdated messages
  • Ordering: messages must be delivered in the order they were queued
  • Multi-channel: queue entries should include the target channel/recipient so delivery routes correctly on restart
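
A rough sketch of how idempotency keys and per-entry routing could fit together on replay. It assumes the channel adapter (or the downstream service) can deduplicate on a key passed with each send, which the issue does not confirm. `FakeChannel` and `redeliver` are hypothetical names used only for illustration.

```python
class FakeChannel:
    """Stand-in for a channel adapter that dedupes on an idempotency key."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def send(self, recipient, text, idempotency_key):
        if idempotency_key in self.seen:
            return False  # duplicate: already sent before the crash
        self.seen.add(idempotency_key)
        self.delivered.append((recipient, text))
        return True


def redeliver(entries, channels):
    """Replay queue entries in queued order, routing by each entry's channel."""
    for entry in entries:
        channel = channels[entry["channel"]]
        channel.send(entry["recipient"], entry["text"],
                     idempotency_key=entry["id"])
```

Replaying the same entries twice through `redeliver` leaves each message delivered once, in the original order. If a real channel cannot dedupe on a key, the queue alone gives at-least-once delivery, and duplicates after a crash remain possible.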

Complexity

Medium-low — essentially a JSONL write-ahead log with delivery confirmation and TTL pruning.

Relationship to other issues

This complements #54321 (graceful reload). Reload prevents most unnecessary restarts; the persistent queue handles the remaining cases (code updates, crashes, actual restarts). Together they provide a robust solution.

extent analysis

Fix Plan

To implement a write-ahead log (WAL) for outbound messages, follow these steps:

  • Create a persistent queue file (e.g., ~/.openclaw/outbound-queue.jsonl) to store outbound messages before delivery.
  • Modify the message delivery code to:
    • Write the message to the queue file before attempting delivery.
    • Remove the message from the queue on successful delivery.
  • On gateway startup, drain any undelivered messages in the queue.
  • Implement a TTL (e.g., 10 minutes) to drop stale messages.

Example code snippet in Python:

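import json
import os
import time

# Queue file path
QUEUE_FILE = os.path.expanduser('~/.openclaw/outbound-queue.jsonl')
TTL_SECONDS = 600  # 10 minutes

def write_to_queue(message):
    # Persist the message (with 'id' and 'timestamp' fields) before delivery
    os.makedirs(os.path.dirname(QUEUE_FILE), exist_ok=True)
    with open(QUEUE_FILE, 'a') as f:
        f.write(json.dumps(message) + '\n')
        f.flush()
        os.fsync(f.fileno())  # make sure the entry reaches disk

def send(message):
    # Hypothetical placeholder: replace with the real channel delivery call,
    # which should return True on success
    raise NotImplementedError

def deliver_message(message):
    # Attempt delivery; the entry stays queued if delivery fails
    if send(message):
        remove_from_queue(message['id'])
        return True
    return False

def remove_from_queue(message_id):
    # Rewrite the queue without the given message, then swap it in atomically
    with open(QUEUE_FILE, 'r') as f:
        lines = [line for line in f if line.strip()]
    tmp_file = QUEUE_FILE + '.tmp'
    with open(tmp_file, 'w') as f:
        for line in lines:
            if json.loads(line)['id'] != message_id:
                f.write(line)
    os.replace(tmp_file, QUEUE_FILE)

def drain_queue():
    if not os.path.exists(QUEUE_FILE):
        return
    # Read everything first: delivery rewrites the file while we iterate
    with open(QUEUE_FILE, 'r') as f:
        messages = [json.loads(line) for line in f if line.strip()]
    for message in messages:
        if time.time() - message['timestamp'] >= TTL_SECONDS:
            print("Dropping stale message")
            remove_from_queue(message['id'])
        elif not deliver_message(message):
            break  # stop here to preserve ordering; retry later

# On gateway startup
if __name__ == '__main__':
    drain_queue()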

Verification

To verify the fix, test the following scenarios:

  • Restart the gateway while there are queued messages.
  • Verify that the messages are delivered after the gateway restarts.
  • Test duplicate delivery by simulating a crash after a message is sent but before it is removed from the queue, then restarting the gateway and verifying that the message is not delivered twice.
  • Test stale messages by setting a low TTL and verifying that messages are dropped after the TTL expires.

Extra Tips

  • Use a robust queue implementation, such as a message broker like RabbitMQ or Apache Kafka, for production environments.
  • Consider implementing idempotency keys to handle duplicate delivery.
  • Monitor the queue size and adjust the TTL as needed to prevent message buildup.
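
One possible way to keep the queue from building up is a periodic pruning pass. The sketch below assumes each JSONL entry carries a numeric `timestamp` field in seconds, as in the example above. `prune_expired` is a hypothetical helper, not an existing openclaw function.

```python
import json
import os
import time


def prune_expired(path, ttl_seconds=600, now=None):
    """Rewrite the queue without expired entries; return how many remain."""
    now = time.time() if now is None else now
    if not os.path.exists(path):
        return 0
    kept = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip() and now - json.loads(line)["timestamp"] < ttl_seconds:
                kept.append(line if line.endswith("\n") else line + "\n")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.writelines(kept)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic swap on the same filesystem
    return len(kept)
```

The return value doubles as a simple queue-size metric that could be logged or exported. Writing to a temporary file and swapping it in with `os.replace` avoids leaving a half-written queue behind if the process dies mid-prune.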
