openclaw - 💡(How to fix) Fix [Bug] High pre-processing latency (8-12s) on webchat after gateway restart, not in v2026.2.26 [1 participants]

openclaw2026-04-03 20:57:14

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#60528•Fetched 2026-04-08 02:49:59

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Levis-w

Participants

Levis-w

Timeline (top)

closed ×1locked ×1

Error Message

#!/usr/bin/env python3

-- coding: utf-8 --

import os, time, json, ssl, base64 import websocket import logging

URL = "ws://127.0.0.1:18789/ws" TOKEN = "123456" SESSION_KEY = "agent:main:main" DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO) log = logging.info

class DeviceIdentity: def init(self, storage_path): self.storage_path = storage_path self.load()

def load(self):
    if not os.path.exists(self.storage_path):
        log(f"❌ Error: device file not found {self.storage_path}")
        os._exit(1)
    with open(self.storage_path, 'r') as f: data = json.load(f)
    from cryptography.hazmat.primitives.asymmetric import ed25519
    pem_lines = data['privateKeyPem'].strip().split('\n')
    b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
    self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
    self.device_id = data['deviceId']
    from cryptography.hazmat.primitives import serialization
    pub_bytes = self.priv.public_key().public_bytes(
        encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
    )
    self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

def sign(self, p):
    sig = self.priv.sign(p.encode('utf-8'))
    return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester: def init(self): self.device = DeviceIdentity(DEVICE_STORAGE) self.t0_send = 0 self.t1_start = 0 self.t2_first = 0 self.has_sent = False

def start(self):
    log("🚀 Connecting to OpenClaw...")
    self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
        on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
    self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

def on_open(self, ws): log("🔗 Connected")
def on_error(self, ws, e): log(f"❌ Error: {e}")

def send_test_msg(self, ws):
    if self.has_sent: return
    self.has_sent = True
    msg_id = f"m-{int(time.time()*1000)}"
    idem_key = f"s-{int(time.time()*1000)}"
    req = {
        "type": "req",
        "id": msg_id,
        "method": "chat.send",
        "params": {
            "sessionKey": SESSION_KEY,
            "message": "（延迟测试）",
            "deliver": True,
            "idempotencyKey": idem_key
        }
    }
    self.t0_send = time.time()
    log(f"📤 [T0] Sending message (ID: {msg_id})...")
    ws.send(json.dumps(req))

def on_message(self, ws, message):
    try:
        d = json.loads(message)
        ev = d.get('event')
        pl = d.get('payload', {})

        if ev == 'connect.challenge':
            ts = pl['ts']
            nonce = pl['nonce']
            auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
            sig = self.device.sign(auth_str)
            ws.send(json.dumps({
                "type":"req","id":"auth","method":"connect",
                "params":{
                    "minProtocol":3,"maxProtocol":3,"role":"operator",
                    "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                    "auth":{"token":TOKEN},
                    "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                    "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                }
            }))

        elif d.get('id') == 'auth':
            if d.get('ok'):
                log("✅ Authentication successful, sending test message...")
                self.send_test_msg(ws)
            else:
                log(f"❌ Authentication failed: {d.get('error')}")

        elif ev == 'agent':
            stream = pl.get('stream')
            phase = pl.get('data', {}).get('phase')

            if stream == 'lifecycle' and phase == 'start':
                self.t1_start = time.time()
                diff = int((self.t1_start - self.t0_send) * 1000)
                log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

            elif stream == 'assistant':
                if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                    if self.t2_first == 0:
                        self.t2_first = time.time()
                        self.print_report()
                        ws.close()

    except Exception as e:
        log(f"Error processing message: {e}")

def print_report(self):
    t0 = self.t0_send
    t1 = self.t1_start
    t2 = self.t2_first
    if t1 == 0: t1 = t2
    p1 = int((t1 - t0) * 1000)
    p2 = int((t2 - t1) * 1000)
    total = int((t2 - t0) * 1000)
    print("\n" + "="*40)
    print("📊 Latency Report")
    print("="*40)
    print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
    print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
    print("-" * 40)
    print(f"🏁 Total TTFT:                       {total} ms")
    print("="*40 + "\n")

if name == "main": LatencyTester().start()

Fix Action

Fix / Workaround

2. Config patch attempt (failed)

Tried reducing contextTokens to 32k — no improvement.

Code Example

02:52:25.731+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:25.764+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=2
02:52:25.765+08:00 [model-selection] stage=auth-profile-store-loaded elapsedMs=3
(T1 fires at +572ms — fast first message)
...
02:52:39.113+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:39.146+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=1
(T1 fires at +11303ms — slow second message)

---

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os, time, json, ssl, base64
import websocket
import logging

URL = "ws://127.0.0.1:18789/ws"
TOKEN = "123456"
SESSION_KEY = "agent:main:main"
DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO)
log = logging.info

class DeviceIdentity:
    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.load()

    def load(self):
        if not os.path.exists(self.storage_path):
            log(f"❌ Error: device file not found {self.storage_path}")
            os._exit(1)
        with open(self.storage_path, 'r') as f: data = json.load(f)
        from cryptography.hazmat.primitives.asymmetric import ed25519
        pem_lines = data['privateKeyPem'].strip().split('\n')
        b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
        self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
        self.device_id = data['deviceId']
        from cryptography.hazmat.primitives import serialization
        pub_bytes = self.priv.public_key().public_bytes(
            encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
        )
        self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

    def sign(self, p):
        sig = self.priv.sign(p.encode('utf-8'))
        return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester:
    def __init__(self):
        self.device = DeviceIdentity(DEVICE_STORAGE)
        self.t0_send = 0
        self.t1_start = 0
        self.t2_first = 0
        self.has_sent = False

    def start(self):
        log("🚀 Connecting to OpenClaw...")
        self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
            on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
        self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    def on_open(self, ws): log("🔗 Connected")
    def on_error(self, ws, e): log(f"❌ Error: {e}")

    def send_test_msg(self, ws):
        if self.has_sent: return
        self.has_sent = True
        msg_id = f"m-{int(time.time()*1000)}"
        idem_key = f"s-{int(time.time()*1000)}"
        req = {
            "type": "req",
            "id": msg_id,
            "method": "chat.send",
            "params": {
                "sessionKey": SESSION_KEY,
                "message": "（延迟测试）",
                "deliver": True,
                "idempotencyKey": idem_key
            }
        }
        self.t0_send = time.time()
        log(f"📤 [T0] Sending message (ID: {msg_id})...")
        ws.send(json.dumps(req))

    def on_message(self, ws, message):
        try:
            d = json.loads(message)
            ev = d.get('event')
            pl = d.get('payload', {})

            if ev == 'connect.challenge':
                ts = pl['ts']
                nonce = pl['nonce']
                auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
                sig = self.device.sign(auth_str)
                ws.send(json.dumps({
                    "type":"req","id":"auth","method":"connect",
                    "params":{
                        "minProtocol":3,"maxProtocol":3,"role":"operator",
                        "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                        "auth":{"token":TOKEN},
                        "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                        "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                    }
                }))

            elif d.get('id') == 'auth':
                if d.get('ok'):
                    log("✅ Authentication successful, sending test message...")
                    self.send_test_msg(ws)
                else:
                    log(f"❌ Authentication failed: {d.get('error')}")

            elif ev == 'agent':
                stream = pl.get('stream')
                phase = pl.get('data', {}).get('phase')

                if stream == 'lifecycle' and phase == 'start':
                    self.t1_start = time.time()
                    diff = int((self.t1_start - self.t0_send) * 1000)
                    log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

                elif stream == 'assistant':
                    if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                        if self.t2_first == 0:
                            self.t2_first = time.time()
                            self.print_report()
                            ws.close()

        except Exception as e:
            log(f"Error processing message: {e}")

    def print_report(self):
        t0 = self.t0_send
        t1 = self.t1_start
        t2 = self.t2_first
        if t1 == 0: t1 = t2
        p1 = int((t1 - t0) * 1000)
        p2 = int((t2 - t1) * 1000)
        total = int((t2 - t0) * 1000)
        print("\n" + "="*40)
        print("📊 Latency Report")
        print("="*40)
        print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
        print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
        print("-" * 40)
        print(f"🏁 Total TTFT:                       {total} ms")
        print("="*40 + "\n")

if __name__ == "__main__":
    LatencyTester().start()

RAW_BUFFERClick to expand / collapse

Bug Description

After upgrading past v2026.2.26, webchat messages suffer from 5-12 seconds of pre-processing latency before the LLM starts generating tokens. The first message after gateway restart is fast (~500ms-1.5s), but subsequent messages consistently show 8-12 seconds of delay.

Steps to Reproduce

Start/restart the OpenClaw gateway
Send a message via webchat — note the low latency
Send another message without restarting — observe 8-12s delay
The delay accumulates with each message exchange

Latency Test Results

Run 1 — First message after restart:

Phase 1 (preprocessing/RAG/history): ~1.5s
Phase 2 (LLM generation): ~20s
Total TTFT: ~21s

Run 2 — Second message (no restart):

Phase 1: ~7.9s
Phase 2: ~14s
Total TTFT: ~22s

Run 3 — Third message:

Phase 1: ~11.4s
Phase 2: ~12s
Total TTFT: ~23s

The delay increases with each message. Session info at the time:

contextTokens: 200000 (max)
inputTokens: ~43000 per message
Session store file size: 158KB for agent:main:main

Environment

OpenClaw: v2026.4.2 (also reproduced on v2026.2.26+)
Platform: WSL2 (Windows), Node.js 22.22.0
Model: MiniMax-M2.7
Session key: agent:main:main
WebSocket connection to ws://127.0.0.1:18789/ws

Troubleshooting Attempted

1. Session store IO analysis

Timing logs (OPENCLAW_DEBUG_INGRESS_TIMING=1) showed session store load is fast (~2-8ms). The bottleneck is NOT in session file reading.

2. Config patch attempt (failed)

Tried reducing contextTokens to 32k — no improvement.

3. Code investigation

loadSessionStore() with skipCache: true reads and does structuredClone() on the entire session store on every message.
The delay appears to be in the preprocessing pipeline (pre-LLM), not in the model inference itself.

4. Issue is NOT in v2026.2.26

User confirmed that v2026.2.26 does not have this problem. The regression was introduced in a subsequent version update.

5. Not related to humanDelay

The humanDelay mechanism was confirmed to only affect outgoing message delivery, not the pre-processing phase.

Relevant Log Excerpt (debug timing enabled)

02:52:25.731+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:25.764+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=2
02:52:25.765+08:00 [model-selection] stage=auth-profile-store-loaded elapsedMs=3
(T1 fires at +572ms — fast first message)
...
02:52:39.113+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:39.146+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=1
(T1 fires at +11303ms — slow second message)

Test Script

latency_test.py — a Python WebSocket latency test tool using device identity authentication:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os, time, json, ssl, base64
import websocket
import logging

URL = "ws://127.0.0.1:18789/ws"
TOKEN = "123456"
SESSION_KEY = "agent:main:main"
DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO)
log = logging.info

class DeviceIdentity:
    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.load()

    def load(self):
        if not os.path.exists(self.storage_path):
            log(f"❌ Error: device file not found {self.storage_path}")
            os._exit(1)
        with open(self.storage_path, 'r') as f: data = json.load(f)
        from cryptography.hazmat.primitives.asymmetric import ed25519
        pem_lines = data['privateKeyPem'].strip().split('\n')
        b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
        self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
        self.device_id = data['deviceId']
        from cryptography.hazmat.primitives import serialization
        pub_bytes = self.priv.public_key().public_bytes(
            encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
        )
        self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

    def sign(self, p):
        sig = self.priv.sign(p.encode('utf-8'))
        return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester:
    def __init__(self):
        self.device = DeviceIdentity(DEVICE_STORAGE)
        self.t0_send = 0
        self.t1_start = 0
        self.t2_first = 0
        self.has_sent = False

    def start(self):
        log("🚀 Connecting to OpenClaw...")
        self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
            on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
        self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    def on_open(self, ws): log("🔗 Connected")
    def on_error(self, ws, e): log(f"❌ Error: {e}")

    def send_test_msg(self, ws):
        if self.has_sent: return
        self.has_sent = True
        msg_id = f"m-{int(time.time()*1000)}"
        idem_key = f"s-{int(time.time()*1000)}"
        req = {
            "type": "req",
            "id": msg_id,
            "method": "chat.send",
            "params": {
                "sessionKey": SESSION_KEY,
                "message": "（延迟测试）",
                "deliver": True,
                "idempotencyKey": idem_key
            }
        }
        self.t0_send = time.time()
        log(f"📤 [T0] Sending message (ID: {msg_id})...")
        ws.send(json.dumps(req))

    def on_message(self, ws, message):
        try:
            d = json.loads(message)
            ev = d.get('event')
            pl = d.get('payload', {})

            if ev == 'connect.challenge':
                ts = pl['ts']
                nonce = pl['nonce']
                auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
                sig = self.device.sign(auth_str)
                ws.send(json.dumps({
                    "type":"req","id":"auth","method":"connect",
                    "params":{
                        "minProtocol":3,"maxProtocol":3,"role":"operator",
                        "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                        "auth":{"token":TOKEN},
                        "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                        "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                    }
                }))

            elif d.get('id') == 'auth':
                if d.get('ok'):
                    log("✅ Authentication successful, sending test message...")
                    self.send_test_msg(ws)
                else:
                    log(f"❌ Authentication failed: {d.get('error')}")

            elif ev == 'agent':
                stream = pl.get('stream')
                phase = pl.get('data', {}).get('phase')

                if stream == 'lifecycle' and phase == 'start':
                    self.t1_start = time.time()
                    diff = int((self.t1_start - self.t0_send) * 1000)
                    log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

                elif stream == 'assistant':
                    if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                        if self.t2_first == 0:
                            self.t2_first = time.time()
                            self.print_report()
                            ws.close()

        except Exception as e:
            log(f"Error processing message: {e}")

    def print_report(self):
        t0 = self.t0_send
        t1 = self.t1_start
        t2 = self.t2_first
        if t1 == 0: t1 = t2
        p1 = int((t1 - t0) * 1000)
        p2 = int((t2 - t1) * 1000)
        total = int((t2 - t0) * 1000)
        print("\n" + "="*40)
        print("📊 Latency Report")
        print("="*40)
        print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
        print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
        print("-" * 40)
        print(f"🏁 Total TTFT:                       {total} ms")
        print("="*40 + "\n")

if __name__ == "__main__":
    LatencyTester().start()

Questions for Maintainers

What changed in the preprocessing pipeline between v2026.2.26 and later versions that could cause this cumulative delay?
Is there a known memory/GC issue or session state growth problem that could explain the increasing delay pattern?
Are there any session compaction or context pruning settings that could help reduce input token load per message?

extent analysis

TL;DR

The most likely fix for the cumulative delay issue is to optimize the session store loading and preprocessing pipeline, potentially by implementing session compaction or context pruning to reduce input token load per message.

Guidance

Investigate the changes made to the preprocessing pipeline between v2026.2.26 and later versions to identify the root cause of the cumulative delay.
Analyze the session store loading process to determine if there are any memory or GC issues that could be contributing to the delay.
Explore the possibility of implementing session compaction or context pruning settings to reduce the input token load per message and alleviate the delay.
Review the loadSessionStore() function with skipCache: true to see if there are any optimizations that can be made to reduce the delay.

Example

No specific code example is provided, but the loadSessionStore() function and the preprocessing pipeline should be reviewed for potential optimizations.

Notes

The issue seems to be related to the changes made in the preprocessing pipeline and the session store loading process. Further investigation is needed to determine the root cause and implement a fix.

Recommendation

Apply a workaround by optimizing the session store loading and preprocessing pipeline, potentially by implementing session compaction or context pruning settings to reduce input token load per message. This is recommended because the issue is likely related to the changes made in the preprocessing pipeline and the session store loading process, and optimizing these areas may help alleviate the delay.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#database connection #vector store #embedding generation #cache error #file not found

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - 💡(How to fix) Fix [Bug] High pre-processing latency (8-12s) on webchat after gateway restart, not in v2026.2.26 [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

-- coding: utf-8 --

Fix Action

Fix / Workaround

2. Config patch attempt (failed)

Code Example

Bug Description

Steps to Reproduce

Latency Test Results

Environment

Troubleshooting Attempted

1. Session store IO analysis

2. Config patch attempt (failed)

3. Code investigation

4. Issue is NOT in v2026.2.26

5. Not related to humanDelay

Relevant Log Excerpt (debug timing enabled)

Test Script

Questions for Maintainers

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING