openclaw - 💡(How to fix) Fix [Bug]: High pre-processing latency (8-12s) on webchat after gateway restart, not in v2026.2.26 [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#60505Fetched 2026-04-08 02:50:14
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
closed ×1locked ×1renamed ×1

Error Message

#!/usr/bin/env python3

-- coding: utf-8 --

import os, time, json, ssl, base64, hashlib, threading import websocket import logging

========== Configuration ==========

URL = "ws://127.0.0.1:18789/ws" TOKEN = "123456" SESSION_KEY = "agent:main:main" DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")

==================================

Logging format

logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO) log = logging.info

class DeviceIdentity: def init(self, storage_path): self.storage_path = storage_path self.load()

def load(self):
    if not os.path.exists(self.storage_path):
        log(f"❌ Error: device file not found {self.storage_path}")
        os._exit(1)
    with open(self.storage_path, 'r') as f: data = json.load(f)
    from cryptography.hazmat.primitives.asymmetric import ed25519
    pem_lines = data['privateKeyPem'].strip().split('\n')
    b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
    self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
    self.device_id = data['deviceId']
    from cryptography.hazmat.primitives import serialization
    pub_bytes = self.priv.public_key().public_bytes(
        encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
    )
    self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

def sign(self, p):
    sig = self.priv.sign(p.encode('utf-8'))
    return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester: def init(self): self.device = DeviceIdentity(DEVICE_STORAGE) self.t0_send = 0 self.t1_start = 0 self.t2_first = 0 self.has_sent = False

def start(self):
    log("🚀 Connecting to OpenClaw...")
    self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
        on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
    self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

def on_open(self, ws): log("🔗 Connected")
def on_error(self, ws, e): log(f"❌ Error: {e}")

def send_test_msg(self, ws):
    if self.has_sent: return
    self.has_sent = True

    msg_id = f"m-{int(time.time()*1000)}"
    idem_key = f"s-{int(time.time()*1000)}"

    req = {
        "type": "req",
        "id": msg_id,
        "method": "chat.send",
        "params": {
            "sessionKey": SESSION_KEY,
            "message": "(延迟测试)",
            "deliver": True,
            "idempotencyKey": idem_key
        }
    }

    self.t0_send = time.time()
    log(f"📤 [T0] Sending message (ID: {msg_id})...")
    ws.send(json.dumps(req))

def on_message(self, ws, message):
    try:
        d = json.loads(message)
        ev = d.get('event')
        pl = d.get('payload', {})

        # 1. Authentication
        if ev == 'connect.challenge':
            ts = pl['ts']
            nonce = pl['nonce']
            auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
            sig = self.device.sign(auth_str)
            ws.send(json.dumps({
                "type":"req","id":"auth","method":"connect",
                "params":{
                    "minProtocol":3,"maxProtocol":3,"role":"operator",
                    "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                    "auth":{"token":TOKEN},
                    "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                    "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                }
            }))

        elif d.get('id') == 'auth':
            if d.get('ok'):
                log("✅ Authentication successful, sending test message...")
                self.send_test_msg(ws)
            else:
                log(f"❌ Authentication failed: {d.get('error')}")

        # 2. Capture timing
        elif ev == 'agent':
            stream = pl.get('stream')
            phase = pl.get('data', {}).get('phase')

            # [T1] Agent starts
            if stream == 'lifecycle' and phase == 'start':
                self.t1_start = time.time()
                diff = int((self.t1_start - self.t0_send) * 1000)
                log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

            # [T2] First token generated
            elif stream == 'assistant':
                if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                    if self.t2_first == 0:
                        self.t2_first = time.time()
                        self.print_report()
                        ws.close()

    except Exception as e:
        log(f"Error processing message: {e}")

def print_report(self):
    t0 = self.t0_send
    t1 = self.t1_start
    t2 = self.t2_first
    if t1 == 0: t1 = t2

    p1 = int((t1 - t0) * 1000)
    p2 = int((t2 - t1) * 1000)
    total = int((t2 - t0) * 1000)

    print("\n" + "="*40)
    print("📊 Latency Report")
    print("="*40)
    print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
    print(f"   (If high, check knowledge base and disk)")
    print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
    print(f"   (If high, check model speed and tools)")
    print("-" * 40)
    print(f"🏁 Total TTFT:                       {total} ms")
    print("="*40 + "\n")

if name == "main": LatencyTester().start()

Root Cause

3. Code investigation

  • loadSessionStore() with skipCache: true reads and does structuredClone() on the entire session store on every message. Changed to skipCache: false — marginal improvement but did not solve the root cause.
  • getReplyFromConfig() timing was instrumented — the ~8-12s delay occurs before Pi runtime is invoked.
  • The delay appears to be in the preprocessing pipeline (pre-LLM), not in the model inference itself.

Fix Action

Fix / Workaround

2. Config patch attempt (failed)

Tried reducing contextTokens to 32k — no improvement.

Code Example

02:52:25.731+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:25.764+08:00 [model-selection] session=agent:main:main stage=configured-allowlist-built elapsedMs=2
02:52:25.765+08:00 [model-selection] session=agent:main:main stage=auth-profile-store-loaded elapsedMs=3
(T1 fires at +572ms — fast first message)
...
02:52:39.113+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:39.146+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=1
(T1 fires at +11303ms — slow second message)

---

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os, time, json, ssl, base64, hashlib, threading
import websocket
import logging

# ========== Configuration ==========
URL = "ws://127.0.0.1:18789/ws"
TOKEN = "123456"
SESSION_KEY = "agent:main:main"
DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")
# ==================================

# Logging format
logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO)
log = logging.info

class DeviceIdentity:
    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.load()

    def load(self):
        if not os.path.exists(self.storage_path):
            log(f"❌ Error: device file not found {self.storage_path}")
            os._exit(1)
        with open(self.storage_path, 'r') as f: data = json.load(f)
        from cryptography.hazmat.primitives.asymmetric import ed25519
        pem_lines = data['privateKeyPem'].strip().split('\n')
        b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
        self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
        self.device_id = data['deviceId']
        from cryptography.hazmat.primitives import serialization
        pub_bytes = self.priv.public_key().public_bytes(
            encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
        )
        self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

    def sign(self, p):
        sig = self.priv.sign(p.encode('utf-8'))
        return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester:
    def __init__(self):
        self.device = DeviceIdentity(DEVICE_STORAGE)
        self.t0_send = 0
        self.t1_start = 0
        self.t2_first = 0
        self.has_sent = False

    def start(self):
        log("🚀 Connecting to OpenClaw...")
        self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
            on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
        self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    def on_open(self, ws): log("🔗 Connected")
    def on_error(self, ws, e): log(f"❌ Error: {e}")

    def send_test_msg(self, ws):
        if self.has_sent: return
        self.has_sent = True

        msg_id = f"m-{int(time.time()*1000)}"
        idem_key = f"s-{int(time.time()*1000)}"

        req = {
            "type": "req",
            "id": msg_id,
            "method": "chat.send",
            "params": {
                "sessionKey": SESSION_KEY,
                "message": "(延迟测试)",
                "deliver": True,
                "idempotencyKey": idem_key
            }
        }

        self.t0_send = time.time()
        log(f"📤 [T0] Sending message (ID: {msg_id})...")
        ws.send(json.dumps(req))

    def on_message(self, ws, message):
        try:
            d = json.loads(message)
            ev = d.get('event')
            pl = d.get('payload', {})

            # 1. Authentication
            if ev == 'connect.challenge':
                ts = pl['ts']
                nonce = pl['nonce']
                auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
                sig = self.device.sign(auth_str)
                ws.send(json.dumps({
                    "type":"req","id":"auth","method":"connect",
                    "params":{
                        "minProtocol":3,"maxProtocol":3,"role":"operator",
                        "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                        "auth":{"token":TOKEN},
                        "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                        "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                    }
                }))

            elif d.get('id') == 'auth':
                if d.get('ok'):
                    log("✅ Authentication successful, sending test message...")
                    self.send_test_msg(ws)
                else:
                    log(f"❌ Authentication failed: {d.get('error')}")

            # 2. Capture timing
            elif ev == 'agent':
                stream = pl.get('stream')
                phase = pl.get('data', {}).get('phase')

                # [T1] Agent starts
                if stream == 'lifecycle' and phase == 'start':
                    self.t1_start = time.time()
                    diff = int((self.t1_start - self.t0_send) * 1000)
                    log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

                # [T2] First token generated
                elif stream == 'assistant':
                    if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                        if self.t2_first == 0:
                            self.t2_first = time.time()
                            self.print_report()
                            ws.close()

        except Exception as e:
            log(f"Error processing message: {e}")

    def print_report(self):
        t0 = self.t0_send
        t1 = self.t1_start
        t2 = self.t2_first
        if t1 == 0: t1 = t2

        p1 = int((t1 - t0) * 1000)
        p2 = int((t2 - t1) * 1000)
        total = int((t2 - t0) * 1000)

        print("\n" + "="*40)
        print("📊 Latency Report")
        print("="*40)
        print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
        print(f"   (If high, check knowledge base and disk)")
        print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
        print(f"   (If high, check model speed and tools)")
        print("-" * 40)
        print(f"🏁 Total TTFT:                       {total} ms")
        print("="*40 + "\n")

if __name__ == "__main__":
    LatencyTester().start()
RAW_BUFFERClick to expand / collapse

Bug Description

After upgrading past v2026.2.26, webchat messages suffer from 5-12 seconds of pre-processing latency before the LLM starts generating tokens. The first message after gateway restart is fast (~500ms-1.5s), but subsequent messages consistently show 8-12 seconds of delay.

Steps to Reproduce

  1. Start/restart the OpenClaw gateway
  2. Send a message via webchat — note the low latency
  3. Send another message without restarting — observe 8-12s delay
  4. The delay accumulates with each message exchange

Latency Test Results

Run 1 — First message after restart:

  • Phase 1 (preprocessing/RAG/history): ~1.5s
  • Phase 2 (LLM generation): ~20s
  • Total TTFT: ~21s

Run 2 — Second message (no restart):

  • Phase 1: ~7.9s
  • Phase 2: ~14s
  • Total TTFT: ~22s

Run 3 — Third message:

  • Phase 1: ~11.4s
  • Phase 2: ~12s
  • Total TTFT: ~23s

The delay increases with each message. Session info at the time:

  • contextTokens: 200000 (max)
  • inputTokens: ~43000 per message
  • Session store file size: 158KB for agent:main:main

Environment

  • OpenClaw: v2026.4.2 (also reproduced on v2026.2.26+)
  • Platform: WSL2 (Windows), Node.js 22.22.0
  • Model: MiniMax-M2.7
  • Session key: agent:main:main
  • WebSocket connection to ws://127.0.0.1:18789/ws

Troubleshooting Attempted

1. Session store IO analysis

Timing logs (OPENCLAW_DEBUG_INGRESS_TIMING=1) showed session store load is fast (~2-8ms). The bottleneck is NOT in session file reading.

2. Config patch attempt (failed)

Tried reducing contextTokens to 32k — no improvement.

3. Code investigation

  • loadSessionStore() with skipCache: true reads and does structuredClone() on the entire session store on every message. Changed to skipCache: false — marginal improvement but did not solve the root cause.
  • getReplyFromConfig() timing was instrumented — the ~8-12s delay occurs before Pi runtime is invoked.
  • The delay appears to be in the preprocessing pipeline (pre-LLM), not in the model inference itself.

4. Issue is NOT in v2026.2.26

User confirmed that v2026.2.26 does not have this problem. The regression was introduced in a subsequent version update.

5. Not related to humanDelay

The humanDelay mechanism (typing simulation delay) was confirmed to only affect outgoing message delivery, not the pre-processing phase.

Relevant Log Excerpt (debug timing enabled)

02:52:25.731+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:25.764+08:00 [model-selection] session=agent:main:main stage=configured-allowlist-built elapsedMs=2
02:52:25.765+08:00 [model-selection] session=agent:main:main stage=auth-profile-store-loaded elapsedMs=3
(T1 fires at +572ms — fast first message)
...
02:52:39.113+08:00 session-init store-load agent=main session=agent:main:main elapsedMs=2
02:52:39.146+08:00 [model-selection] stage=configured-allowlist-built elapsedMs=1
(T1 fires at +11303ms — slow second message)

Test Script

latency_test.py — a Python WebSocket latency test tool using device identity authentication:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os, time, json, ssl, base64, hashlib, threading
import websocket
import logging

# ========== Configuration ==========
URL = "ws://127.0.0.1:18789/ws"
TOKEN = "123456"
SESSION_KEY = "agent:main:main"
DEVICE_STORAGE = os.path.expanduser("~/.openclaw/workspace/.xiaomeng_correct_device.json")
# ==================================

# Logging format
logging.basicConfig(format='%(asctime)s.%(msecs)03d %(message)s', datefmt='%H:%M:%S', level=logging.INFO)
log = logging.info

class DeviceIdentity:
    def __init__(self, storage_path):
        self.storage_path = storage_path
        self.load()

    def load(self):
        if not os.path.exists(self.storage_path):
            log(f"❌ Error: device file not found {self.storage_path}")
            os._exit(1)
        with open(self.storage_path, 'r') as f: data = json.load(f)
        from cryptography.hazmat.primitives.asymmetric import ed25519
        pem_lines = data['privateKeyPem'].strip().split('\n')
        b64 = ''.join([l for l in pem_lines if not l.startswith('---')])
        self.priv = ed25519.Ed25519PrivateKey.from_private_bytes(base64.b64decode(b64)[16:])
        self.device_id = data['deviceId']
        from cryptography.hazmat.primitives import serialization
        pub_bytes = self.priv.public_key().public_bytes(
            encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
        )
        self.pub_b64 = base64.b64encode(pub_bytes).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

    def sign(self, p):
        sig = self.priv.sign(p.encode('utf-8'))
        return base64.b64encode(sig).decode('ascii').rstrip('=').replace('+','-').replace('/','_')

class LatencyTester:
    def __init__(self):
        self.device = DeviceIdentity(DEVICE_STORAGE)
        self.t0_send = 0
        self.t1_start = 0
        self.t2_first = 0
        self.has_sent = False

    def start(self):
        log("🚀 Connecting to OpenClaw...")
        self.ws = websocket.WebSocketApp(URL, header={'Authorization': f'Bearer {TOKEN}'},
            on_open=self.on_open, on_message=self.on_message, on_error=self.on_error)
        self.ws.run_forever(sslopt={"cert_reqs": ssl.CERT_NONE})

    def on_open(self, ws): log("🔗 Connected")
    def on_error(self, ws, e): log(f"❌ Error: {e}")

    def send_test_msg(self, ws):
        if self.has_sent: return
        self.has_sent = True

        msg_id = f"m-{int(time.time()*1000)}"
        idem_key = f"s-{int(time.time()*1000)}"

        req = {
            "type": "req",
            "id": msg_id,
            "method": "chat.send",
            "params": {
                "sessionKey": SESSION_KEY,
                "message": "(延迟测试)",
                "deliver": True,
                "idempotencyKey": idem_key
            }
        }

        self.t0_send = time.time()
        log(f"📤 [T0] Sending message (ID: {msg_id})...")
        ws.send(json.dumps(req))

    def on_message(self, ws, message):
        try:
            d = json.loads(message)
            ev = d.get('event')
            pl = d.get('payload', {})

            # 1. Authentication
            if ev == 'connect.challenge':
                ts = pl['ts']
                nonce = pl['nonce']
                auth_str = f"v2|{self.device.device_id}|webchat|webchat|operator|operator.admin,operator.write,operator.approvals,operator.pairing,operator.read|{ts}|{TOKEN}|{nonce}"
                sig = self.device.sign(auth_str)
                ws.send(json.dumps({
                    "type":"req","id":"auth","method":"connect",
                    "params":{
                        "minProtocol":3,"maxProtocol":3,"role":"operator",
                        "scopes":["operator.admin","operator.write","operator.approvals","operator.pairing","operator.read"],
                        "auth":{"token":TOKEN},
                        "device":{"id":self.device.device_id,"publicKey":self.device.pub_b64,"signature":sig,"signedAt":ts,"nonce":nonce},
                        "client":{"id":"webchat","version":"1.0","platform":"linux","mode":"webchat"}
                    }
                }))

            elif d.get('id') == 'auth':
                if d.get('ok'):
                    log("✅ Authentication successful, sending test message...")
                    self.send_test_msg(ws)
                else:
                    log(f"❌ Authentication failed: {d.get('error')}")

            # 2. Capture timing
            elif ev == 'agent':
                stream = pl.get('stream')
                phase = pl.get('data', {}).get('phase')

                # [T1] Agent starts
                if stream == 'lifecycle' and phase == 'start':
                    self.t1_start = time.time()
                    diff = int((self.t1_start - self.t0_send) * 1000)
                    log(f"⏱️ [T1] Agent started (Phase: Start): +{diff} ms")

                # [T2] First token generated
                elif stream == 'assistant':
                    if 'text' in pl.get('data', {}) or 'delta' in pl.get('data', {}):
                        if self.t2_first == 0:
                            self.t2_first = time.time()
                            self.print_report()
                            ws.close()

        except Exception as e:
            log(f"Error processing message: {e}")

    def print_report(self):
        t0 = self.t0_send
        t1 = self.t1_start
        t2 = self.t2_first
        if t1 == 0: t1 = t2

        p1 = int((t1 - t0) * 1000)
        p2 = int((t2 - t1) * 1000)
        total = int((t2 - t0) * 1000)

        print("\n" + "="*40)
        print("📊 Latency Report")
        print("="*40)
        print(f"1️⃣ Phase 1 (Scheduling/RAG/History): {p1} ms")
        print(f"   (If high, check knowledge base and disk)")
        print(f"2️⃣ Phase 2 (LLM Generation):         {p2} ms")
        print(f"   (If high, check model speed and tools)")
        print("-" * 40)
        print(f"🏁 Total TTFT:                       {total} ms")
        print("="*40 + "\n")

if __name__ == "__main__":
    LatencyTester().start()

Questions for Maintainers

  1. What changed in the preprocessing pipeline between v2026.2.26 and later versions that could cause this cumulative delay?
  2. Is there a known memory/GC issue or session state growth problem that could explain the increasing delay pattern?
  3. Are there any session compaction or context pruning settings that could help reduce input token load per message?

extent analysis

TL;DR

The most likely fix for the cumulative delay issue in the preprocessing pipeline is to identify and address the root cause of the session state growth problem, potentially by implementing session compaction or context pruning settings.

Guidance

  • Investigate the changes made to the preprocessing pipeline between v2026.2.26 and later versions to determine the cause of the cumulative delay.
  • Check for any memory or garbage collection issues that could be contributing to the increasing delay pattern.
  • Explore the possibility of implementing session compaction or context pruning settings to reduce the input token load per message.
  • Review the loadSessionStore() function with skipCache: true to ensure it is not causing unnecessary delays.

Example

No specific code example is provided, as the issue requires further investigation into the preprocessing pipeline and session state management.

Notes

The provided information suggests that the issue is related to the preprocessing pipeline and session state growth, but further investigation is needed to determine the root cause. The latency_test.py script can be used to reproduce the issue and gather more information.

Recommendation

Apply a workaround by implementing session compaction or context pruning settings to reduce the input token load per message, while investigating the root cause of the issue. This may help alleviate the cumulative delay problem until a permanent fix can be implemented.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: High pre-processing latency (8-12s) on webchat after gateway restart, not in v2026.2.26 [1 participants]