openclaw - ✅(Solved) Fix HTTP /v1/chat/completions: 10-15s TTFB due to full agent context assembly — needs lightContext/voice mode [1 pull requests, 1 comments, 2 participants]

openclaw · 2026-04-19 10:42:27

GitHub stats

openclaw/openclaw#68920 • Fetched 2026-04-20 12:04:26

Author: mehdic

Participants: koutsenko, mehdic

Timeline (top): commented ×1, cross-referenced ×1, referenced ×1

Root Cause

Every HTTP completions request goes through the full agentCommandFromIngress() pipeline, which:

1. Assembles the complete system prompt (~15-20K tokens), including:
   - Full tool definitions and descriptions
   - Skills catalog
   - Memory recall instructions
   - Workspace bootstrap files (SOUL.md, IDENTITY.md, AGENTS.md, etc.)
   - Runtime metadata
2. Triggers memorySearch.sync.onSessionStart (QMD re-index) on every call, since each call gets a unique session key (openai:<uuid>)
3. Runs bootstrap hooks, heartbeat resolution, etc.

Fix Action

Workaround

Setting memorySearch.sync.onSessionStart: false helps with the QMD re-index on new sessions, but the core issue (15-20K tokens of framework context) remains.
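For illustration, the workaround setting can be pictured as a nested override. This is only a sketch: the issue gives the dotted key memorySearch.sync.onSessionStart but not the config file format or whether the key is global or per agent, so the nesting and the constant name below are assumptions.

```typescript
// Assumed shape: the dotted key memorySearch.sync.onSessionStart expanded
// into nested objects. Where this lives in the real config is not stated
// in the issue; `memorySearchOverride` is a hypothetical name.
const memorySearchOverride = {
  memorySearch: {
    sync: {
      // Skip the QMD re-index that otherwise runs when a new session starts.
      onSessionStart: false,
    },
  },
} as const;
```

This only removes the re-index cost; the framework prompt is still assembled on every request.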

PR fix notes

PR #69060: fix: add x-openclaw-session header for stable agent-scoped HTTP sessions

Repository: openclaw/openclaw
Author: JustInCache
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/69060

Description (problem / solution / changelog)

Summary

Partially addresses #68920 (the QMD re-index overhead component).

Every /v1/chat/completions call that omits the OpenAI user field receives a fresh random-UUID session key. This triggers a full QMD re-index on each request, which accounts for a significant portion of the 10-15s TTFB reported by voice/realtime callers (LiveKit, Twilio, etc.).

Voice stacks cannot always inject the user field per-request. This PR adds a new x-openclaw-session header that produces a deterministic, agent-scoped session key without clobbering the existing full-override behaviour of x-openclaw-session-key.

Priority order (unchanged except for the new entry)

| Priority | Source | Scope |
| --- | --- | --- |
| 1 | `x-openclaw-session-key` header | Explicit full override (unchanged) |
| 2 | `user` request field | Agent-scoped (unchanged) |
| 3 | `x-openclaw-session` header (new) | Agent-scoped stable hint |
| 4 | Random UUID | Per-call fallback (unchanged) |
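The priority order above can be sketched as a small resolver. This is not the code from src/gateway/http-utils.ts: the function name, the input type, and the key formats are hypothetical, and the fallback id generator is a stand-in for the gateway's random UUID.

```typescript
// Hypothetical sketch of the four-level priority order. Names and key
// formats are illustrative, not taken from the OpenClaw source.
interface SessionKeyInput {
  agentId: string;
  headers: Record<string, string | undefined>; // header names lower-cased
  user?: string; // OpenAI-style `user` request field
}

// Treat missing or whitespace-only values as absent.
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

// Stand-in for a random UUID; good enough to show the per-call fallback.
const defaultFallbackId = (): string => Math.random().toString(36).slice(2);

function resolveSessionKey(
  input: SessionKeyInput,
  fallbackId: () => string = defaultFallbackId,
): string {
  // 1. Explicit full override: used verbatim, not scoped to the agent.
  const explicit = nonEmpty(input.headers["x-openclaw-session-key"]);
  if (explicit) return explicit;

  // 2. `user` request field: agent-scoped.
  const user = nonEmpty(input.user);
  if (user) return `agent:${input.agentId}:user:${user}`;

  // 3. New `x-openclaw-session` header: agent-scoped stable hint.
  const hint = nonEmpty(input.headers["x-openclaw-session"]);
  if (hint) return `agent:${input.agentId}:session:${hint}`;

  // 4. Per-call fallback: a fresh key on every request.
  return `openai:${fallbackId()}`;
}
```

Because levels 2 and 3 include the agent id, two agents that receive the same header value still end up with different keys, which matches the agent-scoping item in the test plan.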

Usage

Voice/realtime callers can now pass a stable call identifier:

POST /v1/chat/completions
x-openclaw-session: livekit-room-abc123

The gateway resolves this to {prefix}-session:livekit-room-abc123 scoped to the agent, so warm-session cache hits apply and QMD sync is skipped on subsequent requests in the same call.
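A caller-side sketch of the usage above, assuming the PR's header is available on the gateway. The helper name and types are hypothetical, the gateway URL in the usage comment is a placeholder, and authentication headers are omitted because the issue does not describe them.

```typescript
// Hypothetical helper for a voice/realtime caller: builds the request for
// POST /v1/chat/completions with a stable per-call session hint.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OutgoingRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildVoiceChatRequest(
  callId: string, // e.g. a LiveKit room name, reused for the whole call
  model: string,
  messages: ChatMessage[],
): OutgoingRequest {
  return {
    method: "POST",
    headers: {
      "content-type": "application/json",
      // Same value on every turn of the call, so the session key is stable.
      "x-openclaw-session": callId,
    },
    // `stream: true` is the standard OpenAI-style streaming flag.
    body: JSON.stringify({ model, messages, stream: true }),
  };
}

// Usage (placeholder URL, not from the issue):
// await fetch("http://<gateway-host>/v1/chat/completions",
//   buildVoiceChatRequest("livekit-room-abc123", "openai/gpt-5.4-mini",
//     [{ role: "user", content: "say hi" }]));
```

Reusing one callId for every turn of a call is what makes later requests eligible for the warm-session path.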

What this does NOT fix

The core 15-20K-token framework system prompt assembly still runs on every request. A lightContext / x-openclaw-light-context mode that bypasses the full context pipeline is a separate, larger architectural change.

Test plan

- Without x-openclaw-session: behaviour unchanged, with a new UUID session per call
- With x-openclaw-session: my-id: same stable session key across calls (verify no QMD re-index on second call)
- user field still takes priority over x-openclaw-session
- x-openclaw-session-key still takes highest priority, overriding both
- Two different agents with the same x-openclaw-session value get different session keys (agent scoping confirmed)

Changed files

src/gateway/http-utils.ts (modified, +23/-2)

Problem

The HTTP /v1/chat/completions endpoint takes 10-15 seconds TTFB for a simple "say hi" prompt, making it unusable for real-time voice agents (LiveKit, Twilio, etc.).

Direct OpenAI API calls return in ~400ms for the same prompt.

Evidence

| Config | Workspace Size | Prompt Tokens | TTFB |
| --- | --- | --- | --- |
| Full agent (Cassius) | 42KB | 18,718 | 10.8s |
| Mini agent (MiniSevro) | 20KB | 15,387 | 6.7s |
| Bare minimum (2-line SOUL, `tools.profile: minimal`, `skipBootstrap: true`, all sync disabled) | ~200 bytes | 19,713 | 9.5s |
| Direct OpenAI (same model) | N/A | ~20 | 0.4s |

The bare minimum agent still has 19,713 prompt tokens — the OpenClaw framework system prompt dominates regardless of workspace size.
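To make the comparison concrete, the figures from the table can be turned into slowdown factors relative to the direct OpenAI call. The numbers are copied from the table; the type and function names are illustrative only.

```typescript
// Figures copied from the Evidence table; names are hypothetical.
interface Measurement {
  config: string;
  promptTokens: number;
  ttfbSeconds: number;
}

const measurements: Measurement[] = [
  { config: "Full agent (Cassius)", promptTokens: 18718, ttfbSeconds: 10.8 },
  { config: "Mini agent (MiniSevro)", promptTokens: 15387, ttfbSeconds: 6.7 },
  { config: "Bare minimum", promptTokens: 19713, ttfbSeconds: 9.5 },
];

// TTFB of the direct OpenAI call with the same model.
const directOpenAiTtfbSeconds = 0.4;

// How many times slower a gateway request is than the direct call.
function slowdownVsDirect(m: Measurement): number {
  return m.ttfbSeconds / directOpenAiTtfbSeconds;
}

// One summary line per configuration.
const report: string[] = measurements.map(
  (m) =>
    `${m.config}: ${slowdownVsDirect(m).toFixed(1)}x slower, ` +
    `${m.promptTokens} prompt tokens`,
);
```

All three gateway configurations come out roughly 17x to 27x slower than the direct call, and the smallest workspace has the largest prompt, which is consistent with the framework prompt dominating.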

Impact

This makes the HTTP API unusable for:

- Real-time voice agents (LiveKit, Twilio Media Streams)
- Low-latency chatbots
- Any use case requiring sub-2s TTFB

Proposed Solutions

1. `lightContext` mode for HTTP API (highest impact)

Add a header like x-openclaw-light-context: true that:

- Skips tool definitions injection
- Skips skills catalog
- Skips memory recall instructions
- Only includes SOUL.md/IDENTITY.md (personality) or a custom slim system prompt
- Skips QMD sync entirely
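One way to picture this proposal is a planning step that decides, from the request headers, which context sections to assemble. Nothing here exists in OpenClaw today: the header is only proposed, and the section names, types, and function are hypothetical. Dropping runtime metadata in light mode is an assumption drawn from "only includes SOUL.md/IDENTITY.md".

```typescript
// Hypothetical sketch of the proposed light-context switch.
type ContextSection =
  | "tools"
  | "skills"
  | "memoryRecall"
  | "bootstrapFiles"
  | "runtimeMetadata";

interface ContextPlan {
  sections: ContextSection[];
  bootstrapFiles: string[];
  runQmdSync: boolean;
}

function planContext(headers: Record<string, string | undefined>): ContextPlan {
  const light =
    headers["x-openclaw-light-context"]?.trim().toLowerCase() === "true";

  if (!light) {
    // Current behaviour: assemble everything and sync on session start.
    return {
      sections: ["tools", "skills", "memoryRecall", "bootstrapFiles", "runtimeMetadata"],
      bootstrapFiles: ["SOUL.md", "IDENTITY.md", "AGENTS.md"],
      runQmdSync: true,
    };
  }

  // Proposed light mode: personality files only, no QMD sync.
  return {
    sections: ["bootstrapFiles"],
    bootstrapFiles: ["SOUL.md", "IDENTITY.md"],
    runQmdSync: false,
  };
}
```

Keeping the decision in one plan object means the rest of the pipeline could skip work by checking the plan rather than re-reading headers.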

2. Stable session keys for HTTP API

Instead of generating openai:<random-uuid> on every call, allow an x-openclaw-session header to reuse a session key, so that the sessionWarm cache is hit and the QMD re-index is not repeated on every call.

3. Per-agent system prompt override for HTTP

Allow agents to define an http.systemPrompt that replaces the full framework prompt when calls arrive via the HTTP API.
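A minimal sketch of how such an override might be resolved, assuming the config key is shaped as http.systemPrompt. The types and function name are hypothetical. The framework prompt is passed as a function so that its assembly is skipped entirely when the override applies.

```typescript
// Hypothetical sketch of a per-agent HTTP system prompt override.
interface AgentConfig {
  http?: { systemPrompt?: string };
}

type Ingress = "http" | "other";

function resolveSystemPrompt(
  agent: AgentConfig,
  ingress: Ingress,
  buildFrameworkPrompt: () => string, // expensive: the 15-20K-token assembly
): string {
  const override = agent.http?.systemPrompt?.trim();
  // Only HTTP calls with a non-empty override bypass the framework prompt.
  if (ingress === "http" && override) return override;
  return buildFrameworkPrompt();
}
```

Calls from other channels, and agents without an override, keep the existing behaviour.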

Environment

- OpenClaw version: latest (npm global)
- OS: macOS arm64
- LLM: openai/gpt-5.4-mini via gateway
- Voice stack: LiveKit + faster-whisper STT + ElevenLabs TTS

extent analysis

TL;DR

A lightContext mode for the HTTP API, enabled by a header such as x-openclaw-light-context: true, targets the dominant cost behind the 10-15s TTFB: the 15-20K-token framework prompt assembled on every request. The stable-session header in PR #69060 addresses only the QMD re-index component.

Guidance

- Verify the impact of lightContext: the mode is only proposed, so once a prototype exists, measure TTFB with and without the x-openclaw-light-context: true header against the figures in the Evidence table.
- Evaluate stable session keys: the x-openclaw-session header (PR #69060) lets calls reuse a session key, which should avoid the QMD re-index on every request after the first.
- Assess per-agent system prompt overrides: a custom http.systemPrompt that replaces the full framework prompt for HTTP API calls would attack the same prompt-size problem from the agent config side.
- Treat memorySearch.sync.onSessionStart: false as a partial workaround: it helps with QMD re-indexing but leaves the 15-20K tokens of framework context in place.

Example

The issue description contains no code. A lightContext implementation would likely branch on the presence of the x-openclaw-light-context header and skip the corresponding assembly steps.

Notes

The proposed solutions aim to address the high TTFB issue by reducing the amount of data processed and the number of operations performed for each HTTP request. The effectiveness of these solutions may vary depending on the specific use case and the underlying infrastructure.

Recommendation

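Prioritize the proposed lightContext mode (an x-openclaw-light-context: true header), since it directly addresses the core issue of excessive framework context and has the most potential to reduce TTFB. It is not implemented yet: PR #69060 describes it as a separate, larger architectural change. Until it lands, stable session keys and memorySearch.sync.onSessionStart: false reduce only the QMD re-index overhead.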

#api #task chaining #parallel task #integration issue #index setup
