openclaw - 💡(How to fix) Fix Feature: Lightweight LLM passthrough mode for /v1/chat/completions — skip session persistence entirely [1 participants]

Root Cause

OpenClaw has the best multi-provider model configuration of any open-source AI gateway. The ability to define providers with fallbacks, cost tracking, and unified auth is exactly what teams running multiple LLMs need. But the mandatory session overhead makes it unusable as a lightweight proxy — forcing users to either:

Bypass OpenClaw entirely (losing model routing/fallback/auth), or
Accept unbounded disk growth and periodic manual cleanup

A passthrough mode would make OpenClaw competitive with LiteLLM/OpenRouter for pure API relay use cases, while keeping the full agent pipeline available for interactive conversations.

Summary

When using OpenClaw as a unified LLM gateway (model routing + fallback across multiple providers), the /v1/chat/completions endpoint unconditionally creates session files on every request — even when the caller explicitly wants stateless, one-shot LLM calls. This makes OpenClaw unsuitable as a lightweight LLM proxy despite having excellent multi-provider model configuration.

Use Case

We run 10+ LLM models through OpenClaw (local Qwen3.5-27B, DashScope Qwen3.5-Plus/Qwen3-Max/GLM-5/Kimi-K2.5, local Gemma4, etc.) with tools.profile: "minimal" and skills: []. The only value we want from the gateway is:

Unified endpoint — one URL, model routing via model field
Fallback — if local LLM is down, auto-failover to cloud
Auth — single bearer token for all backends

We do not want: session persistence, transcript storage, skill snapshots, memory integration, or any stateful behavior.

Concrete scenario: real-time trading signal agent

Our signal agent processes 600+ events/day via WebSocket, calling the LLM for each event. Each call is independent (full context in the request body, no conversation history needed). Currently we bypass OpenClaw and call the LLM directly because:

Each /v1/chat/completions call creates a .jsonl session file (~3-22KB)
Each session entry in sessions.json includes skillsSnapshot (~41KB per entry, as documented in #55334)
600 calls/day × 22KB ≈ 13MB/day of useless session files (upper bound, using the largest observed file size)
After 3 days, one of our agents accumulated 200 sessions / 47MB, another has 844MB in deleted session files
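
The growth estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that uses only the numbers quoted in this issue (600 calls/day, 3-22KB per transcript, ~41KB of skillsSnapshot per sessions.json entry). The function name, the decimal KB-to-MB conversion, and the assumption that every call adds exactly one sessions.json entry are illustrative choices, not measured behavior.

```python
def daily_growth_mb(calls_per_day: int, kb_per_call: float) -> float:
    """Return disk growth in MB/day, using decimal units (1 MB = 1000 KB)."""
    return calls_per_day * kb_per_call / 1000


CALLS_PER_DAY = 600      # events/day from the trading-signal scenario
TRANSCRIPT_KB = (3, 22)  # observed .jsonl size range per call
SNAPSHOT_KB = 41         # skillsSnapshot per sessions.json entry (#55334)

low = daily_growth_mb(CALLS_PER_DAY, TRANSCRIPT_KB[0])
high = daily_growth_mb(CALLS_PER_DAY, TRANSCRIPT_KB[1])
# Assumption: one sessions.json entry (with its snapshot) is added per call.
with_snapshot = daily_growth_mb(CALLS_PER_DAY, TRANSCRIPT_KB[1] + SNAPSHOT_KB)

print(f"transcripts only: {low:.1f} to {high:.1f} MB/day")
print(f"worst case incl. sessions.json entries: {with_snapshot:.1f} MB/day")
```

The transcript-only upper bound is the 13MB/day figure quoted above; the snapshot line shows why the sessions.json overhead may matter more than the transcripts themselves.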

Evidence from production

# Active sessions per agent (April 7, 2026)
qwen35local:  200 sessions,  47MB    ← API relay agent
qwen35plus:     0 sessions, 844MB    ← deleted files not reclaimed  
main:          29 sessions,  21MB
gemma4:       104 sessions, 1.2MB
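
Numbers like these can be gathered with a small directory scan. The sketch below assumes a hypothetical layout with one subdirectory per agent that holds that agent's session files; the actual OpenClaw storage layout is not described in this issue, so adjust the root path and the `.jsonl` filter to match your installation.

```python
from pathlib import Path


def session_usage(agents_root: Path) -> dict[str, tuple[int, int]]:
    """Map agent name -> (count of .jsonl files, total bytes of all files).

    Assumes one subdirectory per agent (hypothetical layout). Files that are
    not .jsonl still count toward the byte total, which is how an agent can
    show 0 sessions but a large footprint.
    """
    usage: dict[str, tuple[int, int]] = {}
    for agent_dir in sorted(p for p in agents_root.iterdir() if p.is_dir()):
        files = [f for f in agent_dir.rglob("*") if f.is_file()]
        active = sum(1 for f in files if f.suffix == ".jsonl")
        total_bytes = sum(f.stat().st_size for f in files)
        usage[agent_dir.name] = (active, total_bytes)
    return usage


def print_usage(agents_root: Path) -> None:
    """Print one line per agent in the same shape as the table above."""
    for agent, (count, size) in session_usage(agents_root).items():
        print(f"{agent + ':':<14}{count:>4} sessions, {size / 1e6:7.1f}MB")
```

Calling `print_usage(Path(...))` on the directory that contains the per-agent folders prints one row per agent.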

Even a single test call (curl /v1/chat/completions with no user field, no x-openclaw-session-key) creates a full session:

{"type":"session","version":3,"id":"c213b0f2-...","timestamp":"2026-04-07T10:26:50.619Z","cwd":"/home/node/.openclaw/workspace-api"}
{"type":"model_change",...}
{"type":"thinking_level_change",...}
{"type":"custom","customType":"model-snapshot",...}
{"type":"message","message":{"role":"user","content":[{"type":"text","text":"说一个数字"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"42"}],...}}

That is 6 lines of JSONL for a "stateless" request whose prompt was "说一个数字" ("say a number") and whose reply was "42".

Documentation vs Reality

The docs state: "By default the endpoint is stateless per request (a new session key is generated each call)."

This is misleading. "Stateless" means the next request won't reuse the session — but the session is still created, written to disk, and persisted indefinitely. True stateless would mean no disk I/O at all.

Proposed Solution

Option A: `passthrough` mode (preferred)

A gateway-level or per-agent config that skips the entire agent pipeline for HTTP API calls:

{
  gateway: {
    http: {
      endpoints: {
        chatCompletions: {
          enabled: true,
          mode: "passthrough"  // skip session, tools, skills — pure LLM relay
        }
      }
    }
  }
}

In passthrough mode:

Request goes directly to the configured model provider (respecting model routing + fallback)
No session file created
No sessions.json entry
No skill snapshot
No system prompt injection
No tool availability
Just: authenticate → resolve model → forward to provider → return response
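
The four-step flow above (authenticate, resolve model, forward, return) can be sketched as follows. This is a minimal illustration, not OpenClaw code: `Route`, `passthrough`, and both exception classes are hypothetical names, and a provider is modelled as a plain callable so the fallback logic stays visible.

```python
import hmac
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of a provider: request body in, response body out.
Provider = Callable[[dict], dict]


class Unauthorized(Exception):
    """Raised when the bearer token does not match."""


class AllProvidersFailed(Exception):
    """Raised when the primary and every fallback provider failed."""


@dataclass
class Route:
    """Ordered providers for one model id: primary first, then fallbacks."""
    providers: list[Provider]


def passthrough(body: dict, bearer: str, *, token: str,
                routes: dict[str, Route]) -> dict:
    # 1. Authenticate: one shared bearer token, compared in constant time.
    if not hmac.compare_digest(bearer.encode(), token.encode()):
        raise Unauthorized("invalid bearer token")
    # 2. Resolve model: the request's "model" field selects the route.
    route = routes[body["model"]]
    # 3. Forward: try providers in order; nothing is written to disk.
    errors: list[Exception] = []
    for provider in route.providers:
        try:
            # 4. Return the provider's response unchanged.
            return provider(body)
        except Exception as exc:  # any provider failure triggers fallback
            errors.append(exc)
    raise AllProvidersFailed(errors)
```

No session object appears anywhere in this path, which is the point of the mode: there is nothing to persist, snapshot, or clean up afterwards.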

Option B: Per-agent `ephemeral` flag (as proposed in #48159)

{
  agents: {
    list: [{
      id: "relay",
      model: "local-llm/qwen35-27b",
      session: { ephemeral: true },  // in-memory only, no disk writes
      tools: { deny: ["*"] },
      skills: []
    }]
  }
}
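
One way an `ephemeral` flag could work internally is to swap the session store behind a common interface. The sketch below is an assumption about a possible design, not the approach proposed in #48159: `SessionStore`, `DiskStore`, `EphemeralStore`, and `make_store` are all hypothetical names.

```python
import json
from pathlib import Path
from typing import Protocol


class SessionStore(Protocol):
    def append(self, session_id: str, record: dict) -> None: ...
    def close(self, session_id: str) -> None: ...


class DiskStore:
    """Persistent behaviour: one JSONL transcript file per session."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def append(self, session_id: str, record: dict) -> None:
        path = self.root / f"{session_id}.jsonl"
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")

    def close(self, session_id: str) -> None:
        pass  # the transcript stays on disk


class EphemeralStore:
    """In-memory only: records vanish when the session is closed."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = {}

    def append(self, session_id: str, record: dict) -> None:
        self._records.setdefault(session_id, []).append(record)

    def close(self, session_id: str) -> None:
        self._records.pop(session_id, None)


def make_store(session_config: dict, root: Path) -> SessionStore:
    """Pick the store from the agent's `session` config block."""
    if session_config.get("ephemeral"):
        return EphemeralStore()
    return DiskStore(root)
```

Compared with Option A, this keeps the agent pipeline intact and only removes the disk writes, so it would still pay the in-memory cost of building a session per request.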

Option C: Minimum viable — skip session for anonymous HTTP requests

If the request has no user field AND no x-openclaw-session-key header, don't create any session file. The session exists only in memory for the duration of the request and is discarded immediately.
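
The rule in Option C reduces to a single predicate. The sketch below is illustrative only; `should_persist_session` is a hypothetical name, and empty values are treated the same as missing ones, which is an assumption. Header names are matched case-insensitively because HTTP header names are case-insensitive.

```python
def should_persist_session(body: dict, headers: dict[str, str]) -> bool:
    """Return True only if the caller identified a user or a session.

    Anonymous requests (no `user` field and no x-openclaw-session-key
    header) get an in-memory session that is discarded after the response.
    """
    has_user = bool(body.get("user"))
    lowered = {name.lower(): value for name, value in headers.items()}
    has_session_key = bool(lowered.get("x-openclaw-session-key"))
    return has_user or has_session_key
```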

Why This Matters

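OpenClaw has the best multi-provider model configuration of any open-source AI gateway. The ability to define providers with fallbacks, cost tracking, and unified auth is exactly what teams running multiple LLMs need. But the mandatory session overhead makes it unusable as a lightweight proxy, forcing users to either:

Bypass OpenClaw entirely (losing model routing/fallback/auth), or
Accept unbounded disk growth and periodic manual cleanup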

A passthrough mode would make OpenClaw competitive with LiteLLM/OpenRouter for pure API relay use cases, while keeping the full agent pipeline available for interactive conversations.

Related Issues

#48159 — Ephemeral session mode (no persistence) — proposed but not implemented
#55334 — sessions.json OOM from skillsSnapshot duplication per session
#55768 — Health check pings create full sessions (96/day = 2MB/day)
#60847 — Completed sessions accumulate indefinitely, no auto-cleanup
#20934 — No REST endpoint for session management (can't even delete sessions via API)

extent analysis

TL;DR

OpenClaw writes a session file for every /v1/chat/completions call, including stateless one-shot calls. The proposed fix is a "passthrough" mode that skips session creation for HTTP API calls.

Guidance

Implement passthrough mode: Add a configuration option to enable passthrough mode for the /v1/chat/completions endpoint, which skips the entire agent pipeline and directly forwards requests to the configured model provider.
Verify passthrough mode: Test the passthrough mode by making stateless LLM calls and verifying that no session files are created.
Consider ephemeral flag: Alternatively, consider implementing an ephemeral flag for per-agent configurations, which would allow sessions to exist only in memory and not be written to disk.
Review related issues: Investigate related issues, such as #48159, #55334, and #60847, to ensure that the proposed solution addresses all aspects of the problem.
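
The verification step above can be automated by comparing a directory listing before and after a request. This is a minimal sketch: `files_created_by` is a hypothetical helper, the directory to watch depends on where your installation stores sessions, and the request itself is passed in as a callable so any HTTP client can be used.

```python
from pathlib import Path
from typing import Callable


def _snapshot(watch_dir: Path) -> set[Path]:
    """Return every regular file currently under watch_dir."""
    return {p for p in watch_dir.rglob("*") if p.is_file()}


def files_created_by(action: Callable[[], object], watch_dir: Path) -> set[Path]:
    """Run action() and return the files that appeared under watch_dir.

    An empty result means the action left no new files behind. Files that
    were only appended to (for example sessions.json) are not detected, so
    a size or mtime check may be needed in addition.
    """
    before = _snapshot(watch_dir)
    action()
    return _snapshot(watch_dir) - before
```

In a test, `action` would send one request to /v1/chat/completions with no user field and no x-openclaw-session-key header, and the assertion would be that the returned set is empty.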

Notes

The proposed solution focuses on implementing a passthrough mode, which would address the primary issue of unnecessary session file creation. However, it is essential to review related issues to ensure that all aspects of the problem are addressed.

Recommendation

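Implement passthrough mode (Option A). It directly removes the unnecessary session file creation and turns the endpoint into a lightweight proxy for stateless LLM calls. Note that this is a feature request, not a workaround available today: until it ships, the only options described above are bypassing OpenClaw or cleaning up session files manually.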
