openclaw: Feature: Lightweight LLM passthrough mode for /v1/chat/completions (skip session persistence entirely)

GitHub stats: openclaw/openclaw#62436 · Fetched 2026-04-08 03:04:18 · Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Summary

When using OpenClaw as a unified LLM gateway (model routing + fallback across multiple providers), the /v1/chat/completions endpoint unconditionally creates session files on every request — even when the caller explicitly wants stateless, one-shot LLM calls. This makes OpenClaw unsuitable as a lightweight LLM proxy despite having excellent multi-provider model configuration.

Use Case

We run 10+ LLMs through OpenClaw (local Qwen3.5-27B, DashScope Qwen3.5-Plus/Qwen3-Max/GLM-5/Kimi-K2.5, local Gemma4, etc.) with tools.profile: "minimal" and skills: []. The only value we want from the gateway is:

  1. Unified endpoint — one URL, model routing via model field
  2. Fallback — if local LLM is down, auto-failover to cloud
  3. Auth — single bearer token for all backends

We do not want: session persistence, transcript storage, skill snapshots, memory integration, or any stateful behavior.

Concrete scenario: real-time trading signal agent

Our signal agent processes 600+ events/day via WebSocket, calling the LLM for each event. Each call is independent (full context in the request body, no conversation history needed). Currently we bypass OpenClaw and call the LLM directly because:

  • Each /v1/chat/completions call creates a .jsonl session file (~3-22KB)
  • Each session entry in sessions.json includes skillsSnapshot (~41KB per entry, as documented in #55334)
  • 600 calls/day × up to 22KB ≈ 13MB/day of useless session files
  • After 3 days, one of our agents accumulated 200 sessions / 47MB, another has 844MB in deleted session files
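
The growth figures above can be reproduced with a few lines of arithmetic. This is a rough sketch that uses only the numbers quoted in this issue (600 calls/day, 3 to 22 KB per .jsonl file, about 41 KB per sessions.json entry). It assumes 1 MB = 1000 KB and one sessions.json entry per call; the issue does not state the latter explicitly.

```python
# Rough disk-growth estimate from the figures quoted in this issue.
CALLS_PER_DAY = 600
JSONL_KB_MIN, JSONL_KB_MAX = 3, 22   # size range of one .jsonl session file
SNAPSHOT_KB = 41                     # skillsSnapshot size per sessions.json entry (#55334)


def daily_growth_mb(calls: int, kb_per_call: float) -> float:
    """Return daily growth in MB (decimal, 1 MB = 1000 KB)."""
    return calls * kb_per_call / 1000


jsonl_low = daily_growth_mb(CALLS_PER_DAY, JSONL_KB_MIN)
jsonl_high = daily_growth_mb(CALLS_PER_DAY, JSONL_KB_MAX)
# Assumption: every call adds one sessions.json entry.
snapshot = daily_growth_mb(CALLS_PER_DAY, SNAPSHOT_KB)

print(f".jsonl files: {jsonl_low:.1f} to {jsonl_high:.1f} MB/day")
print(f"sessions.json entries: {snapshot:.1f} MB/day")
```

The 13MB/day figure in the list corresponds to the upper end of the .jsonl range.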

Evidence from production

# Active sessions per agent (April 7, 2026)
qwen35local:  200 sessions,  47MB    ← API relay agent
qwen35plus:     0 sessions, 844MB    ← deleted files not reclaimed  
main:          29 sessions,  21MB
gemma4:       104 sessions, 1.2MB

Even a single test call (curl /v1/chat/completions with no user field, no x-openclaw-session-key) creates a full session:

{"type":"session","version":3,"id":"c213b0f2-...","timestamp":"2026-04-07T10:26:50.619Z","cwd":"/home/node/.openclaw/workspace-api"}
{"type":"model_change",...}
{"type":"thinking_level_change",...}
{"type":"custom","customType":"model-snapshot",...}
{"type":"message","message":{"role":"user","content":[{"type":"text","text":"说一个数字"}]}}
{"type":"message","message":{"role":"assistant","content":[{"type":"text","text":"42"}],...}}

Six JSONL records, four of them bookkeeping, for a "stateless" request returning "42".
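
To see how much of such a transcript is overhead, the records can be tallied by their type field. The sketch below is illustrative only: the sample is a stand-in with the same record types as the excerpt above and the elided fields left out, and count_record_types is a hypothetical helper, not part of OpenClaw.

```python
import json
from collections import Counter


def count_record_types(jsonl_text: str) -> Counter:
    """Count the `type` field of each non-empty JSONL line."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            counts[json.loads(line)["type"]] += 1
    return counts


# Stand-in transcript: same record types as the excerpt above, with the
# elided fields simply omitted (the real records carry more data).
sample = "\n".join([
    '{"type": "session", "version": 3}',
    '{"type": "model_change"}',
    '{"type": "thinking_level_change"}',
    '{"type": "custom", "customType": "model-snapshot"}',
    '{"type": "message", "message": {"role": "user"}}',
    '{"type": "message", "message": {"role": "assistant"}}',
])

counts = count_record_types(sample)
overhead = sum(counts.values()) - counts["message"]
print(f"{sum(counts.values())} records, {overhead} of them bookkeeping")
```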

Documentation vs Reality

The docs state: "By default the endpoint is stateless per request (a new session key is generated each call)."

This is misleading. "Stateless" means the next request won't reuse the session — but the session is still created, written to disk, and persisted indefinitely. True stateless would mean no disk I/O at all.

Proposed Solution

Option A: passthrough mode (preferred)

A gateway-level or per-agent config that skips the entire agent pipeline for HTTP API calls:

{
  gateway: {
    http: {
      endpoints: {
        chatCompletions: {
          enabled: true,
          mode: "passthrough"  // skip session, tools, skills — pure LLM relay
        }
      }
    }
  }
}

In passthrough mode:

  • Request goes directly to the configured model provider (respecting model routing + fallback)
  • No session file created
  • No sessions.json entry
  • No skill snapshot
  • No system prompt injection
  • No tool availability
  • Just: authenticate → resolve model → forward to provider → return response
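
The four-step flow above (authenticate, resolve model, forward, return) could look roughly like the following. This is a minimal sketch under stated assumptions, not OpenClaw code: PassthroughRelay, AuthError, and the provider callables are all hypothetical names, and the providers are in-process stubs rather than real HTTP backends.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical; none of them come from the OpenClaw codebase.
Provider = Callable[[dict], dict]


class AuthError(Exception):
    pass


@dataclass
class PassthroughRelay:
    token: str
    routes: dict[str, list[Provider]]  # model name -> providers in fallback order

    def handle(self, bearer: str, body: dict) -> dict:
        # 1. Authenticate.
        if bearer != self.token:
            raise AuthError("invalid bearer token")
        # 2. Resolve the model to its provider chain.
        providers = self.routes.get(body.get("model", ""))
        if not providers:
            raise KeyError(f"unknown model: {body.get('model')!r}")
        # 3. Forward, falling back to the next provider on failure.
        last_error = None
        for provider in providers:
            try:
                # 4. Return the provider response untouched; nothing is written to disk.
                return provider(body)
            except ConnectionError as exc:
                last_error = exc
        raise RuntimeError("all providers failed") from last_error


def local_down(body: dict) -> dict:
    raise ConnectionError("local LLM unreachable")


def cloud(body: dict) -> dict:
    return {"choices": [{"message": {"role": "assistant", "content": "42"}}]}


relay = PassthroughRelay(token="secret", routes={"qwen": [local_down, cloud]})
reply = relay.handle("secret", {"model": "qwen", "messages": []})
print(reply["choices"][0]["message"]["content"])
```

In this sketch the handler keeps no state between calls, so there is nothing to persist: the local provider fails, the cloud provider answers, and the response is handed back as-is.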

Option B: Per-agent ephemeral flag (as proposed in #48159)

{
  agents: {
    list: [{
      id: "relay",
      model: "local-llm/qwen35-27b",
      session: { ephemeral: true },  // in-memory only, no disk writes
      tools: { deny: ["*"] },
      skills: []
    }]
  }
}

Option C: Minimum viable — skip session for anonymous HTTP requests

If the request has no user field AND no x-openclaw-session-key header, don't create any session file. The session exists only in memory for the duration of the request and is discarded immediately.
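
The Option C rule reduces to a single predicate on the request. A minimal sketch, assuming the request body and headers are available as plain mappings; should_persist_session is a hypothetical name, and treating an empty user value as absent is an assumption the issue does not address.

```python
from typing import Mapping

SESSION_HEADER = "x-openclaw-session-key"


def should_persist_session(body: Mapping, headers: Mapping[str, str]) -> bool:
    """Persist only if the caller supplied `user` or a session-key header."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    has_user = bool(body.get("user"))
    has_session_key = bool(lowered.get(SESSION_HEADER))
    return has_user or has_session_key


print(should_persist_session({"model": "m"}, {}))
print(should_persist_session({"model": "m", "user": "alice"}, {}))
print(should_persist_session({"model": "m"}, {"X-OpenClaw-Session-Key": "abc"}))
```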

Why This Matters

OpenClaw has the best multi-provider model configuration of any open-source AI gateway. The ability to define providers with fallbacks, cost tracking, and unified auth is exactly what teams running multiple LLMs need. But the mandatory session overhead makes it unusable as a lightweight proxy — forcing users to either:

  1. Bypass OpenClaw entirely (losing model routing/fallback/auth), or
  2. Accept unbounded disk growth and periodic manual cleanup

A passthrough mode would make OpenClaw competitive with LiteLLM/OpenRouter for pure API relay use cases, while keeping the full agent pipeline available for interactive conversations.

Related Issues

  • #48159 — Ephemeral session mode (no persistence) — proposed but not implemented
  • #55334 — sessions.json OOM from skillsSnapshot duplication per session
  • #55768 — Health check pings create full sessions (96/day = 2MB/day)
  • #60847 — Completed sessions accumulate indefinitely, no auto-cleanup
  • #20934 — No REST endpoint for session management (can't even delete sessions via API)

Extent Analysis

TL;DR

OpenClaw writes a session file for every /v1/chat/completions call, including stateless one-shot calls. The requested fix is a "passthrough" mode that skips session creation for HTTP API calls.

Guidance

  1. Implement passthrough mode: Add a configuration option to enable passthrough mode for the /v1/chat/completions endpoint, which skips the entire agent pipeline and directly forwards requests to the configured model provider.
  2. Verify passthrough mode: Test the passthrough mode by making stateless LLM calls and verifying that no session files are created.
  3. Consider ephemeral flag: Alternatively, consider implementing an ephemeral flag for per-agent configurations, which would allow sessions to exist only in memory and not be written to disk.
  4. Review related issues: Check #48159 (ephemeral sessions), #55334 (skillsSnapshot duplication in sessions.json), and #60847 (completed sessions never cleaned up) to confirm which parts of the problem passthrough mode covers and which remain open.
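
Step 2 can be checked by comparing the session directory before and after a call. The sketch below is one possible way to do that and runs against a temporary directory with stub calls; files_created_by is a hypothetical helper, and the real session directory path is not given in this issue.

```python
import tempfile
from pathlib import Path
from typing import Callable


def files_created_by(action: Callable[[], None], directory: Path) -> set[Path]:
    """Run `action` and return the files that appeared under `directory`."""
    before = {p for p in directory.rglob("*") if p.is_file()}
    action()
    after = {p for p in directory.rglob("*") if p.is_file()}
    return after - before


with tempfile.TemporaryDirectory() as tmp:
    sessions_dir = Path(tmp)

    # Stand-in for today's behaviour: each call leaves a .jsonl file behind.
    def call_with_session() -> None:
        (sessions_dir / "example.jsonl").write_text('{"type":"session"}\n')

    # Stand-in for a passthrough call: no disk writes.
    def call_passthrough() -> None:
        pass

    leaked = files_created_by(call_with_session, sessions_dir)
    clean = files_created_by(call_passthrough, sessions_dir)
    print(len(leaked), len(clean))
```

This only detects new files. Growth of an existing file such as sessions.json would need a size comparison as well.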

Notes

Passthrough mode addresses the primary problem, unnecessary session file creation on stateless calls. Session files already on disk and the cleanup gaps described in the related issues (#55334, #60847) are separate concerns that it would not resolve by itself.

Recommendation

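Implement passthrough mode (Option A). It directly removes the unnecessary session file creation and lets OpenClaw act as a lightweight proxy for stateless LLM calls. Until it exists, the only workaround described in the issue is to bypass OpenClaw and call the LLM directly, which gives up model routing, fallback, and unified auth.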
