openclaw - ✅ (Solved) Fix openai-completions simple sessions log zero token usage despite endpoint returning streaming usage [1 pull request, 1 comment, 2 participants]

GitHub stats: openclaw/openclaw#75357, fetched 2026-05-01 05:34:46
Comments: 1 · Participants: 2 · Timeline events: 5 (labeled ×2, commented ×1, cross-referenced ×1, referenced ×1) · Reactions: 2

openai-completions sessions can still write zero token usage to session JSONL for llama.cpp/OpenAI-compatible endpoints, even though the endpoint returns valid usage when streaming requests include `stream_options.include_usage=true`.

This appears to be a gap after earlier fixes such as #59328, #68707, #56670, and #49753. Those fixes added usage capture in OpenClaw's custom OpenAI completions transport, but normal no-proxy/no-TLS simple openai-completions runs appear not to route through that usage-aware transport.

Related but broader: #73990.

PR fix notes

PR #16: fix(agents): inject stream_options.include_usage for openai-completions streaming

Description (problem / solution / changelog)

Summary

  • Problem: openai-completions streaming requests omit stream_options.include_usage=true, so most OpenAI-compatible endpoints (llama.cpp, vLLM, LM Studio, etc.) do not return usage data in their final streaming chunk. Sessions record zero tokens even though the same endpoint returns usage for non-streaming requests.
  • Why it matters: Zero token counts break per-session cost tracking, rate-limit guards, and context-window management for all self-hosted / local model users on the completions API.
  • What changed: Added createOpenAICompletionsStreamUsageWrapper in openai-stream-wrappers.ts that intercepts outgoing payloads and injects stream_options: { include_usage: true } whenever stream === true and model.api === "openai-completions". Existing stream_options objects are merged, not replaced. applyExtraParamsToAgent now chains this wrapper unconditionally for every completions agent.
  • What did NOT change: No behavior change for openai-responses, anthropic-messages, Google, or any other API type. No new config knob; the option is expected to be safe, since endpoints that do not support it should ignore the field (see Risks and Mitigations).
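The injection described in "What changed" can be sketched as a small pure function. This is a minimal illustration, not the PR's code: `withStreamUsage`, `CompletionsPayload`, and `ModelInfo` are hypothetical names, and the real `createOpenAICompletionsStreamUsageWrapper` may be structured differently.

```typescript
// Hypothetical shapes; the real OpenClaw types are not shown in the PR description.
interface CompletionsPayload {
  stream?: boolean;
  stream_options?: Record<string, unknown>;
  [key: string]: unknown;
}

interface ModelInfo {
  api: string;
}

// Returns a copy of the payload with stream_options.include_usage forced to true
// when the request streams and targets the openai-completions API.
function withStreamUsage(model: ModelInfo, payload: CompletionsPayload): CompletionsPayload {
  if (model.api !== "openai-completions" || payload.stream !== true) {
    // Other API types and non-streaming requests pass through untouched.
    return payload;
  }
  return {
    ...payload,
    // Merge so unrelated stream_options keys survive; include_usage is written last,
    // which also overwrites a pre-set include_usage: false.
    stream_options: { ...(payload.stream_options ?? {}), include_usage: true },
  };
}
```

Returning a new object rather than mutating the input keeps the caller's payload unchanged, which makes the behavior easy to assert in unit tests.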

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Integrations

Linked Issue/PR

  • Closes #75357

User-visible / Behavior Changes

Sessions using local/self-hosted models via openai-completions (llama.cpp, vLLM, LM Studio, Ollama) will now have accurate token usage recorded per turn.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (existing completions requests; one additional field in payload)
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node 22 / Bun
  • Model/provider: openai-completions (llama.cpp, vLLM, or any OpenAI-compatible endpoint)
  • Integration/channel: any

Steps

  1. Configure an agent with api: openai-completions pointing at llama.cpp or vLLM.
  2. Send a message and watch the outbound streaming payload (via a proxy or debug log).
  3. Before this fix: payload has no stream_options field; endpoint returns usage: null in the final chunk.
  4. After this fix: payload contains "stream_options": {"include_usage": true}; endpoint returns populated usage.

Expected

  • stream_options.include_usage is present in every streaming completions request.
  • Token counts for the session are non-zero after the turn.

Actual (before fix)

  • stream_options absent from payload; token usage stays at zero.

Evidence

  • Failing test/log before + passing after

7 new unit tests in extra-params.openai-completions-stream-usage.test.ts cover:

  • injection when stream: true
  • no injection when stream: false or absent
  • no injection for other API types (openai-responses, anthropic-messages)
  • merging into existing stream_options without clobbering other keys
  • overwrite of a pre-set include_usage: false

Human Verification (required)

  • Verified scenarios: wrapper injects correct field for completions; skips for all other API types; merges cleanly with pre-existing stream_options.
  • Edge cases checked: stream: false, missing stream key, existing stream_options with unrelated keys, pre-set include_usage.
  • What you did not verify: live endpoint round-trip (requires running llama.cpp/vLLM instance).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert the createOpenAICompletionsStreamUsageWrapper call in applyExtraParamsToAgent.
  • Files/config to restore: src/agents/pi-embedded-runner/extra-params.ts
  • Known bad symptoms: if an endpoint rejects unknown fields, requests may fail - but no known OpenAI-compatible endpoint rejects stream_options.

Risks and Mitigations

  • Risk: an exotic OpenAI-compatible endpoint rejects stream_options when it does not recognise the field.
    • Mitigation: the OpenAI API reference documents stream_options as an optional object, so endpoints that do not implement it are expected to ignore it. No known endpoint rejects it.

Generated by Claude Code

Changed files

  • src/agents/pi-embedded-runner/extra-params.openai-completions-stream-usage.test.ts (added, +150/-0)
  • src/agents/pi-embedded-runner/extra-params.ts (modified, +8/-0)
  • src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +37/-0)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

openai-completions sessions can still write zero token usage to session JSONL for llama.cpp/OpenAI-compatible endpoints, even though the endpoint returns valid usage when streaming requests include `stream_options.include_usage=true`.

This appears to be a gap after earlier fixes such as #59328, #68707, #56670, and #49753. Those fixes added usage capture in OpenClaw's custom OpenAI completions transport, but normal no-proxy/no-TLS simple openai-completions runs appear not to route through that usage-aware transport.

Related but broader: #73990.

Steps to reproduce

  1. Configure an OpenAI-compatible openai-completions model backed by llama.cpp, without request.proxy or request.tls transport overrides.

  2. Start an OpenClaw session using that model.

  3. Send a normal prompt, for example:

    hi

  4. Inspect the resulting session JSONL assistant message.

  5. Observe that the assistant message usage is recorded as zeros:

    { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 0 }

  6. Verify the same endpoint returns usage for a non-streaming request:

    curl http://devbox:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":false}' \
      | python3 -m json.tool | grep -A5 usage

  7. Verify the same endpoint returns usage for a streaming request when stream_options.include_usage=true is included:

    curl -N http://devbox:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true,"stream_options":{"include_usage":true}}' \
      | grep -i '"usage"'

  8. Compare with streaming without stream_options.include_usage=true; usage is not returned:

    curl -N http://devbox:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true}' \
      | grep -i '"usage"'
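The comparison in steps 7 and 8 can also be done programmatically. The sketch below scans a captured SSE response body for a usage object instead of using grep. `findStreamUsage` and `OpenAIUsage` are hypothetical names, and the sketch assumes the usual `data: {...}` line framing with a `data: [DONE]` sentinel; a given server's framing may differ in detail.

```typescript
// Usage fields as they appear in the curl output quoted in this issue.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Scans a raw SSE response body and returns the last non-null usage object,
// or null when no chunk carried usage.
function findStreamUsage(sseBody: string): OpenAIUsage | null {
  let usage: OpenAIUsage | null = null;
  for (const rawLine of sseBody.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("data:") !== 0) continue; // skip blank lines and comments
    const data = line.slice("data:".length).trim();
    if (data === "" || data === "[DONE]") continue; // end-of-stream sentinel
    let chunk: { usage?: OpenAIUsage | null } | null;
    try {
      chunk = JSON.parse(data);
    } catch {
      continue; // ignore malformed lines
    }
    if (chunk && chunk.usage) usage = chunk.usage;
  }
  return usage;
}
```

For a stream requested with `stream_options.include_usage=true`, the function should return the usage object from the final chunk; for a stream without it, the expected result is `null`.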

Expected behavior

OpenClaw should send/request streaming usage for compatible openai-completions endpoints and write real usage into session JSONL, for example: { "input": 11, "output": 254, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 265 }

Actual behavior

Assistant message records in the session JSONL contain zeros:

{ "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 0 }

This causes session-cost tooling to report usage as not reported by the model, even though the model server does return usage.
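A quick way to confirm the symptom is to count assistant turns whose recorded usage is all zeros. The sketch below is an assumption-heavy illustration: the issue shows only the usage object, so the surrounding record layout (`message.role`, `message.usage`) and the name `countZeroUsageAssistantTurns` are hypothetical.

```typescript
// Usage object exactly as shown in this issue.
interface SessionUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
}

// Hypothetical record layout; the real session JSONL schema is not shown in the issue.
interface SessionRecord {
  message?: { role?: string; usage?: SessionUsage };
}

// Counts assistant records in a JSONL string whose usage reports zero total tokens.
function countZeroUsageAssistantTurns(jsonl: string): number {
  let count = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let record: SessionRecord | null;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    const message = record ? record.message : undefined;
    if (!message || message.role !== "assistant" || !message.usage) continue;
    if (message.usage.totalTokens === 0) count += 1;
  }
  return count;
}
```

A non-zero count after a turn against an endpoint known to return usage would match the behavior reported here.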

OpenClaw version

OpenClaw 2026.4.27 (cbc2ba0)

Operating system

Ubuntu 24.04

Install method

npm global

Model

qwen35-35b-a3b

Provider / routing chain

openclaw -> llama.cpp -> qwen35

Additional provider/model setup details

No response

Logs, screenshots, and evidence

The llama.cpp/OpenAI-compatible endpoint returns usage for non-streaming requests:

  curl http://devbox:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":false}' \
    | python3 -m json.tool | grep -A5 usage

  Response includes:

  "usage": {
    "completion_tokens": 254,
    "prompt_tokens": 11,
    "total_tokens": 265,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }

  For streaming, usage appears when stream_options.include_usage=true is sent:

  curl -N http://devbox:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true,"stream_options":{"include_usage":true}}' \
    | grep -i '"usage"'

  Streaming without stream_options.include_usage=true does not return usage.

Impact and severity

No response

Additional information

Suspected cause

OpenClaw already has usage-aware OpenAI completions transport code:

  • src/agents/openai-transport-stream.ts sends stream_options: { include_usage: true }
  • parseTransportChunkUsage(...) maps:
    • usage.prompt_tokens
    • usage.completion_tokens
    • usage.prompt_tokens_details.cached_tokens

However, normal simple completion flow appears to route through that custom transport only when request transport overrides such as request.proxy or request.tls are active.

Relevant files:

  • src/agents/simple-completion-transport.ts
  • src/agents/provider-transport-stream.ts
  • src/agents/openai-transport-stream.ts
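The field mapping listed under "Suspected cause" can be sketched as follows. This is not the body of `parseTransportChunkUsage`; `toSessionUsage` is a hypothetical name, and whether the real code subtracts cached tokens from `input` or ever populates `cacheWrite` is not stated in the issue, so both choices below are assumptions.

```typescript
// Usage object as returned by the endpoint in the evidence above.
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Usage object as written to the session JSONL.
interface SessionUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
}

// Maps endpoint usage to the session shape; missing usage yields all zeros,
// which is the symptom described in this issue.
function toSessionUsage(usage: OpenAIUsage | null | undefined): SessionUsage {
  const input = usage?.prompt_tokens ?? 0; // assumption: cached tokens are not subtracted
  const output = usage?.completion_tokens ?? 0;
  const cacheRead = usage?.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    input,
    output,
    cacheRead,
    cacheWrite: 0, // assumption: the completions usage object has no cache-write field
    totalTokens: usage?.total_tokens ?? (input + output),
  };
}
```

Applied to the usage block quoted in the evidence (11 prompt tokens, 254 completion tokens, 0 cached), this yields the record shown under "Expected behavior".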

Proposed fix

Route openai-completions simple completions through OpenClaw's usage-aware custom completions transport when streaming usage compatibility is known, even without proxy/TLS overrides.

The fix should not require users to change prompts or configure a fake request.proxy / request.tls.
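The routing change proposed above amounts to widening one condition. The sketch below is purely illustrative: `chooseTransport`, `SimpleCompletionRequest`, and the `supportsStreamUsage` flag are hypothetical, and the actual decision logic in `src/agents/simple-completion-transport.ts` is not shown in this issue.

```typescript
// Hypothetical request description; proxy and tls mirror the request.proxy /
// request.tls overrides named in the issue.
interface SimpleCompletionRequest {
  api: string;
  stream: boolean;
  supportsStreamUsage?: boolean; // assumed flag for "streaming usage compatibility is known"
  proxy?: string;
  tls?: object;
}

type TransportKind = "usage-aware" | "default";

// Picks the usage-aware transport for overrides (the reported current behavior)
// and also for streaming openai-completions requests (the proposed behavior).
function chooseTransport(req: SimpleCompletionRequest): TransportKind {
  const hasOverrides = req.proxy !== undefined || req.tls !== undefined;
  const wantsStreamUsage =
    req.api === "openai-completions" && req.stream && req.supportsStreamUsage === true;
  return hasOverrides || wantsStreamUsage ? "usage-aware" : "default";
}
```

With this shape, a plain llama.cpp setup with no proxy or TLS overrides would still reach the usage-aware path, which is the outcome the proposal asks for.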

extent analysis

TL;DR

Route openai-completions simple completions through OpenClaw's usage-aware custom completions transport to fix the issue with zero token usage being written to session JSONL.

Guidance

  • Review the src/agents/simple-completion-transport.ts and src/agents/openai-transport-stream.ts files to understand how the custom transport is currently being used.
  • Modify the completion flow to route through the usage-aware custom transport when streaming usage compatibility is known, even without proxy/TLS overrides.
  • Verify that the fix works by checking the session JSONL for correct usage records after sending a prompt through the modified completion flow.
  • Test the fix with different scenarios, including non-streaming requests and streaming requests with and without stream_options.include_usage=true.

Notes

The proposed fix requires modifying the OpenClaw code to route simple completions through the usage-aware custom transport. This may involve updating the src/agents/simple-completion-transport.ts and src/agents/openai-transport-stream.ts files.

Recommendation

Apply the fix by routing the simple completion flow through the usage-aware custom transport. OpenClaw can then record usage correctly in the session JSONL without requiring users to change prompts or configure a fake request.proxy / request.tls.
