openclaw - ✅(Solved) Fix openai-completions simple sessions log zero token usage despite endpoint returning streaming usage [1 pull request, 1 comment, 2 participants]

openclaw · 2026-05-01 02:00:26


GitHub stats

openclaw/openclaw#75357 • Fetched 2026-05-01 05:34:46


Author

khaney64

Participants

clawsweeper[bot]

khaney64

Timeline (top)

labeled ×2, commented ×1, cross-referenced ×1, referenced ×1

openai-completions sessions can still write zero token usage to the session JSONL for llama.cpp and other OpenAI-compatible endpoints, even though the endpoint returns valid usage when streaming requests include `stream_options.include_usage=true`.

This appears to be a gap left by earlier fixes such as #59328, #68707, #56670, and #49753. Those fixes added usage capture to OpenClaw's custom OpenAI completions transport, but ordinary simple openai-completions runs without proxy or TLS overrides do not appear to route through that usage-aware transport.

Related but broader: #73990.


PR fix notes

PR #16: fix(agents): inject stream_options.include_usage for openai-completions streaming

Repository: suboss87/openclaw
Author: suboss87
State: open | merged: False
Link: https://github.com/suboss87/openclaw/pull/16

Description (problem / solution / changelog)

Summary

Problem: openai-completions streaming requests omit stream_options.include_usage=true, so most OpenAI-compatible endpoints (llama.cpp, vLLM, LM Studio, etc.) do not return usage data in their final streaming chunk. Sessions record zero tokens even though the same endpoint returns usage for non-streaming requests.
Why it matters: Zero token counts break per-session cost tracking, rate-limit guards, and context-window management for all self-hosted / local model users on the completions API.
What changed: Added createOpenAICompletionsStreamUsageWrapper in openai-stream-wrappers.ts that intercepts outgoing payloads and injects stream_options: { include_usage: true } whenever stream === true and model.api === "openai-completions". Existing stream_options objects are merged, not replaced. applyExtraParamsToAgent now chains this wrapper unconditionally for every completions agent.
What did NOT change: No behavior change for openai-responses, anthropic-messages, Google, or any other API type. No new config knob; the option is universally safe, because endpoints that do not support it ignore the field.
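The wrapper behavior described under "What changed" amounts to a pure transformation of the outgoing request body. The sketch below is a minimal illustration under stated assumptions, not the code from PR #16: the function name `withStreamUsage`, the `ChatCompletionsBody` type, and passing the API type as a plain string are all hypothetical.

```typescript
// Hypothetical sketch of the stream_options injection described in PR #16.
// Names and types are illustrative; they are not taken from openai-stream-wrappers.ts.
interface ChatCompletionsBody {
  model: string;
  stream?: boolean;
  stream_options?: Record<string, unknown>;
  [key: string]: unknown;
}

function withStreamUsage(api: string, body: ChatCompletionsBody): ChatCompletionsBody {
  // Only streaming requests for the openai-completions API type are touched.
  if (api !== "openai-completions" || body.stream !== true) {
    return body;
  }
  return {
    ...body,
    // Merge into any existing stream_options; include_usage is forced to true,
    // so a pre-set include_usage: false is overwritten.
    stream_options: { ...(body.stream_options ?? {}), include_usage: true },
  };
}

// Usage:
const sent = withStreamUsage("openai-completions", {
  model: "qwen36-35b-a3b",
  messages: [{ role: "user", content: "hi" }],
  stream: true,
});
console.log(JSON.stringify(sent.stream_options)); // {"include_usage":true}
```

Returning a new object instead of mutating the caller's payload keeps each case (stream true/false/absent, other API types, merging, overwriting `include_usage: false`) easy to check in isolation.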

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
Integrations

Linked Issue/PR

Closes #75357

User-visible / Behavior Changes

Sessions using local/self-hosted models via openai-completions (llama.cpp, vLLM, LM Studio, Ollama) will now have accurate token usage recorded per turn.

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No (existing completions requests; one additional field in payload)
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Linux
Runtime/container: Node 22 / Bun
Model/provider: openai-completions (llama.cpp, vLLM, or any OpenAI-compatible endpoint)
Integration/channel: any

Steps

Configure an agent with api: openai-completions pointing at llama.cpp or vLLM.
Send a message and watch the outbound streaming payload (via a proxy or debug log).
Before this fix: payload has no stream_options field; endpoint returns usage: null in the final chunk.
After this fix: payload contains "stream_options": {"include_usage": true}; endpoint returns populated usage.

Expected

stream_options.include_usage is present in every streaming completions request.
Token counts for the session are non-zero after the turn.

Actual (before fix)

stream_options absent from payload; token usage stays at zero.

Evidence

Failing test/log before + passing after

7 new unit tests in extra-params.openai-completions-stream-usage.test.ts cover:

injection when stream: true
no injection when stream: false or absent
no injection for other API types (openai-responses, anthropic-messages)
merging into existing stream_options without clobbering other keys
overwrite of a pre-set include_usage: false

Human Verification (required)

Verified scenarios: wrapper injects correct field for completions; skips for all other API types; merges cleanly with pre-existing stream_options.
Edge cases checked: stream: false, missing stream key, existing stream_options with unrelated keys, pre-set include_usage.
What you did not verify: live endpoint round-trip (requires running llama.cpp/vLLM instance).

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert the createOpenAICompletionsStreamUsageWrapper call in applyExtraParamsToAgent.
Files/config to restore: src/agents/pi-embedded-runner/extra-params.ts
Known bad symptoms: if an endpoint rejects unknown fields, requests may fail; however, no known OpenAI-compatible endpoint rejects stream_options.

Risks and Mitigations

Risk: an exotic OpenAI-compatible endpoint rejects stream_options when it does not recognise the field.
- Mitigation: the OpenAI spec documents stream_options as an optional object; compliant endpoints must ignore unknown optional parameters. No known endpoint rejects it.

Generated by Claude Code

Changed files

src/agents/pi-embedded-runner/extra-params.openai-completions-stream-usage.test.ts (added, +150/-0)
src/agents/pi-embedded-runner/extra-params.ts (modified, +8/-0)
src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +37/-0)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Related but broader: #73990.

Steps to reproduce

1. Configure an OpenAI-compatible openai-completions model backed by llama.cpp, without request.proxy or request.tls transport overrides.
2. Start an OpenClaw session using that model.
3. Send a normal prompt, for example:
   ```
   hi
   ```
4. Inspect the resulting session JSONL assistant message.
5. Observe that the assistant message usage is recorded as zeros:

       { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 0 }

6. Verify that the same endpoint returns usage for a non-streaming request:

       curl http://devbox:8080/v1/chat/completions \
         -H 'Content-Type: application/json' \
         -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":false}' \
         | python3 -m json.tool | grep -A5 usage

7. Verify that the same endpoint returns usage for a streaming request when stream_options.include_usage=true is included:

       curl -N http://devbox:8080/v1/chat/completions \
         -H 'Content-Type: application/json' \
         -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true,"stream_options":{"include_usage":true}}' \
         | grep -i '"usage"'

8. Compare with streaming without stream_options.include_usage=true; usage is not returned:

       curl -N http://devbox:8080/v1/chat/completions \
         -H 'Content-Type: application/json' \
         -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true}' \
         | grep -i '"usage"'

Expected behavior

OpenClaw should request streaming usage from compatible openai-completions endpoints and write real usage into the session JSONL, for example:

{ "input": 11, "output": 254, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 265 }

Actual behavior

Assistant message records in the session JSONL contain zeros:

{ "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0, "totalTokens": 0 }

As a result, session-cost tooling treats usage as not reported by the model, even though the model server does return it.

OpenClaw version

OpenClaw 2026.4.27 (cbc2ba0)

Operating system

Ubuntu 24.04

Install method

npm global

Model

qwen35-35b-a3b

Provider / routing chain

openclaw -> llama.cpp -> qwen35

Additional provider/model setup details

No response

Logs, screenshots, and evidence

The llama.cpp/OpenAI-compatible endpoint returns usage for non-streaming requests:

  curl http://devbox:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":false}' \
    | python3 -m json.tool | grep -A5 usage

  Response includes:

  "usage": {
    "completion_tokens": 254,
    "prompt_tokens": 11,
    "total_tokens": 265,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }

  For streaming, usage appears when stream_options.include_usage=true is sent:

  curl -N http://devbox:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"model":"qwen36-35b-a3b","messages":[{"role":"user","content":"hi"}],"stream":true,"stream_options":{"include_usage":true}}' \
    | grep -i '"usage"'

  Streaming without stream_options.include_usage=true does not return usage.
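To make the same check programmatically instead of with `grep`, one can scan the `data:` lines of a buffered streaming response and keep the last non-null `usage` object. This is a minimal sketch, assuming the whole response body is already available as a string and that each event is a single `data:` line ending with `data: [DONE]`, which is how OpenAI-compatible servers typically frame chat-completion streams. `lastStreamUsage` is a hypothetical helper, not part of OpenClaw, and the sample chunks are made up for illustration, reusing the token counts from the non-streaming response above.

```typescript
// Hypothetical helper: find usage in a buffered chat-completions SSE stream.
interface StreamUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function lastStreamUsage(sseBody: string): StreamUsage | null {
  let found: StreamUsage | null = null;
  for (const rawLine of sseBody.split("\n")) {
    const line = rawLine.trim(); // also strips a trailing \r
    // Skip blank separators, SSE comments (":..."), and non-data fields.
    if (!line.startsWith("data:")) continue;
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break;
    const chunk = JSON.parse(data) as { usage?: StreamUsage | null };
    // With include_usage=true the final chunk carries usage; earlier chunks send usage: null.
    if (chunk.usage) found = chunk.usage;
  }
  return found;
}

const withUsage = [
  'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}],"usage":null}',
  "",
  'data: {"choices":[],"usage":{"prompt_tokens":11,"completion_tokens":254,"total_tokens":265}}',
  "",
  "data: [DONE]",
].join("\n");
const withoutUsage = 'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}\n\ndata: [DONE]\n';

console.log(lastStreamUsage(withUsage)?.total_tokens); // 265
console.log(lastStreamUsage(withoutUsage)); // null
```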

Impact and severity

No response

Additional information

Suspected cause

OpenClaw already has usage-aware OpenAI completions transport code:

src/agents/openai-transport-stream.ts sends stream_options: { include_usage: true }
parseTransportChunkUsage(...) maps:
- usage.prompt_tokens
- usage.completion_tokens
- usage.prompt_tokens_details.cached_tokens
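The field mapping listed above can be illustrated by converting an OpenAI-style `usage` object into the session JSONL shape shown under Expected behavior. This is a hypothetical sketch, not `parseTransportChunkUsage` itself: the function name is made up, and both reporting cached prompt tokens under `cacheRead` (subtracted from `input`) and leaving `cacheWrite` at 0 are assumptions, since the fields listed above include no cache-write counter.

```typescript
// Hypothetical mapping from an OpenAI-style usage object to the session JSONL usage record.
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  total_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

interface SessionUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
}

function toSessionUsage(usage: OpenAIUsage): SessionUsage {
  const prompt = usage.prompt_tokens ?? 0;
  const output = usage.completion_tokens ?? 0;
  const cacheRead = usage.prompt_tokens_details?.cached_tokens ?? 0;
  // Assumption: cached prompt tokens are counted under cacheRead rather than input.
  const input = Math.max(prompt - cacheRead, 0);
  return {
    input,
    output,
    cacheRead,
    cacheWrite: 0, // assumption: no cache-write counter among the mapped fields
    // Prefer the server's total; fall back to prompt + completion.
    totalTokens: usage.total_tokens ?? prompt + output,
  };
}

// Usage with the non-streaming response shown under "Logs, screenshots, and evidence":
console.log(
  JSON.stringify(
    toSessionUsage({
      prompt_tokens: 11,
      completion_tokens: 254,
      total_tokens: 265,
      prompt_tokens_details: { cached_tokens: 0 },
    }),
  ),
);
// {"input":11,"output":254,"cacheRead":0,"cacheWrite":0,"totalTokens":265}
```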

However, normal simple completion flow appears to route through that custom transport only when request transport overrides such as request.proxy or request.tls are active.

Relevant files:

src/agents/simple-completion-transport.ts
src/agents/provider-transport-stream.ts
src/agents/openai-transport-stream.ts

Proposed fix

Route openai-completions simple completions through OpenClaw's usage-aware custom completions transport when streaming usage compatibility is known, even without proxy/TLS overrides.

The fix should not require users to change prompts or configure a fake request.proxy / request.tls.
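The routing change proposed above can be sketched as a single decision function. This is purely illustrative: `RoutingInput`, `streamUsageSupported`, `TransportKind`, and both function names are hypothetical, and the "today" branch only encodes the behavior the issue suspects, not verified OpenClaw logic.

```typescript
// Hypothetical sketch of the proposed routing rule; none of these names come from OpenClaw.
interface RoutingInput {
  api: string; // e.g. "openai-completions"
  hasProxyOverride: boolean; // request.proxy is configured
  hasTlsOverride: boolean; // request.tls is configured
  streamUsageSupported: boolean; // assumed signal that the endpoint honors include_usage
}

type TransportKind = "usage-aware-custom" | "default";

// Suspected current behavior: the custom transport is used only when a transport override is active.
function chooseTransportToday(input: RoutingInput): TransportKind {
  return input.hasProxyOverride || input.hasTlsOverride ? "usage-aware-custom" : "default";
}

// Proposed behavior: also route openai-completions through the custom transport
// when streaming usage support is known, without requiring proxy/TLS overrides.
function chooseTransportProposed(input: RoutingInput): TransportKind {
  if (chooseTransportToday(input) === "usage-aware-custom") return "usage-aware-custom";
  if (input.api === "openai-completions" && input.streamUsageSupported) return "usage-aware-custom";
  return "default";
}

// Usage: a plain llama.cpp setup with no transport overrides.
const llamaCpp: RoutingInput = {
  api: "openai-completions",
  hasProxyOverride: false,
  hasTlsOverride: false,
  streamUsageSupported: true,
};
console.log(chooseTransportToday(llamaCpp)); // default
console.log(chooseTransportProposed(llamaCpp)); // usage-aware-custom
```

Keying the decision on known streaming-usage support, rather than on the presence of overrides, is what removes the need for a fake `request.proxy` or `request.tls`.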

extent analysis

TL;DR

Route openai-completions simple completions through OpenClaw's usage-aware custom completions transport so that real token usage, rather than zeros, is written to the session JSONL.

Guidance

Review the src/agents/simple-completion-transport.ts and src/agents/openai-transport-stream.ts files to understand how the custom transport is currently being used.
Modify the completion flow to route through the usage-aware custom transport when streaming usage compatibility is known, even without proxy/TLS overrides.
Verify that the fix works by checking the session JSONL for correct usage records after sending a prompt through the modified completion flow.
Test the fix with different scenarios, including non-streaming requests and streaming requests with and without stream_options.include_usage=true.


Notes

The proposed fix requires modifying the OpenClaw code to route simple completions through the usage-aware custom transport. This may involve updating the src/agents/simple-completion-transport.ts and src/agents/openai-transport-stream.ts files.

Recommendation

Modify the completion flow so that openai-completions simple completions route through the usage-aware custom transport. This allows OpenClaw to record real usage in the session JSONL without requiring users to change prompts or configure a fake request.proxy / request.tls.
