ollama - ✅(Solved) Fix Missing token usage statistics in streaming responses for cloud models [1 pull request, 1 participant]

GitHub stats
ollama/ollama#15169 · Fetched 2026-04-08 01:58:28
Comments: 0
Participants: 1
Timeline: 3
Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1

PR fix notes

PR #15208: Add token usage for cloud model streaming (inject stream_options.include_usage)

Description (problem / solution / changelog)

Inject stream_options.include_usage into proxied OpenAI-compatible streaming requests when it is missing. Add a cloud passthrough test to ensure usage options are forwarded for stream:true chat completions.

Summary:

  • Ensure cloud OpenAI-compatible streaming requests include token usage by injecting stream_options.include_usage=true when stream:true and no stream options are provided.
  • Files changed: cloud_proxy.go, routes_cloud_test.go
  • Tests: added a proxy test that verifies stream_options.include_usage is forwarded for stream:true chat completions.
  • Note: Local full test run can fail in minimal environments due to native linker requirements; server tests pass. CI should run the full suite.
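
The injection rule in the summary above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code from PR #15208: the helper names injectIncludeUsage and includeUsageSet are hypothetical, and the sketch treats any existing stream_options key as "provided" and leaves it untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// injectIncludeUsage adds stream_options.include_usage=true to a JSON request
// body when "stream" is true and no "stream_options" key is present.
// Hypothetical helper for illustration; not the implementation in the PR.
func injectIncludeUsage(body []byte) ([]byte, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, fmt.Errorf("decode request: %w", err)
	}

	var stream bool
	if raw, ok := req["stream"]; ok {
		if err := json.Unmarshal(raw, &stream); err != nil {
			return nil, fmt.Errorf("decode stream flag: %w", err)
		}
	}

	// Leave non-streaming requests and caller-supplied options untouched.
	if _, has := req["stream_options"]; !stream || has {
		return body, nil
	}

	req["stream_options"] = json.RawMessage(`{"include_usage":true}`)
	return json.Marshal(req)
}

// includeUsageSet reports whether body sets stream_options.include_usage to true.
func includeUsageSet(body []byte) bool {
	var req struct {
		StreamOptions *struct {
			IncludeUsage bool `json:"include_usage"`
		} `json:"stream_options"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return false
	}
	return req.StreamOptions != nil && req.StreamOptions.IncludeUsage
}

func main() {
	in := []byte(`{"model":"minimax-m2.7:cloud","messages":[{"role":"user","content":"Hello"}],"stream":true}`)
	out, err := injectIncludeUsage(in)
	if err != nil {
		panic(err)
	}
	fmt.Println("before:", includeUsageSet(in), "after:", includeUsageSet(out))
}
```

Decoding into a map of raw JSON values keeps every other request field byte-for-byte intact, which matters for a passthrough proxy that should not need to know the full request schema.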

Closes: #15169

Changed files

  • server/cloud_proxy.go (modified, +50/-0)
  • server/routes_cloud_test.go (modified, +42/-0)

Code Example

curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
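
The request above carries no stream options. For comparison, here is a minimal Go sketch that builds the same body with stream_options.include_usage set explicitly. The field is part of the OpenAI Chat Completions streaming API; whether a given Ollama version forwards it to cloud models is an assumption here, and the type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message is one chat turn in the OpenAI-compatible request format.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// streamOptions carries the OpenAI-style streaming options.
type streamOptions struct {
	IncludeUsage bool `json:"include_usage"`
}

// chatRequest models only the fields used in this example.
type chatRequest struct {
	Model         string         `json:"model"`
	Messages      []message      `json:"messages"`
	Stream        bool           `json:"stream"`
	StreamOptions *streamOptions `json:"stream_options,omitempty"`
}

// newStreamingRequest returns a JSON body for a single-message streaming
// chat completion that asks for usage to be included in the stream.
func newStreamingRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:         model,
		Messages:      []message{{Role: "user", Content: prompt}},
		Stream:        true,
		StreamOptions: &streamOptions{IncludeUsage: true},
	})
}

func main() {
	body, err := newStreamingRequest("minimax-m2.7:cloud", "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The resulting body can be sent with the same curl command by passing it to -d.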

---

What is the issue?

I'm using Ollama cloud models (e.g., minimax-m2.7:cloud) and cannot find a way to get token usage statistics when using streaming mode.

Steps to reproduce:

  1. Run a streaming chat completion request:
curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
  2. Check response headers - no token statistics present
  3. Check response body (each chunk) - no usage field

Expected behavior:

  • Response headers should include X-Prompt-Tokens, X-Completion-Tokens, X-Total-Tokens
  • Or the last chunk should include usage statistics
  • Or at least a way to track token consumption for cloud models
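
The "last chunk" option can be consumed on the client side roughly as follows. This Go sketch assumes the OpenAI streaming format, where chunks arrive as server-sent "data:" lines, the stream ends with "data: [DONE]", and a usage object appears on a late chunk when usage reporting is enabled. The function names are hypothetical, and the sample numbers in main are made up rather than captured from a real run.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Usage mirrors the OpenAI-style usage object.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// usageFromSSE scans "data:" lines and returns the last non-null usage
// object seen, or nil if no chunk carried one.
func usageFromSSE(r io.Reader) (*Usage, error) {
	var last *Usage
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long chunk lines
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, other SSE fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var chunk struct {
			Usage *Usage `json:"usage"`
		}
		if err := json.Unmarshal([]byte(payload), &chunk); err != nil {
			return nil, fmt.Errorf("decode chunk: %w", err)
		}
		if chunk.Usage != nil {
			last = chunk.Usage
		}
	}
	return last, sc.Err()
}

// usageFromSSEString is a convenience wrapper for in-memory streams.
func usageFromSSEString(s string) (*Usage, error) {
	return usageFromSSE(strings.NewReader(s))
}

func main() {
	// Illustrative stream; the token counts are invented for the example.
	sample := "data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\n" +
		"data: {\"choices\":[],\"usage\":{\"prompt_tokens\":9,\"completion_tokens\":3,\"total_tokens\":12}}\n\n" +
		"data: [DONE]\n"
	u, err := usageFromSSEString(sample)
	if err != nil {
		panic(err)
	}
	if u == nil {
		fmt.Println("no usage reported in stream")
		return
	}
	fmt.Printf("prompt=%d completion=%d total=%d\n", u.PromptTokens, u.CompletionTokens, u.TotalTokens)
}
```

In a real client, usageFromSSE would be given the HTTP response body. A nil result corresponds to the behavior reported in this issue.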

Actual behavior:

  • No token usage information in response headers
  • No usage field in streaming chunks
  • No way to track token consumption for billing/usage monitoring

Environment:

  • Ollama version: 0.19.0
  • Model: minimax-m2.7:cloud (cloud model)
  • API endpoint: /v1/chat/completions

Additional context:

  • Non-streaming requests return usage in JSON response body
  • This is critical for monitoring usage and implementing fair billing

Relevant log output: (empty)
OS: No response
GPU: No response
CPU: No response
Ollama version: No response

extent analysis

TL;DR

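According to PR #15208, the fix is to have the cloud proxy inject stream_options.include_usage=true into OpenAI-compatible streaming requests that set stream:true without stream options, so that token usage statistics are returned in the stream for Ollama cloud models.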

Guidance

  • Check whether your Ollama build includes PR #15208, which injects stream_options.include_usage=true into proxied cloud streaming requests that set stream:true without stream options.
  • When usage reporting is enabled, look for the usage object in the streamed chunks rather than in the response headers; in the OpenAI streaming format it arrives in a final chunk before [DONE].
  • As a workaround, consider making an additional non-streaming request to the same endpoint, which returns usage in the JSON response body, although this costs an extra call and is not ideal for real-time monitoring.
  • Review the release notes for versions after 0.19.0 for fixes or improvements related to token usage reporting in streaming mode.
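
The non-streaming workaround relies on the usage object in the JSON response body, which the issue reports is present for non-streaming requests. A minimal Go sketch of reading it follows; it assumes the OpenAI-style field names prompt_tokens, completion_tokens and total_tokens. The function name is hypothetical and the sample body in main is invented for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Usage mirrors the OpenAI-style usage object.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

// usageFromCompletion extracts the usage object from a non-streaming
// chat completion response body.
func usageFromCompletion(body []byte) (Usage, error) {
	var resp struct {
		Usage *Usage `json:"usage"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return Usage{}, fmt.Errorf("decode response: %w", err)
	}
	if resp.Usage == nil {
		return Usage{}, errors.New("response has no usage object")
	}
	return *resp.Usage, nil
}

func main() {
	// Illustrative body; the token counts are made up, not from a real run.
	body := []byte(`{"object":"chat.completion","choices":[],"usage":{"prompt_tokens":9,"completion_tokens":12,"total_tokens":21}}`)
	u, err := usageFromCompletion(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("prompt=%d completion=%d total=%d\n", u.PromptTokens, u.CompletionTokens, u.TotalTokens)
}
```

Returning an error when usage is absent, rather than a zero value, keeps a billing pipeline from silently recording zero tokens.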

Example

The change described in PR #15208 amounts to adding "stream_options": {"include_usage": true} to the streaming request body when the client did not supply stream options.

Notes

Whether usage appears in the stream depends on the Ollama version and on the cloud model being used. If your version predates the fix in PR #15208, streaming responses for cloud models may still omit usage, and a workaround may be necessary.

Recommendation

Prefer a build that includes PR #15208. Until then, apply a workaround, such as making an additional non-streaming request or tracking usage through other means, because Ollama 0.19.0 as reported does not return token usage for cloud models in streaming mode.
