ollama - ✅(Solved) Fix Missing token usage statistics in streaming responses for cloud models [1 pull requests, 1 participants]

ollama · 2026-03-31 10:27:10

GitHub stats

ollama/ollama#15169 • Fetched 2026-04-08 01:58:28

Author: panmcai
Participants: panmcai
Timeline (top): labeled ×2, cross-referenced ×1

PR fix notes

PR #15208: Add token usage for cloud model streaming (inject stream_options.include_usage)

Repository: ollama/ollama
Author: mahendrarathore1742
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15208

Description (problem / solution / changelog)

Inject include_usage stream_options for proxied OpenAI streaming requests when missing. Add a cloud passthrough test to ensure usage options are forwarded for stream=true chat completions.

Summary:

Ensure cloud OpenAI-compatible streaming requests include token usage by injecting stream_options.include_usage=true when stream:true and no stream options are provided.
Files changed: cloud_proxy.go, routes_cloud_test.go
Tests: added a proxy test that verifies stream_options.include_usage is forwarded for stream:true chat completions.
Note: Local full test run can fail in minimal environments due to native linker requirements; server tests pass. CI should run the full suite.
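
The injection rule in the summary can be sketched as follows. This is an illustrative Python rendering of the described behavior, not the Go code in server/cloud_proxy.go; the function name inject_include_usage is hypothetical, and the choice to pass malformed or non-streaming bodies through unchanged is an assumption.

```python
import json


def inject_include_usage(raw_body: bytes) -> bytes:
    """Add stream_options.include_usage=true when the request streams
    and carries no stream options of its own (hypothetical helper)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        # Not valid JSON (or not valid UTF-8): pass through untouched.
        return raw_body
    if not isinstance(payload, dict):
        return raw_body
    if payload.get("stream") is not True:
        # Non-streaming requests already return usage in the JSON body.
        return raw_body
    if payload.get("stream_options") is not None:
        # The caller made an explicit choice; do not override it.
        return raw_body
    payload["stream_options"] = {"include_usage": True}
    return json.dumps(payload).encode("utf-8")
```

Leaving caller-supplied stream options alone matches the "when missing" wording of the PR description; a proxy that overrode them would change behavior for clients that deliberately opted out.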

Closes: #15169

Changed files

server/cloud_proxy.go (modified, +50/-0)
server/routes_cloud_test.go (modified, +42/-0)

Code Example

curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

---

What is the issue?

I'm using Ollama cloud models (e.g., minimax-m2.7:cloud) and cannot find a way to get token usage statistics when using streaming mode.

Steps to reproduce:

Run a streaming chat completion request:

curl -i http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m2.7:cloud",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'

Check response headers - no token statistics present
Check response body (each chunk) - no usage field

Expected behavior:

Response headers should include X-Prompt-Tokens, X-Completion-Tokens, X-Total-Tokens
Or the last chunk should include usage statistics
Or at least a way to track token consumption for cloud models
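
The second expectation (usage in the last chunk) is how the OpenAI chat completions API reports usage when stream_options.include_usage is set. A minimal sketch of reading it from a server-sent event stream is shown below, assuming the response uses "data:" lines and a final "data: [DONE]" marker; the helper name extract_usage and the token counts in the usage note are made up for illustration.

```python
import json
from typing import Iterable, Optional


def extract_usage(sse_lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty usage object seen in an SSE stream,
    or None if no chunk carried one (hypothetical helper)."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        try:
            chunk = json.loads(data)
        except ValueError:
            continue  # ignore malformed chunks instead of aborting
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

Given a stream whose final data chunk is {"choices": [], "usage": {"prompt_tokens": 9, "completion_tokens": 2, "total_tokens": 11}}, the function returns that usage object; for the stream described under Actual behavior it returns None.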

Actual behavior:

No token usage information in response headers
No usage field in streaming chunks
No way to track token consumption for billing/usage monitoring

Environment:

Ollama version: 0.19.0
Model: minimax-m2.7:cloud (cloud model)
API endpoint: /v1/chat/completions

Additional context:

Non-streaming requests return usage in JSON response body
This is critical for monitoring usage and implementing fair billing
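
For the monitoring use case, usage objects from non-streaming responses can be tallied on the client. The sketch below assumes the OpenAI-style usage fields prompt_tokens and completion_tokens; the class name UsageTotals is hypothetical, and responses without a usage object are counted as zero.

```python
import json
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running token totals across responses (hypothetical helper)."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def add_response(self, body: str) -> None:
        # A missing or null usage object contributes nothing to the totals.
        usage = json.loads(body).get("usage") or {}
        self.prompt_tokens += int(usage.get("prompt_tokens", 0))
        self.completion_tokens += int(usage.get("completion_tokens", 0))
```

Calling add_response once per response body keeps the totals current; deriving total_tokens from the two counters avoids trusting a total that a response might omit.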

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

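The most likely fix is to request usage explicitly. PR #15208 proposes that the cloud proxy inject stream_options.include_usage=true into OpenAI-compatible requests that set stream: true without stream options, so that token usage is included in the streamed response.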

Guidance

Set stream_options to {"include_usage": true} in the request body. In the OpenAI chat completions API this option requests one extra chunk, sent before the end of the stream, whose usage field holds the token counts; PR #15208 injects the same option server-side when it is missing. Whether Ollama 0.19.0 forwards it for cloud models has not been verified here.
Look for usage in the streamed chunks rather than in response headers. The PR description covers only the request option and does not mention the X-Prompt-Tokens, X-Completion-Tokens, or X-Total-Tokens headers suggested in the issue.
As a fallback, send the request without streaming: the issue reports that non-streaming responses return usage in the JSON body. Repeating a request only to count tokens generates the completion a second time, so this suits cases where streaming can be given up, not real-time monitoring.
Follow PR #15208 (open and unmerged when this page was fetched) and the release notes after 0.19.0 for a merged fix.

Example

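A request-side example follows from the PR description: take the request under Steps to reproduce and add a stream_options object with include_usage set to true. Confirm the behavior against the Ollama version in use, since the PR was still open when this page was fetched.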
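
Assuming the endpoint honors the OpenAI stream_options field, a client-side request can be sketched with the Python standard library as below. The URL and model name come from the issue; the function names are hypothetical, and nothing here has been confirmed against a running Ollama server.

```python
import json
import urllib.request

# Endpoint taken from the issue's curl command.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_streaming_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a streaming chat request that asks for a usage chunk."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Assumption: the server forwards this option to the cloud model.
        "stream_options": {"include_usage": True},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_and_report(model: str, prompt: str) -> None:
    """Send the request and print any usage object found in the stream."""
    req = build_streaming_request(model, prompt)
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if not line.startswith("data:") or line.endswith("[DONE]"):
                continue
            chunk = json.loads(line[len("data:"):])
            if chunk.get("usage"):
                print("usage:", chunk["usage"])
```

Calling stream_and_report("minimax-m2.7:cloud", "Hello") against a local server prints a usage line only if the stream actually contains one, so silence indicates the behavior reported in the issue.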

Notes

The outcome depends on whether the running Ollama version and the upstream cloud model honor stream_options.include_usage. The PR note also warns that a full local test run can fail in minimal environments because of native linker requirements, so CI results are the more reliable signal for the change.

Recommendation

Until PR #15208 or an equivalent change is released, pass stream_options.include_usage explicitly, and fall back to non-streaming requests where usage must be recorded and the streamed response still carries no usage field.
