ollama - 💡(How to fix) Fix Feature Request: Include usage metrics in streaming chat responses [1 comments, 2 participants]

ollama · 2026-03-07 01:24:40

GitHub stats

ollama/ollama#14683•Fetched 2026-04-08 00:32:59

Author

fuleinist

Participants

fuleinist

rick-github

Timeline (top)

closed ×1, commented ×1, labeled ×1

Problem Description

Currently, the OpenAPI spec defines ChatStreamEvent without the usage/metrics fields that are available in the non-streaming ChatResponse. This creates an inconsistency where developers cannot access important performance metrics (token counts, timing information) during streaming responses.

Proposed Solution

Add the following fields to ChatStreamEvent in the OpenAPI spec to match ChatResponse:

total_duration - Total request duration in nanoseconds
load_duration - Model loading duration in nanoseconds
prompt_eval_count - Number of tokens in the prompt
prompt_eval_duration - Time spent evaluating the prompt in nanoseconds
eval_count - Number of tokens in the response
eval_duration - Time spent generating response in nanoseconds

Additionally, add logprobs to ChatStreamEvent since it's present in GenerateStreamEvent but missing from the chat streaming spec.

Use Case

Developers need real-time token usage and timing data during streaming to:

Display token counts to users in real-time
Monitor model performance and latency
Implement rate limiting based on actual usage
Debug performance issues

Implementation Suggestion

The Go implementation already uses a single ChatResponse struct for both streaming and non-streaming responses (see api/types.go). The fix primarily requires updating the OpenAPI spec in docs/api.md to include the metrics fields in ChatStreamEvent.

Reference: A similar pattern is already implemented for GenerateStreamEvent which correctly includes these metrics.

Additional Context

This would also resolve the inconsistency noted in issue #14680 regarding the mismatch between streaming and non-streaming response schemas.

extent analysis

Fix Plan

To address the inconsistency in the OpenAPI spec, follow these steps:

Update the ChatStreamEvent definition in docs/api.md to include the missing fields:
- total_duration
- load_duration
- prompt_eval_count
- prompt_eval_duration
- eval_count
- eval_duration
- logprobs
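Check api/types.go, but expect no change there: according to the issue, the Go implementation already uses a single ChatResponse struct for both streaming and non-streaming responses, so the gap is in the spec rather than in the code.

For reference, the fields the spec should describe correspond to a Go shape like the following. This is illustrative only; ChatStreamEvent is the schema name from the spec and is not confirmed to exist as a Go type in api/types.go:

type ChatStreamEvent struct {
    // ... existing fields ...
    TotalDuration      int64     `json:"total_duration"`       // nanoseconds
    LoadDuration       int64     `json:"load_duration"`        // nanoseconds
    PromptEvalCount    int64     `json:"prompt_eval_count"`    // tokens in the prompt
    PromptEvalDuration int64     `json:"prompt_eval_duration"` // nanoseconds
    EvalCount          int64     `json:"eval_count"`           // tokens in the response
    EvalDuration       int64     `json:"eval_duration"`        // nanoseconds
    Logprobs           []float64 `json:"logprobs"`             // element type is a placeholder
}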

Example YAML snippet for the updated ChatStreamEvent definition in docs/api.md:

components:
  schemas:
    ChatStreamEvent:
      type: object
      properties:
        # ... existing properties ...
        total_duration:
          type: integer
          description: Total request duration in nanoseconds
        load_duration:
          type: integer
          description: Model loading duration in nanoseconds
        prompt_eval_count:
          type: integer
          description: Number of tokens in the prompt
        prompt_eval_duration:
          type: integer
          description: Time spent evaluating the prompt in nanoseconds
        eval_count:
          type: integer
          description: Number of tokens in the response
        eval_duration:
          type: integer
          description: Time spent generating response in nanoseconds
        logprobs:
          type: array
          items:
            type: number
          description: Log probabilities

Verification

To verify the fix, check the following:

The updated ChatStreamEvent definition is reflected in the OpenAPI spec.
The Go response struct in api/types.go (ChatResponse, per the issue) serializes these fields for streaming responses as well.
The streaming API returns the expected metrics fields in the response.
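The last check can be automated with a small helper that reports which metric fields are absent from a captured chunk. This is a sketch under assumptions: missingMetrics and expectedMetrics are hypothetical names, the sample chunk is invented, and logprobs is left out of the list on the assumption that it may be optional.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// expectedMetrics lists the metric fields named in the fix plan.
// logprobs is omitted here on the assumption that it may be optional.
var expectedMetrics = []string{
	"total_duration",
	"load_duration",
	"prompt_eval_count",
	"prompt_eval_duration",
	"eval_count",
	"eval_duration",
}

// missingMetrics returns the expected field names that do not appear as
// top-level keys in the given JSON chunk.
func missingMetrics(chunk []byte) ([]string, error) {
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(chunk, &fields); err != nil {
		return nil, err
	}
	var missing []string
	for _, name := range expectedMetrics {
		if _, ok := fields[name]; !ok {
			missing = append(missing, name)
		}
	}
	return missing, nil
}

func main() {
	// Invented sample chunk that deliberately lacks eval_duration.
	final := []byte(`{"done":true,"total_duration":5,"load_duration":1,"prompt_eval_count":3,"prompt_eval_duration":2,"eval_count":4}`)
	missing, err := missingMetrics(final)
	if err != nil {
		panic(err)
	}
	fmt.Println(missing)
}
```

Decoding into map[string]json.RawMessage checks for key presence without committing to a value type, so a field that is present with a zero value is not reported as missing.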

Extra Tips

Ensure that the updated OpenAPI spec is properly validated and tested to avoid any inconsistencies or errors.
Consider adding documentation and examples to help developers understand how to use the new metrics fields in their applications.
