ollama: Feature Request: Include usage metrics in streaming chat responses [1 comment, 2 participants]

GitHub stats
ollama/ollama#14683 · Fetched 2026-04-08 00:32:59
Comments: 1 · Participants: 2 · Timeline: 3 (closed ×1, commented ×1, labeled ×1) · Reactions: 0

Problem Description

Currently, the OpenAPI spec defines ChatStreamEvent without the usage/metrics fields that are available in the non-streaming ChatResponse. This creates an inconsistency where developers cannot access important performance metrics (token counts, timing information) during streaming responses.

Proposed Solution

Add the following fields to ChatStreamEvent in the OpenAPI spec to match ChatResponse:

  • total_duration - Total request duration in nanoseconds
  • load_duration - Model loading duration in nanoseconds
  • prompt_eval_count - Number of tokens in the prompt
  • prompt_eval_duration - Time spent evaluating the prompt in nanoseconds
  • eval_count - Number of tokens in the response
  • eval_duration - Time spent generating response in nanoseconds

Additionally, add logprobs to ChatStreamEvent since it's present in GenerateStreamEvent but missing from the chat streaming spec.

Use Case

Developers need real-time token usage and timing data during streaming to:

  • Display token counts to users in real-time
  • Monitor model performance and latency
  • Implement rate limiting based on actual usage
  • Debug performance issues
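
As a rough illustration of the monitoring use case, the sketch below decodes one streamed JSON line and derives generation throughput from eval_count and eval_duration. The struct name chunkMetrics, the helper names, and the sample payload are made up for this example; only the JSON field names and the nanosecond unit come from the list in the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// chunkMetrics is a hypothetical type for this example; only the JSON
// field names are taken from the issue text.
type chunkMetrics struct {
	Done         bool  `json:"done"`
	EvalCount    int64 `json:"eval_count"`
	EvalDuration int64 `json:"eval_duration"` // nanoseconds
}

// parseChunk decodes a single JSON object into chunkMetrics.
func parseChunk(line []byte) (chunkMetrics, error) {
	var m chunkMetrics
	err := json.Unmarshal(line, &m)
	return m, err
}

// tokensPerSecond returns generated tokens per second, or 0 when the
// duration is missing so callers never divide by zero.
func tokensPerSecond(m chunkMetrics) float64 {
	if m.EvalDuration <= 0 {
		return 0
	}
	return float64(m.EvalCount) / time.Duration(m.EvalDuration).Seconds()
}

func main() {
	// Made-up sample line, not captured from a real server.
	line := []byte(`{"done":true,"eval_count":50,"eval_duration":2000000000}`)
	m, err := parseChunk(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.1f tokens/s\n", tokensPerSecond(m)) // prints "25.0 tokens/s"
}
```

The zero-duration guard matters for chunks that carry no metrics at all, which a client should expect if the fields are only populated on some events.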

Implementation Suggestion

The Go implementation already uses a single ChatResponse struct for both streaming and non-streaming responses (see api/types.go). The fix primarily requires updating the OpenAPI spec in docs/api.md to include the metrics fields in ChatStreamEvent.

Reference: A similar pattern is already implemented for GenerateStreamEvent which correctly includes these metrics.
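
To make the single-struct idea concrete, here is a minimal sketch of how one Go type can serve both intermediate and final events when the metrics are tagged omitempty. The type chatChunk is a simplified, hypothetical stand-in and is not copied from api/types.go; whether the real struct uses omitempty is an assumption of this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatChunk is a simplified, hypothetical stand-in for a response type that
// is shared by streaming and non-streaming code paths.
type chatChunk struct {
	Content      string `json:"content"`
	Done         bool   `json:"done"`
	EvalCount    int64  `json:"eval_count,omitempty"`
	EvalDuration int64  `json:"eval_duration,omitempty"` // nanoseconds
}

// encode marshals a chunk to its JSON wire form.
func encode(c chatChunk) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Intermediate event: zero-valued metrics are omitted from the output.
	fmt.Println(encode(chatChunk{Content: "Hel"}))
	// Final event: the same type now carries the metrics.
	fmt.Println(encode(chatChunk{Done: true, EvalCount: 12, EvalDuration: 340000000}))
}
```

Under that assumption the wire format already contains the metrics on some events, which is why the fix described here is mostly a documentation change to the spec.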

Additional Context

This would also resolve the inconsistency noted in issue #14680 regarding the mismatch between streaming and non-streaming response schemas.

extent analysis

Fix Plan

To address the inconsistency in the OpenAPI spec, follow these steps:

  • Update the ChatStreamEvent definition in docs/api.md to include the missing fields:
    • total_duration
    • load_duration
    • prompt_eval_count
    • prompt_eval_duration
    • eval_count
    • eval_duration
    • logprobs
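  • No Go change should be needed: per the issue, api/types.go already uses a single ChatResponse struct for streaming and non-streaming responses. Confirm that it carries the fields listed above rather than adding a separate ChatStreamEvent struct.

For reference, the same fields expressed as a Go struct. The struct name and field types are illustrative and are not copied from api/types.go:

type ChatStreamEvent struct {
    // ... existing fields ...
    TotalDuration      int64     `json:"total_duration"`       // nanoseconds
    LoadDuration       int64     `json:"load_duration"`        // nanoseconds
    PromptEvalCount    int64     `json:"prompt_eval_count"`    // tokens in the prompt
    PromptEvalDuration int64     `json:"prompt_eval_duration"` // nanoseconds
    EvalCount          int64     `json:"eval_count"`           // tokens in the response
    EvalDuration       int64     `json:"eval_duration"`        // nanoseconds
    Logprobs           []float64 `json:"logprobs"`             // element type is a placeholder
}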

Example YAML snippet for the updated ChatStreamEvent definition in docs/api.md:

components:
  schemas:
    ChatStreamEvent:
      type: object
      properties:
        # ... existing properties ...
        total_duration:
          type: integer
          description: Total request duration in nanoseconds
        load_duration:
          type: integer
          description: Model loading duration in nanoseconds
        prompt_eval_count:
          type: integer
          description: Number of tokens in the prompt
        prompt_eval_duration:
          type: integer
          description: Time spent evaluating the prompt in nanoseconds
        eval_count:
          type: integer
          description: Number of tokens in the response
        eval_duration:
          type: integer
          description: Time spent generating response in nanoseconds
        logprobs:
          type: array
          items:
            type: number
          description: Log probabilities

Verification

To verify the fix, check the following:

  • The updated ChatStreamEvent definition is reflected in the OpenAPI spec.
  • The Go response struct used for streaming (ChatResponse in api/types.go, per the issue) already exposes every field added to the spec.
  • A live streamed chat response actually contains the metrics fields that the spec now documents.
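
The last check can be partly automated. The sketch below assumes the stream arrives as newline-delimited JSON objects and that the event carrying the metrics is marked with "done": true; both are assumptions of this example rather than statements from the issue. The function names and the sample stream are hypothetical.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// metricKeys lists the fields the issue asks to document on ChatStreamEvent.
var metricKeys = []string{
	"total_duration", "load_duration", "prompt_eval_count",
	"prompt_eval_duration", "eval_count", "eval_duration",
}

// missingMetrics scans newline-delimited JSON events and reports which
// metric keys are absent from the first event whose "done" value is true.
func missingMetrics(r io.Reader) ([]string, error) {
	sc := bufio.NewScanner(r) // default buffer; very long lines would need sc.Buffer
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var event map[string]json.RawMessage
		if err := json.Unmarshal([]byte(line), &event); err != nil {
			return nil, err
		}
		if string(event["done"]) != "true" {
			continue
		}
		var missing []string
		for _, k := range metricKeys {
			if _, ok := event[k]; !ok {
				missing = append(missing, k)
			}
		}
		return missing, nil
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("stream ended without a done event")
}

// missingMetricsInString is a convenience wrapper for in-memory input.
func missingMetricsInString(s string) ([]string, error) {
	return missingMetrics(strings.NewReader(s))
}

func main() {
	// Made-up sample stream, not captured from a real server.
	stream := `{"done":false}
{"done":true,"total_duration":5,"load_duration":1,"prompt_eval_count":3,"prompt_eval_duration":1,"eval_count":4,"eval_duration":2}`
	missing, err := missingMetricsInString(stream)
	if err != nil {
		panic(err)
	}
	fmt.Println("missing metric fields:", missing)
}
```

In practice the reader would be the HTTP response body of a streamed chat request; an empty result means every documented metric was present on the final event.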

Extra Tips

  • Run the updated OpenAPI spec through a validator so the new properties do not introduce schema errors.
  • Consider adding a documented example of a streamed chat response that shows the metrics fields, so developers can see how to read them.
