litellm - 💡(How to fix) Fix [Bug]: running async stream request with customized router has no usage feedback [1 comments, 1 participants]

litellm • 2026-03-26 03:26:35


GitHub stats

BerriAI/litellm#24602 • Fetched 2026-04-08 01:32:24

Author: WilliamChen-luckbob

Participants: WilliamChen-luckbob

Timeline (top): labeled ×3, closed ×1, commented ×1


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

A bug happened!

Steps to Reproduce

response = await self.router.acompletion(
    model=model,
    messages=messages,
    stream=True,
    stream_options={"include_usage": True},
    **kwargs
)

The async streaming response itself is OK, but I can never retrieve the usage. Even in debug mode, where usageMetadata is printed for Gemini models, I am unable to retrieve that usage from the chunks.

Also, other LLMs do not seem to return anything related to usage.

Relevant log output

A Gemini call log:

11:18:35 - LiteLLM:DEBUG: vertex_and_google_ai_studio_gemini.py:2870 - RAW GEMINI CHUNK: {'candidates': [{'content': {'parts': [{'text': '我是一只善于倾听、乐于助人的AI，拥有丰富的知识和无限的创造力。我可以为你写诗、', 'thoughtSignature': 'EjQKMgG+Pvb7oCvrj7MYaJ3aIzKJZdU1mGwVcOo4MV2V+526E/6cMFxTGOEjgChU/cZJhSqX'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 7, 'candidatesTokenCount': 28, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 7}]}, 'modelVersion': 'gemini-3.1-flash-image-preview', 'responseId': 'iqXEaf_RJ-Kv_uMPpqrYmQU'}
11:18:35 - LiteLLM:DEBUG: streaming_handler.py:968 - model_response.choices[0].delta: Delta(provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7oCvrj7MYaJ3aIzKJZdU1mGwVcOo4MV2V+526E/6cMFxTGOEjgChU/cZJhSqX']}, content='我是一只善于倾听、乐于助人的AI，拥有丰富的知识和无限的创造力。我可以为你写诗、', role='assistant', function_call=None, tool_calls=None, audio=None)

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

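The request in the report already sets stream_options={"include_usage": True}, so the remaining work is on the consuming side. LiteLLM converts provider responses into OpenAI-style chunk objects, so the raw Gemini usageMetadata field seen in the debug log is not a key on the chunks. Token counts, when LiteLLM emits them, are exposed through each chunk's usage attribute, normally on the last chunk of the stream.

Here are the steps:

Keep stream_options={"include_usage": True} on the streaming call.
Iterate the response with async for (acompletion returns an async stream), consume it to the end, and read chunk.usage rather than usageMetadata.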
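The difference between the two field layouts can be illustrated with a small sketch. It only shows how the camelCase keys from the RAW GEMINI CHUNK log line correspond to OpenAI-style usage keys. The mapping table and the normalize_gemini_usage helper are hypothetical names made up for this illustration, and LiteLLM's internal transformation may differ (for example in how reasoning or cached tokens are counted).

```python
from typing import Any, Dict

# Illustrative mapping only (an assumption, not LiteLLM's actual code).
GEMINI_TO_OPENAI_USAGE = {
    "promptTokenCount": "prompt_tokens",
    "candidatesTokenCount": "completion_tokens",
    "totalTokenCount": "total_tokens",
}


def normalize_gemini_usage(usage_metadata: Dict[str, Any]) -> Dict[str, int]:
    """Translate Gemini usageMetadata keys into OpenAI-style usage keys.

    Keys missing from the input are reported as 0; extra keys are ignored.
    """
    return {
        openai_key: int(usage_metadata.get(gemini_key, 0))
        for gemini_key, openai_key in GEMINI_TO_OPENAI_USAGE.items()
    }


# usageMetadata copied from the debug log line quoted in the issue.
raw_usage = {
    "promptTokenCount": 7,
    "candidatesTokenCount": 28,
    "totalTokenCount": 35,
    "promptTokensDetails": [{"modality": "TEXT", "tokenCount": 7}],
}

print(normalize_gemini_usage(raw_usage))
# → {'prompt_tokens': 7, 'completion_tokens': 28, 'total_tokens': 35}
```

This is why a check such as 'usageMetadata' in chunk does not match anything on the normalized chunks: the provider-specific key only exists in the raw payload shown in the debug log.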

Example code:

stream_options = {"include_usage": True}
response = await self.router.acompletion(
    model=model,
    messages=messages,
    stream=True,
    stream_options=stream_options,
    **kwargs
)

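# acompletion returns an async stream, so iterate with "async for"
async for chunk in response:
    # LiteLLM chunks follow the OpenAI format: usage is an attribute,
    # not a 'usageMetadata' key, and is usually set only on the last chunk
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        print(usage)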

Alternatively, you can collect the chunks and rebuild the complete response with litellm.stream_chunk_builder, which also aggregates usage:

import litellm

response = await self.router.acompletion(
    model=model,
    messages=messages,
    stream=True,
    stream_options=stream_options,
    **kwargs
)

# Keep every chunk so the full response can be reconstructed afterwards
chunks = []
async for chunk in response:
    chunks.append(chunk)

full_response = litellm.stream_chunk_builder(chunks, messages=messages)
print(full_response.usage)
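If callback-style handling of usage is wanted, one option that needs no special router parameter is a small wrapper generator in user code. The sketch below is a hedged illustration: with_usage_callback, fake_stream, and demo are hypothetical names, and fake_stream is a stand-in built from SimpleNamespace objects that only assumes chunks expose choices and usage attributes in the OpenAI style. With a real router, the object returned by acompletion would take the place of fake_stream().

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Callable


async def with_usage_callback(
    stream: AsyncIterator[Any],
    on_usage: Callable[[Any], None],
) -> AsyncIterator[Any]:
    """Yield every chunk unchanged; call on_usage for chunks that carry usage."""
    async for chunk in stream:
        usage = getattr(chunk, "usage", None)
        if usage is not None:
            on_usage(usage)
        yield chunk


async def fake_stream() -> AsyncIterator[Any]:
    """Stand-in for a streamed response: two content chunks, then a usage chunk."""
    for text in ("Hel", "lo"):
        delta = SimpleNamespace(content=text)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)
    # The usage-bearing chunk arrives last and has no choices in this stand-in.
    yield SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=7, completion_tokens=28, total_tokens=35),
    )


async def demo() -> tuple:
    """Consume the wrapped stream to the end, collecting text and usage."""
    seen_usage = []
    text_parts = []
    async for chunk in with_usage_callback(fake_stream(), seen_usage.append):
        for choice in chunk.choices:
            if choice.delta.content:
                text_parts.append(choice.delta.content)
    return "".join(text_parts), seen_usage


if __name__ == "__main__":
    text, usages = asyncio.run(demo())
    print(text, usages)
```

The wrapper keeps the consumer loop unchanged and still reads the stream to its end, which matters because the usage chunk comes after the last content chunk.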

Verification

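To verify that the fix worked, run one streaming call to completion and check the printed output: a usage object with prompt_tokens, completion_tokens, and total_tokens should be printed once, near the end of the stream. If nothing is printed even though the stream was fully consumed, the behavior described in the issue is still present.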

Extra Tips

Check the LiteLLM documentation for changes to the streaming API in your version (the report is against 1.82.0). You can also print each chunk in full to see whether a usage field is present but not being read, and make sure the loop does not stop at the chunk that carries finish_reason, since usage can arrive in a later chunk.
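Printing whole chunks can be noisy, so a compact per-chunk summary may be easier to scan. The helper below is a minimal sketch under the same assumption as above, namely that chunks expose OpenAI-style choices, delta, finish_reason, and usage attributes. summarize_chunk is a hypothetical name, and the sample chunks are hand-built stand-ins rather than real LiteLLM output.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_chunk(chunk: Any) -> Dict[str, Any]:
    """Return a small dict describing what a streamed chunk carries."""
    choices = getattr(chunk, "choices", None) or []
    first = choices[0] if choices else None
    delta = getattr(first, "delta", None)
    return {
        "n_choices": len(choices),
        "finish_reason": getattr(first, "finish_reason", None),
        "has_content": bool(getattr(delta, "content", None)),
        "has_usage": getattr(chunk, "usage", None) is not None,
    }


# Hand-built stand-ins: a content chunk, a finishing chunk, and a usage chunk.
content_chunk = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"), finish_reason=None)],
    usage=None,
)
finish_chunk = SimpleNamespace(
    choices=[SimpleNamespace(delta=SimpleNamespace(content=None), finish_reason="stop")],
    usage=None,
)
usage_chunk = SimpleNamespace(
    choices=[],
    usage=SimpleNamespace(prompt_tokens=7, completion_tokens=28, total_tokens=35),
)

for sample in (content_chunk, finish_chunk, usage_chunk):
    print(summarize_chunk(sample))
```

In a real loop, calling print(summarize_chunk(chunk)) for every chunk shows at a glance whether any chunk reports has_usage as True and whether it arrives after the chunk with a finish_reason.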
