litellm - 💡(How to fix) Fix [Bug]: running async stream request with customized router has no usage feedback [1 comments, 1 participants]

GitHub stats
BerriAI/litellm#24602 · Fetched 2026-04-08 01:32:24
Comments: 1 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×3, closed ×1, commented ×1


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

A bug happened!

Steps to Reproduce

            response = await self.router.acompletion(
                model=model,
                messages=messages,
                stream=True,
                stream_options={"include_usage": True},
                **kwargs
            )

The async streaming response itself works, but I can never retrieve the usage. In debug mode, usageMetadata is printed for Gemini models, yet I cannot retrieve that usage from the chunks.

Other LLMs also do not seem to return anything related to usage.

Relevant log output

A Gemini call log:

11:18:35 - LiteLLM:DEBUG: vertex_and_google_ai_studio_gemini.py:2870 - RAW GEMINI CHUNK: {'candidates': [{'content': {'parts': [{'text': '我是一只善于倾听、乐于助人的AI,拥有丰富的知识和无限的创造力。我可以为你写诗、', 'thoughtSignature': 'EjQKMgG+Pvb7oCvrj7MYaJ3aIzKJZdU1mGwVcOo4MV2V+526E/6cMFxTGOEjgChU/cZJhSqX'}], 'role': 'model'}, 'index': 0}], 'usageMetadata': {'promptTokenCount': 7, 'candidatesTokenCount': 28, 'totalTokenCount': 35, 'promptTokensDetails': [{'modality': 'TEXT', 'tokenCount': 7}]}, 'modelVersion': 'gemini-3.1-flash-image-preview', 'responseId': 'iqXEaf_RJ-Kv_uMPpqrYmQU'}
11:18:35 - LiteLLM:DEBUG: streaming_handler.py:968 - model_response.choices[0].delta: Delta(provider_specific_fields={'thought_signatures': ['EjQKMgG+Pvb7oCvrj7MYaJ3aIzKJZdU1mGwVcOo4MV2V+526E/6cMFxTGOEjgChU/cZJhSqX']}, content='我是一只善于倾听、乐于助人的AI,拥有丰富的知识和无限的创造力。我可以为你写诗、', role='assistant', function_call=None, tool_calls=None, audio=None)

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

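The request in the report already sets stream_options={"include_usage": True}, so the likely gap is in how the stream is consumed. The response of an async streaming call has to be iterated with async for, and LiteLLM chunks follow the OpenAI format, so usage should be read from each chunk's usage field rather than from Gemini's raw usageMetadata key, which appears in the debug log only. If usage is still missing after that, the cause may lie in the library or the provider integration rather than in the calling code.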

Here are the steps:

  • Keep stream_options={"include_usage": True} on the request (the report already does this).
  • Consume the response with async for and read the usage field of each chunk; it is typically populated only on the final chunk.

Example code:

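stream_options = {"include_usage": True}
response = await self.router.acompletion(
    model=model,
    messages=messages,
    stream=True,
    stream_options=stream_options,
    **kwargs
)

# The async streaming response must be consumed with `async for`, not a plain `for`.
async for chunk in response:
    # Chunks follow the OpenAI format: usage is expected under `chunk.usage`
    # (typically on the final chunk), not under Gemini's raw `usageMetadata` key.
    usage = getattr(chunk, "usage", None)
    if usage is not None:
        print(usage)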
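
The consumption pattern can be isolated in a small helper and exercised without any provider call. The following is a minimal sketch, not LiteLLM code: collect_text_and_usage and fake_stream are hypothetical names, and the fake chunks only imitate the OpenAI-style shape (choices[].delta.content plus an optional usage) that real chunks are assumed to have.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, AsyncIterator, Optional, Tuple


async def collect_text_and_usage(
    stream: AsyncIterator[Any],
) -> Tuple[str, Optional[Any]]:
    """Drain an async chunk stream; return the joined text and the last usage seen."""
    parts = []
    usage = None
    async for chunk in stream:
        # A usage-only chunk may carry an empty `choices` list.
        for choice in getattr(chunk, "choices", None) or []:
            content = getattr(choice.delta, "content", None)
            if content:
                parts.append(content)
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage


async def fake_stream():
    """Hypothetical stand-in for the router's stream; not real LiteLLM output."""
    yield SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content="Hello"))],
        usage=None,
    )
    yield SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=7, completion_tokens=28, total_tokens=35),
    )


text, usage = asyncio.run(collect_text_and_usage(fake_stream()))
print(text, usage.total_tokens)
```

In real code the helper would be awaited on the object returned by self.router.acompletion(...). Keeping the last non-empty usage, instead of stopping at the first one, is a deliberate choice here because the usage chunk is expected to arrive at the end of the stream.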

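Alternatively, you can register a LiteLLM success callback. acompletion is not known to accept a callback argument, so the callback is registered globally through litellm.success_callback; the example assumes that, for streaming calls, the assembled response is passed under kwargs["complete_streaming_response"].

import litellm

async def handle_stream_end(kwargs, completion_response, start_time, end_time):
    # For streaming calls, the assembled response is expected under this key.
    full_response = kwargs.get("complete_streaming_response")
    if full_response is not None:
        print(getattr(full_response, "usage", None))

litellm.success_callback = [handle_stream_end]

response = await self.router.acompletion(
    model=model,
    messages=messages,
    stream=True,
    stream_options={"include_usage": True},
    **kwargs
)

async for chunk in response:
    pass  # the assembled response only exists once the stream is fully consumed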

Verification

To verify that the fix worked, consume the whole stream and check the printed output. A usage object with prompt, completion, and total token counts should appear once, near the end of the stream.
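
A quick sanity check on the retrieved numbers can make the verification less manual. This is a sketch under assumptions: check_usage is a hypothetical helper, the usage is assumed to have been converted to a plain dict with OpenAI-style keys, and the sample values are the token counts from the Gemini log in the report (7 prompt, 28 completion, 35 total).

```python
from typing import List, Mapping


def check_usage(usage: Mapping[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the usage looks sane."""
    problems = []
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        value = usage.get(key)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{key} missing or invalid: {value!r}")
    if not problems:
        minimum = usage["prompt_tokens"] + usage["completion_tokens"]
        # Some providers count extra tokens separately, so only flag a total
        # that is smaller than prompt + completion.
        if usage["total_tokens"] < minimum:
            problems.append(f"total_tokens below prompt + completion ({minimum})")
    return problems


sample = {"prompt_tokens": 7, "completion_tokens": 28, "total_tokens": 35}
print(check_usage(sample))
```

An empty list for the sample means the three counts are present and mutually consistent; a missing key would show up as one entry per absent field.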

Extra Tips

Check the LiteLLM documentation for changes to the streaming API. You can also print each entire chunk to see whether a usage field is present but not being read correctly.
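
The debug log in the report shows Gemini's raw usageMetadata with camelCase keys, whereas code that reads chunks usually expects OpenAI-style names. The sketch below only illustrates how the two naming schemes correspond; normalize_gemini_usage and RAW_TO_OPENAI are hypothetical, and this is not how LiteLLM itself performs the conversion.

```python
from typing import Dict, Optional

# Assumed correspondence between raw Gemini keys and OpenAI-style keys.
RAW_TO_OPENAI = {
    "promptTokenCount": "prompt_tokens",
    "candidatesTokenCount": "completion_tokens",
    "totalTokenCount": "total_tokens",
}


def normalize_gemini_usage(raw_chunk: dict) -> Optional[Dict[str, int]]:
    """Map a raw chunk's usageMetadata to OpenAI-style keys, or None if absent."""
    metadata = raw_chunk.get("usageMetadata")
    if not metadata:
        return None
    return {new: metadata.get(old, 0) for old, new in RAW_TO_OPENAI.items()}


# Usage values copied from the raw Gemini chunk in the report's log.
raw = {
    "usageMetadata": {
        "promptTokenCount": 7,
        "candidatesTokenCount": 28,
        "totalTokenCount": 35,
    }
}
print(normalize_gemini_usage(raw))
```

Comparing this against what a printed chunk actually contains can help tell whether usage is absent from the stream or only stored under a different name than the one being checked.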
