litellm - ✅(Solved) Fix [Bug]: TTFT not captured for /v1/messages requests (Anthropic, Bedrock, Vertex AI, Azure AI, Minimax) [1 pull requests, 1 participants]

litellm · 2026-04-12 08:52:58

GitHub stats

BerriAI/litellm#25598 • Fetched 2026-04-12 13:24:45

Author

tanchangsheng

Participants

tanchangsheng

Timeline (top)

labeled ×3 · referenced ×2 · cross-referenced ×1

Root Cause

Suspected Root Cause

Fix Action

Fixed

Fixed by PR: fix(streaming): capture TTFT for /v1/messages (Anthropic, Bedrock, Vertex AI) (https://github.com/BerriAI/litellm/pull/25599)

PR fix notes

PR #25599: fix(streaming): capture TTFT for /v1/messages (Anthropic, Bedrock, Vertex AI)

Repository: BerriAI/litellm
Author: joaquinhuigomez
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25599

Description (problem / solution / changelog)

The pass-through streaming path for /v1/messages logged completion_start_time only after the full stream finished. async_success_handler then fell back to end_time, making TTFT equal to total duration or null in the UI and Prometheus.

Record the timestamp of the first chunk in async_sse_wrapper and propagate it to litellm_logging_obj.completion_start_time / model_call_details before the logging handler runs.

Fixes #25598

Changed files

litellm/llms/anthropic/experimental_pass_through/messages/streaming_iterator.py (modified, +12/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

Time to First Token (TTFT) is never correctly captured for requests routed through the /v1/messages endpoint when using providers that have a native BaseAnthropicMessagesConfig implementation (Anthropic, Bedrock, Vertex AI Claude, Azure AI Claude, Minimax).

The result is:

Usage logs UI: TTFT column shows - (null) or displays a value equal to the total request duration — neither is the real first-token time.
<img width="1484" height="418" alt="Image" src="https://github.com/user-attachments/assets/261e67c6-92e7-4451-8993-44d8ecc0f305" />
litellm_llm_api_time_to_first_token_metric (Prometheus): Emitted for streaming requests, but the value equals the full API call duration, not the actual TTFT.
<img width="2860" height="1034" alt="Image" src="https://github.com/user-attachments/assets/ef06a2fe-ee12-49f7-94ae-22f4ee0f48ef" />

Expected Behaviour

completion_start_time should be set to the timestamp when the first non-empty chunk is received from the upstream provider, consistent with how the standard /chat/completions streaming path works.

Actual Behaviour

completion_start_time is set to end_time (after the last chunk), making reported TTFT equal to total duration.

Affected Providers

Any provider that returns a non-None value from ProviderConfigManager.get_provider_anthropic_messages_config() uses the pass-through path and is affected:

| Provider | Scope |
|---|---|
| `anthropic` | All models |
| `bedrock` | Claude models only |
| `vertex_ai` | Claude models only |
| `azure_ai` | Claude models only |
| `minimax` | All models |

Providers not affected: any provider that returns None from get_provider_anthropic_messages_config (e.g. Mistral, Cohere, Groq) falls through to LiteLLMMessagesToCompletionTransformationHandler, which uses the standard CustomStreamWrapper and correctly captures completion_start_time on the first streaming chunk.

Suspected Root Cause

The /v1/messages endpoint for the affected providers routes through a pass-through streaming pipeline (BaseAnthropicMessagesStreamingIterator.async_sse_wrapper) that collects all chunks and fires a single logging call after the stream is complete:

# litellm/llms/anthropic/experimental_pass_through/messages/streaming_iterator.py
async def async_sse_wrapper(self, completion_stream):
    collected_chunks = []
    async for chunk in completion_stream:
        collected_chunks.append(self._convert_chunk_to_sse_format(chunk))
        yield chunk                                    # chunks go to client here

    await self._handle_streaming_logging(collected_chunks)  # logging fires after last chunk

async def _handle_streaming_logging(self, collected_chunks):
    end_time = datetime.now()                          # captured AFTER all chunks sent
    asyncio.create_task(
        PassThroughStreamingHandler._route_streaming_logging_to_handler(
            ...
            end_time=end_time,
        )
    )

Because the logging call only receives end_time (after streaming completes), completion_start_time is never set during the stream. Inside async_success_handler, the fallback triggers:

# litellm/litellm_core_utils/litellm_logging.py
if self.completion_start_time is None:
    self.completion_start_time = end_time   # falls back to end_time, not first-chunk time
    self.model_call_details["completion_start_time"] = self.completion_start_time

completion_start_time ends up equal to end_time, so:

In the DB: completionStartTime ≈ endTime → TTFT ≈ Duration or "-" (UI filters it)
In Prometheus: litellm_llm_api_time_to_first_token_metric = completion_start_time − api_call_start_time = full API call duration
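
The effect of the fallback can be checked with plain timestamp arithmetic. The sketch below is not LiteLLM code: the helper `ttft_seconds` and the timestamps are made up for illustration, and it only mirrors the fallback described above to show why a missing first-chunk time makes the reported TTFT equal to the full call duration.

```python
from datetime import datetime, timedelta
from typing import Optional


def ttft_seconds(
    api_call_start_time: datetime,
    end_time: datetime,
    completion_start_time: Optional[datetime],
) -> float:
    """Hypothetical helper that mirrors the fallback described in the issue."""
    if completion_start_time is None:
        # No first-chunk time was recorded, so the end of the stream is used.
        completion_start_time = end_time
    return (completion_start_time - api_call_start_time).total_seconds()


# Illustrative timestamps: first chunk after 0.4 s, stream finished after 6 s.
start = datetime(2026, 4, 12, 8, 0, 0)
first_chunk = start + timedelta(seconds=0.4)
end = start + timedelta(seconds=6)

ttft_with_first_chunk = ttft_seconds(start, end, first_chunk)
ttft_with_fallback = ttft_seconds(start, end, None)
total_duration = (end - start).total_seconds()
```

With the first-chunk time recorded, the result is a fraction of the total; with the fallback, it is identical to `total_duration`, which matches the symptom seen in the UI and in Prometheus.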

Contrast with the standard /chat/completions streaming path, where _update_completion_start_time() is called on the first chunk:

# litellm/litellm_core_utils/streaming_handler.py
if self.logging_obj.completion_start_time is None:
    self.logging_obj._update_completion_start_time(
        completion_start_time=datetime.datetime.now()
    )
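
The "set once, on the first chunk" pattern in the excerpt above can be reproduced outside LiteLLM. The following is a minimal, self-contained sketch: `FirstChunkTimer` and `fake_stream` are hypothetical names invented for this example, and the byte strings only imitate SSE events.

```python
import asyncio
from datetime import datetime
from typing import AsyncIterator, List, Optional


class FirstChunkTimer:
    """Wraps an async stream and records when the first non-empty chunk arrives."""

    def __init__(self) -> None:
        self.completion_start_time: Optional[datetime] = None

    async def wrap(self, stream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
        async for chunk in stream:
            # Set the timestamp once, and only for a non-empty chunk.
            if self.completion_start_time is None and chunk:
                self.completion_start_time = datetime.now()
            yield chunk


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for an upstream provider stream (illustrative only)."""
    for chunk in (b"", b"event: message_start", b"event: content_block_delta"):
        await asyncio.sleep(0.01)
        yield chunk


async def consume() -> tuple:
    timer = FirstChunkTimer()
    received: List[bytes] = []
    async for chunk in timer.wrap(fake_stream()):
        received.append(chunk)
    end_time = datetime.now()  # taken only after the stream is exhausted
    return timer.completion_start_time, end_time, received


first_chunk_time, end_time, received = asyncio.run(consume())
```

Because the timestamp is taken inside the loop rather than after it, `first_chunk_time` precedes `end_time` whenever the stream yields more than one chunk. If the stream yields only empty chunks, the timestamp stays `None`, and a consumer would still need a fallback.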

Steps to Reproduce

Configure LiteLLM proxy with a Bedrock Claude model (or any other affected provider above).
Send a streaming request to /v1/messages.
Observe the usage logs UI: the TTFT column is either - or shows a value equal to the Duration column.
Observe Prometheus: litellm_llm_api_time_to_first_token_metric value matches litellm_llm_api_latency_metric rather than being a fraction of it.
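
To have a reference value for the comparison in the last two steps, TTFT can also be measured on the client side. The helper below is a sketch under assumptions: `measure_stream` is a hypothetical function written for this example, and `slow_lines` simulates a stream so that the snippet runs without a proxy.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_stream(lines: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, non_empty_line_count) for a line stream."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    count = 0
    for line in lines:
        if not line.strip():
            continue  # SSE uses blank lines as event separators
        if ttft is None:
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count


def slow_lines() -> Iterator[str]:
    """Simulated SSE stream; swap in the line iterator of a real HTTP client."""
    for line in ("event: message_start", "", "event: content_block_delta", ""):
        time.sleep(0.02)
        yield line


ttft, total, count = measure_stream(slow_lines())
```

With a real client, `lines` would be the response's line iterator for a streaming POST to the proxy's /v1/messages endpoint. The client-side figure includes network overhead, so it will not match the proxy's value exactly; a client-side TTFT well below the total while the proxy reports TTFT ≈ Duration is consistent with the bug.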

Related

PR #9688 / Issue #9210 fixed the same completion_start_time = end_time fallback for /chat/completions streaming, but did not cover the /v1/messages pass-through path.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3-stable

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Update the /v1/messages endpoint's pass-through streaming pipeline to set completion_start_time when the first non-empty chunk is received from the upstream provider.

Guidance

Identify the async_sse_wrapper function in streaming_iterator.py and modify it to set completion_start_time when the first chunk is received.
Update the _handle_streaming_logging function to accept completion_start_time and propagate it to the logging object alongside end_time (not in place of it), so that async_success_handler no longer falls back to end_time.
Verify that completion_start_time is correctly set by checking the usage logs UI and Prometheus metrics after applying the changes.
Consider backporting the fix to earlier versions of LiteLLM to ensure consistency across different releases.

Example

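# Sketch only; assumes the iterator holds the logging object as self.litellm_logging_obj.
async def async_sse_wrapper(self, completion_stream):
    collected_chunks = []
    completion_start_time = None
    async for chunk in completion_stream:
        # Record the arrival time of the first non-empty chunk, once.
        if completion_start_time is None and chunk:
            completion_start_time = datetime.now()
        collected_chunks.append(self._convert_chunk_to_sse_format(chunk))
        yield chunk

    await self._handle_streaming_logging(collected_chunks, completion_start_time)

async def _handle_streaming_logging(self, collected_chunks, completion_start_time=None):
    end_time = datetime.now()  # still needed for total duration
    # Propagate the first-chunk time before the logging handler runs,
    # so async_success_handler does not fall back to end_time.
    if completion_start_time is not None:
        self.litellm_logging_obj.completion_start_time = completion_start_time
        self.litellm_logging_obj.model_call_details["completion_start_time"] = completion_start_time
    asyncio.create_task(
        PassThroughStreamingHandler._route_streaming_logging_to_handler(
            ...
            end_time=end_time,
        )
    )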

Notes

The provided code snippet assumes that the completion_stream yields chunks in the correct order and that the first non-empty chunk is the one that should trigger the setting of completion_start_time. Additional error handling and edge cases may need to be considered depending on the specific requirements of the LiteLLM proxy.

Recommendation

Update the /v1/messages pass-through streaming pipeline to set completion_start_time when the first non-empty chunk arrives from the upstream provider. This should correct the TTFT values shown in the usage logs UI and in Prometheus metrics.
