litellm - 💡(How to fix) Fix [Bug]: Vertex Gemini traffic_type is only stored in _hidden_params and not exposed in normalized proxy response or headers [1 participant]



GitHub stats
BerriAI/litellm#25379 · Fetched 2026-04-09 07:52:25
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline (top): labeled ×3


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When calling Vertex AI Gemini models through LiteLLM Proxy, LiteLLM appears to know the Vertex traffic type internally, but does not expose it consistently in the public/normalized response.

For my Gemini alias, the proxy config is:

model_list:
  - model_name: gemini-25-flash-lite-dedicated
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-lite
      vertex_project: os.environ/VERTEXAI_PROJECT
      vertex_location: os.environ/VERTEXAI_LOCATION
      vertex_credentials: os.environ/GOOGLE_APPLICATION_CREDENTIALS
      extra_headers:
        X-Vertex-AI-LLM-Request-Type: dedicated
        custom-llm-provider: vertex_ai

litellm_settings:
  return_response_headers: true

Observed behavior:

  • LiteLLM sends the request to Vertex AI successfully.
  • LiteLLM records traffic_type internally.
  • For Gemini, however, the value is visible only in _hidden_params.provider_specific_fields.traffic_type.
  • The normalized response body does not include usage.extra_properties.google.traffic_type.
  • The proxy response headers do not expose a stable traffic type header either, even with return_response_headers: true.

I had to add a custom callback to:

  1. read _hidden_params.provider_specific_fields.traffic_type
  2. backfill usage.extra_properties.google.traffic_type
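
The two workaround steps above can be sketched as a plain-Python helper. This is a minimal sketch, assuming the response and its hidden params have already been converted to plain dicts; the name backfill_traffic_type is hypothetical, and the reporter's actual callback code is not shown in the issue. In a real proxy it would be called from a callback (for example, a post-call hook on a CustomLogger subclass), but that wiring and the exact hook are assumptions and are left out.

```python
from typing import Any, Dict, Optional


def backfill_traffic_type(
    response: Dict[str, Any], hidden_params: Dict[str, Any]
) -> Optional[str]:
    """Copy the Gemini traffic type from hidden params into the normalized usage.

    Hypothetical helper mirroring the reporter's two-step workaround.
    Returns the traffic type now stored in the response, or None if the
    hidden params did not carry one.
    """
    # Step 1: read _hidden_params.provider_specific_fields.traffic_type.
    provider_fields = hidden_params.get("provider_specific_fields") or {}
    traffic_type = provider_fields.get("traffic_type")
    if not traffic_type:
        return None

    # Step 2: backfill usage.extra_properties.google.traffic_type, creating
    # missing (or None) levels and keeping any value that is already set.
    usage = response.get("usage") or {}
    response["usage"] = usage
    extra = usage.get("extra_properties") or {}
    usage["extra_properties"] = extra
    google = extra.get("google") or {}
    extra["google"] = google
    if not google.get("traffic_type"):
        google["traffic_type"] = traffic_type
    return google["traffic_type"]


if __name__ == "__main__":
    resp = {"usage": {"prompt_tokens": 12, "completion_tokens": 11, "total_tokens": 23}}
    hidden = {"provider_specific_fields": {"traffic_type": "PROVISIONED_THROUGHPUT"}}
    backfill_traffic_type(resp, hidden)
    print(resp["usage"]["extra_properties"]["google"]["traffic_type"])  # prints PROVISIONED_THROUGHPUT
```

The helper never overwrites an existing value, so it stays harmless on routes (such as Vertex OpenAI/Qwen) where LiteLLM already populates the field.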

Expected behavior: LiteLLM should expose Vertex traffic_type consistently in the public normalized response for Vertex Gemini, the same way it already does for some other Vertex model routes like Vertex OpenAI/Qwen.

In other words, LiteLLM already seems to have the data, but it is not surfacing it consistently.
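
To make the expected behavior concrete, the field mapping the reporter asks for can be sketched from the raw Vertex response under "Relevant log output": usageMetadata.trafficType should end up at usage.extra_properties.google.traffic_type. The function name normalize_gemini_usage is hypothetical; this is not LiteLLM's actual Gemini transformation code, and the token-count mapping is deliberately simplified.

```python
from typing import Any, Dict


def normalize_gemini_usage(usage_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Map Vertex Gemini usageMetadata to an OpenAI-style usage dict.

    Hypothetical sketch: keeps trafficType under extra_properties.google,
    the location the reporter sees on Vertex OpenAI/Qwen routes.
    """
    usage: Dict[str, Any] = {
        "prompt_tokens": usage_metadata.get("promptTokenCount", 0),
        "completion_tokens": usage_metadata.get("candidatesTokenCount", 0),
        "total_tokens": usage_metadata.get("totalTokenCount", 0),
    }
    # Only add the nested block when Vertex actually reported a traffic type.
    traffic_type = usage_metadata.get("trafficType")
    if traffic_type:
        usage["extra_properties"] = {"google": {"traffic_type": traffic_type}}
    return usage


if __name__ == "__main__":
    # Values taken from the usageMetadata in the reporter's log output.
    raw = {
        "promptTokenCount": 12,
        "candidatesTokenCount": 11,
        "totalTokenCount": 23,
        "trafficType": "PROVISIONED_THROUGHPUT",
    }
    print(normalize_gemini_usage(raw)["extra_properties"]["google"]["traffic_type"])
```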

Steps to Reproduce

  1. Start LiteLLM Proxy with a Vertex Gemini alias like this:
model_list:
  - model_name: gemini-25-flash-lite-dedicated
    litellm_params:
      model: vertex_ai/gemini-2.5-flash-lite
      vertex_project: os.environ/VERTEXAI_PROJECT
      vertex_location: os.environ/VERTEXAI_LOCATION
      vertex_credentials: os.environ/GOOGLE_APPLICATION_CREDENTIALS
      extra_headers:
        X-Vertex-AI-LLM-Request-Type: dedicated
        custom-llm-provider: vertex_ai

litellm_settings:
  return_response_headers: true

  2. Send a request through LiteLLM Proxy:

curl -sS \
  -D /tmp/headers.txt \
  -o /tmp/body.json \
  -X POST http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-local-litellm-dev" \
  -H "Content-Type: application/json" \
  --data '{"model":"gemini-25-flash-lite-dedicated","messages":[{"role":"user","content":"Hello, what model are you? Answer in one sentence."}],"temperature":0.6,"max_tokens":256}'

  3. Inspect the headers and body:

cat /tmp/headers.txt
jq '.usage.extra_properties.google.traffic_type // .usageMetadata.trafficType // "not_found"' /tmp/body.json

  4. Observe that:
  • the request succeeds
  • the response body does not contain usage.extra_properties.google.traffic_type
  • the proxy headers do not contain a stable traffic type header
  5. Compare with Vertex OpenAI/Qwen routes, where LiteLLM does expose usage.extra_properties.google.traffic_type.

This also reproduces with my "-no-custom-provider" Gemini alias, so the behavior does not seem tied to the custom-llm-provider header.
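
The header and body inspection can also be scripted. This sketch mirrors the jq fallback chain (.usage.extra_properties.google.traffic_type // .usageMetadata.trafficType // "not_found") and also scans the saved header file for header names containing "traffic-type" or "request-type". Those name patterns are an assumption, since the issue notes there is no stable traffic type header to look for; the default file paths match the curl command, and report() is a hypothetical convenience wrapper.

```python
import json
from typing import Any, Dict, List


def _dig(obj: Any, *keys: str) -> Any:
    """Walk nested dicts; return None if a key is missing or a level is not a dict."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def find_traffic_type(body: Dict[str, Any]) -> Any:
    """Mirror the jq expression used in the reproduction steps."""
    for path in (
        ("usage", "extra_properties", "google", "traffic_type"),
        ("usageMetadata", "trafficType"),
    ):
        value = _dig(body, *path)
        # jq's // operator falls through only on null and false.
        if value is not None and value is not False:
            return value
    return "not_found"


def find_traffic_headers(raw_headers: str) -> List[str]:
    """Return header lines whose name matches the assumed traffic-type patterns."""
    hits = []
    for line in raw_headers.splitlines():
        name, sep, _ = line.partition(":")
        # Status lines such as "HTTP/1.1 200 OK" have no colon and are skipped.
        if sep and any(p in name.lower() for p in ("traffic-type", "request-type")):
            hits.append(line.strip())
    return hits


def report(headers_path: str = "/tmp/headers.txt",
           body_path: str = "/tmp/body.json") -> None:
    """Print what the proxy returned; run after the curl command."""
    with open(headers_path, encoding="utf-8") as fh:
        print("traffic headers:", find_traffic_headers(fh.read()) or "none")
    with open(body_path, encoding="utf-8") as fh:
        print("traffic_type:", find_traffic_type(json.load(fh)))
```

Calling report() after the curl request should print "none" and "not_found" for the Gemini alias if the behavior described in this issue holds.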

Relevant log output

=== Request ===
  URL: https://aiplatform.googleapis.com/v1/projects/<redacted-project>/locations/global/publishers/google/models/gemini-2.5-flash-lite:generateContent
  X-Vertex-AI-LLM-Request-Type: dedicated

  === Response Status ===
  HTTP 200

  === Response Headers ===
  HTTP/2 200
  x-vertex-ai-llm-request-type: dedicated
  content-type: application/json; charset=UTF-8
  vary: X-Origin
  vary: Referer
  vary: Origin,Accept-Encoding
  date: <redacted-timestamp>
  server: scaffolding on HTTPServer2
  x-xss-protection: 0
  x-frame-options: SAMEORIGIN
  x-content-type-options: nosniff
  alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
  accept-ranges: none

  === Response Body ===
  {
    "candidates": [
      {
        "content": {
          "role": "model",
          "parts": [
            {
              "text": "I am a large language model, trained by Google."
            }
          ]
        },
        "finishReason": "STOP",
        "avgLogprobs": -0.00013389628888531163
      }
    ],
    "usageMetadata": {
      "promptTokenCount": 12,
      "candidatesTokenCount": 11,
      "totalTokenCount": 23,
      "trafficType": "PROVISIONED_THROUGHPUT",
      "promptTokensDetails": [
        {
          "modality": "TEXT",
          "tokenCount": 12
        }
      ],
      "candidatesTokensDetails": [
        {
          "modality": "TEXT",
          "tokenCount": 11
        }
      ]
    },
    "modelVersion": "gemini-2.5-flash-lite",
    "createTime": "<redacted-timestamp>",
    "responseId": "<redacted-response-id>"
  }

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

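Per the report, configuration alone does not resolve this: even with return_response_headers: true, Vertex Gemini responses keep traffic_type only in _hidden_params. Until LiteLLM surfaces it in the normalized response (as it already does for Vertex OpenAI/Qwen routes), a custom callback that backfills usage.extra_properties.google.traffic_type is the available workaround.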

Guidance

  • Verify that return_response_headers: true is set in litellm_settings so that response headers are included in the output. Note that the reporter already has it enabled, and it does not add a traffic type header for Gemini.
  • Check the LiteLLM Proxy logs (or a callback) to confirm that traffic_type is recorded internally in _hidden_params.provider_specific_fields but not exposed in the normalized response.
  • Consider adding a custom callback to backfill the usage.extra_properties.google.traffic_type field, as in the reporter's existing workaround.
  • Review the LiteLLM Proxy documentation to see if there are any configuration options or settings that can be adjusted to expose the traffic_type consistently for Vertex Gemini models.

Example

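The gap is in how LiteLLM normalizes Vertex Gemini responses rather than in configuration, so the workaround lives in code: a custom callback that reads _hidden_params.provider_specific_fields.traffic_type and writes it to usage.extra_properties.google.traffic_type.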

Notes

The issue appears specific to Vertex Gemini routes rather than a general LiteLLM Proxy problem: per the report, Vertex OpenAI/Qwen routes already expose the field. The custom-callback workaround may be needed until LiteLLM surfaces traffic_type natively for Gemini.

Recommendation

Apply workaround: Add a custom callback to backfill the usage.extra_properties.google.traffic_type field, as this is the most straightforward solution to expose the traffic type consistently for Vertex Gemini models.
