litellm - ✅(Solved) Fix [Bug]: Google-native :generateContent route silently drops correlation metadata (call_id, tags, user) from spend logs [3 pull requests, 1 participants]

BerriAI/litellm#25956, fetched 2026-04-18 05:52:52
Comments: 0; Participants: 1; Timeline events: 6; Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×1, mentioned ×1, subscribed ×1

The Google-native /v1beta/models/{model}:generateContent and /v1beta/models/{model}:streamGenerateContent routes silently drop all client-provided correlation metadata before it reaches LiteLLM_SpendLogs. Spend records are written with the correct token counts, cost, model, and timestamps — but every queryable correlation field is empty:

| Client sends | Spend log field | Behavior |
| --- | --- | --- |
| x-litellm-call-id: <uuid> header | request_id | Silently overridden by Gemini's response.id |
| x-litellm-tags: scan_id=<uuid> header | request_tags | Silently dropped ([]) |
| user: <uuid> in request body | user / end_user | Silently dropped ("") |

Because none of these survive, callers cannot filter spend logs to attribute spend back to a specific scan, run, agent invocation, or end user. This blocks any per-call cost-attribution use case downstream of the proxy.

The same call patterns work correctly on /v1/chat/completions (OpenAI-compat) and /v1/messages (Anthropic-compat) on the same proxy — so this is a Google-native-specific gap, not a global misconfiguration.

PR fix notes

PR #25952: fix(proxy): honor client-supplied x-litellm-call-id in spend log request_id

Description (problem / solution / changelog)

Summary

When a client sends x-litellm-call-id: <my-uuid> explicitly, today the proxy silently discards it from the spend log: the spend record exists, but with an unrelated provider response id as request_id, so /spend/logs/v2?request_id=<my-uuid> returns empty.

Root cause

get_spend_logs_id (litellm/proxy/spend_tracking/spend_tracking_utils.py:166) does:

id = response_obj.get("id") or kwargs.get("litellm_call_id")

response_obj.id is the provider's own response id (e.g. Gemini's "QanhadapC_val7oP15PyuQM", OpenAI's "chatcmpl-...", Anthropic's "msg_..."). It's always present, so it always wins — litellm_call_id is never used to populate request_id in the spend log.

This is most painful on the Google-native :generateContent route, where the response also lacks x-litellm-* headers (see #25500), so clients can't even read back the resolved id from the response. But the underlying defect is cross-route.

Fix

Track whether litellm_call_id was supplied by the client via x-litellm-call-id (vs. an auto-generated UUID). When the client supplied it, prefer it over response.id for the spend log request_id. When the call_id is auto-generated, fall back to existing behavior (response.id first, then the auto-uuid).

This preserves backward compatibility for callers that don't set the header (existing tests at tests/test_keys.py:519, tests/test_spend_logs.py:115/181, etc., still assert request_id == response["id"]).
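
The precedence rule described above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function name resolve_spend_log_request_id is hypothetical, and reading the litellm_call_id_from_client flag out of kwargs is an assumption about where the flag ends up.

```python
from typing import Any, Dict, Optional


def resolve_spend_log_request_id(
    response_obj: Dict[str, Any], kwargs: Dict[str, Any]
) -> Optional[str]:
    """Hypothetical sketch of the request_id precedence described above."""
    response_id = response_obj.get("id")
    call_id = kwargs.get("litellm_call_id")

    # Assumption: the proxy marks call ids that came from the
    # x-litellm-call-id header under this key.
    if kwargs.get("litellm_call_id_from_client") and call_id:
        return call_id

    # Default path, unchanged: provider response id first, then the call id.
    return response_id or call_id


gemini_response = {"id": "QanhadapC_val7oP15PyuQM"}

# Client supplied the header, so its id is kept.
print(resolve_spend_log_request_id(
    gemini_response,
    {"litellm_call_id": "my-uuid-123", "litellm_call_id_from_client": True},
))  # my-uuid-123

# Auto-generated call id: the provider id still wins, as before.
print(resolve_spend_log_request_id(
    gemini_response, {"litellm_call_id": "auto-generated-uuid"}
))  # QanhadapC_val7oP15PyuQM
```

Keeping the default branch identical to the old expression is what lets the existing assertions on request_id == response["id"] keep passing.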

Files changed

  • litellm/proxy/common_request_processing.py — set data["litellm_call_id_from_client"] = True when the header is present (covers /v1/chat/completions, /v1/messages, etc.)
  • litellm/proxy/google_endpoints/endpoints.py — same flag for both :generateContent and :streamGenerateContent
  • litellm/proxy/spend_tracking/spend_tracking_utils.py: get_spend_logs_id honors the new flag
  • tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py — 6 new tests (TestGetSpendLogsId) covering both default and flagged precedence, plus the aretrieve_batch carve-out

Manual testing

# Without the fix
curl -X POST "https://<proxy>/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: sk-..." \
  -H "x-litellm-call-id: my-uuid-123" \
  -d '{"contents": [{"parts": [{"text": "hi"}], "role": "user"}]}'

curl "https://<proxy>/spend/logs/v2?request_id=my-uuid-123"
# → data: []  (BAD)

# With the fix
# → data: [{"request_id": "my-uuid-123", ...}]  (GOOD)

Tests

tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_default_prefers_response_id PASSED
tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_default_falls_back_to_litellm_call_id PASSED
tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_client_supplied_flag_prefers_litellm_call_id PASSED
tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_client_supplied_flag_falls_back_to_response_id PASSED
tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_client_supplied_flag_false_uses_default_precedence PASSED
tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py::TestGetSpendLogsId::test_aretrieve_batch_unchanged PASSED

Test plan:

  • Add unit tests for new behavior
  • Verify existing tests asserting request_id == response["id"] are unaffected (default path unchanged)
  • Manual end-to-end on Google-native :generateContent route

Related

Part of a larger cluster of bugs in the Google-native /v1beta/... route (separate detailed issue forthcoming). Companion fixes:

  • #25500 (in flight) — outbound x-litellm-* response headers on Google-native routes
  • #24097 — streaming success_callback silently skipped on Google-native streaming
  • Forthcoming PR — metadata / user propagation into spend logs on Google-native non-streaming

Tracking issue

#25956 — full root-cause writeup for the Google-native correlation cluster (this PR is Fix #1 of 4 defects)

Changed files

  • litellm/proxy/common_request_processing.py (modified, +10/-3)
  • litellm/proxy/google_endpoints/endpoints.py (modified, +20/-6)
  • litellm/proxy/spend_tracking/spend_tracking_utils.py (modified, +6/-0)
  • tests/test_litellm/proxy/spend_tracking/test_spend_tracking_utils.py (modified, +93/-0)

PR #25955: fix(google_genai): propagate user from kwargs to logging obj in agenerate_content

Description (problem / solution / changelog)

Summary

The Google-native /v1beta/models/{model}:generateContent and /v1beta/models/{model}:streamGenerateContent routes silently drop the user field from the spend log. Spend records are written with the correct cost and tokens, but user: "" and end_user: "", blocking per-end-user attribution on this route.

Root cause

The Google-native handler in litellm/proxy/google_endpoints/endpoints.py reads user from the request body via add_litellm_data_to_request, which correctly populates data["user"]. That value flows through function_setup and then through llm_router.agenerate_content(**data) into litellm.agenerate_content, where kwargs["user"] is set.

But in setup_generate_content_call (litellm/google_genai/main.py:185), the call to litellm_logging_obj.update_from_kwargs doesn't pass user:

litellm_logging_obj.update_from_kwargs(
    kwargs=kwargs,
    model=model,
    optional_params=dict(generate_content_config_dict),
    litellm_params={
        "litellm_call_id": litellm_call_id,
    },
    custom_llm_provider=custom_llm_provider,
)

Without user=..., update_from_kwargs defers to its default user=None, so logging_obj.user and model_call_details["user"] are None, and the spend log row shows an empty user field.

OpenAI-compat /v1/chat/completions works because litellm.completion calls logging.update_environment_variables(user=user, ...) with the resolved user value (litellm/main.py:1597, 4806, 6325, 6496, 6746).

Fix

Pass user=kwargs.get("user") to update_from_kwargs so the logging object reflects what the client sent.
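
To make the default-argument problem concrete, here is a small self-contained model of it. MiniLoggingObj is a hypothetical stand-in, not litellm's Logging class; it only reproduces the one behavior described above, namely that user is recorded from the explicit argument and never looked up in kwargs.

```python
from typing import Any, Dict, Optional


class MiniLoggingObj:
    """Hypothetical stand-in for a logging object (not litellm's class)."""

    def __init__(self) -> None:
        self.model_call_details: Dict[str, Any] = {}

    def update_from_kwargs(
        self, kwargs: Dict[str, Any], model: str, user: Optional[str] = None
    ) -> None:
        # Only the explicit `user` argument is recorded; kwargs["user"] is
        # not consulted, mirroring the behavior described above.
        self.model_call_details["model"] = model
        self.model_call_details["user"] = user


request_kwargs = {"user": "my-end-user-uuid-456"}

# Before the fix: user is omitted, so the default None is stored.
before = MiniLoggingObj()
before.update_from_kwargs(kwargs=request_kwargs, model="gemini-2.5-pro")

# After the fix: user is passed through explicitly.
after = MiniLoggingObj()
after.update_from_kwargs(
    kwargs=request_kwargs,
    model="gemini-2.5-pro",
    user=request_kwargs.get("user"),
)

print(before.model_call_details["user"])  # None
print(after.model_call_details["user"])   # my-end-user-uuid-456
```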

Files changed

  • litellm/google_genai/main.py — pass user through (one-line change)
  • tests/test_litellm/google_genai/test_google_genai_main.py — regression test using a real Logging instance with stubbed update_from_kwargs to verify user is propagated

Tests

Added test_setup_generate_content_call_propagates_user_to_logging_obj which:

  1. Constructs a real Logging instance (required by Pydantic on GenerateContentSetupResult)
  2. Stubs update_from_kwargs
  3. Calls setup_generate_content_call with user="my-end-user-uuid-456"
  4. Asserts update_from_kwargs.call_args.kwargs["user"] == "my-end-user-uuid-456"

Verified the test fails on main and passes with this commit.

tests/test_litellm/google_genai/test_google_genai_main.py::test_setup_generate_content_call_propagates_user_to_logging_obj PASSED
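
The test shape listed above (stub the method, call the function, inspect call_args.kwargs) can be sketched with unittest.mock alone. This is a simplification under stated assumptions: setup_call is a hypothetical function under test, and a MagicMock replaces the real Logging instance that the PR's test constructs.

```python
from unittest.mock import MagicMock


def setup_call(logging_obj, **kwargs):
    """Hypothetical function under test: forwards user to the logging object."""
    logging_obj.update_from_kwargs(
        kwargs=kwargs,
        model=kwargs.get("model"),
        user=kwargs.get("user"),
    )


def test_user_is_forwarded() -> None:
    logging_obj = MagicMock()  # stands in for the real Logging instance
    setup_call(logging_obj, model="gemini-2.5-pro", user="my-end-user-uuid-456")

    # call_args.kwargs holds the keyword arguments of the most recent call.
    forwarded = logging_obj.update_from_kwargs.call_args.kwargs
    assert forwarded["user"] == "my-end-user-uuid-456"


test_user_is_forwarded()
```

Asserting on the captured keyword arguments checks the hand-off itself, which is the step that was missing, without depending on what the logging object does afterwards.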

Manual testing

# Without the fix
curl -X POST "https://<proxy>/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: sk-..." \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "hi"}], "role": "user"}],
    "user": "my-end-user-uuid-456"
  }'

curl "https://<proxy>/spend/logs/v2?api_key=<hashed>"
# → {... "user": "", "end_user": ""}  (BAD)

# With the fix
# → {... "user": "my-end-user-uuid-456", "end_user": "my-end-user-uuid-456"}  (GOOD)

Related

Part of a larger cluster of bugs in the Google-native /v1beta/... route (separate detailed issue forthcoming). Companion fixes:

  • #25500 (in flight) — outbound x-litellm-* response headers on Google-native routes
  • #25952 — x-litellm-call-id precedence in spend log request_id
  • #24097 — streaming success_callback silently skipped on Google-native streaming

Note: tags propagation

The developer's repro report also noted request_tags: [] (empty) when sending x-litellm-tags. From code inspection, add_litellm_data_to_request and update_from_kwargs should propagate metadata["tags"] correctly through the agenerate_content path, so tags may already work — or there may be a separate downstream gap in how StandardLoggingPayload._get_request_tags reads from litellm_params.metadata for the agenerate_content call_type. That investigation is out of scope for this PR. The user fix here is independent and unblocks per-end-user attribution today; tags can be addressed in a follow-up.

Tracking issue

#25956 — full root-cause writeup for the Google-native correlation cluster (this PR is Fix #3 of 4 defects)

Changed files

  • litellm/google_genai/main.py (modified, +1/-0)
  • tests/test_litellm/google_genai/test_google_genai_main.py (modified, +65/-8)

PR #25960: fix(google_genai): route streaming chunks to GeminiPassthroughLoggingHandler so success_callbacks fire (closes #24097, supersedes #24114)

Description (problem / solution / changelog)

Summary

Fixes #24097.

Supersedes #24114 (closed by author without merging on 2026-03-19). This revival preserves the structural fix and incorporates Greptile's review feedback that was outstanding when the original was closed.

What was broken

For the Google-native streaming endpoints /models/{model}:streamGenerateContent and /v1beta/models/{model}:streamGenerateContent, the streaming iterator was tagging collected chunks as EndpointType.VERTEX_AI. The downstream _route_streaming_logging_to_handler had no VERTEX_AI branch that knew how to parse Google GenAI native chunks, so async_complete_streaming_response was never set and every function-based and CustomLogger success callback was silently skipped on stream end.

Sync callers were doubly broken: __next__ re-raised StopIteration without ever invoking the logging route at all.

The fix

  1. Add EndpointType.GOOGLE_GENAI = "google-genai" to the enum.
  2. In the async iterator, tag chunks as GOOGLE_GENAI, pass the real /models/{model}:streamGenerateContent URL, and forward the explicit model kwarg so downstream handlers do not have to fall back to URL parsing.
  3. In the sync iterator, mirror the async path on StopIteration so sync callers also receive callbacks. (This was the gap PR #24114 left open.)
  4. Add a GOOGLE_GENAI routing branch in _route_streaming_logging_to_handler that dispatches to GeminiPassthroughLoggingHandler._handle_logging_gemini_collected_chunks.
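
Steps 1 to 3 can be illustrated with a small sync wrapper. Everything named here (SketchEndpointType, LoggingSyncIterator, on_stream_end) is hypothetical and much simpler than the real iterator; only the "google-genai" value and the URL shape come from the description above.

```python
from enum import Enum
from typing import Callable, Iterator, List


class SketchEndpointType(str, Enum):
    GOOGLE_GENAI = "google-genai"  # value taken from the fix description


class LoggingSyncIterator:
    """Hypothetical sync wrapper that reports collected chunks at stream end."""

    def __init__(
        self,
        chunks: Iterator[bytes],
        model: str,
        on_stream_end: Callable[..., None],
    ) -> None:
        self._chunks = chunks
        self._model = model
        self._on_stream_end = on_stream_end
        self._collected: List[bytes] = []
        self._logged = False

    def __iter__(self) -> "LoggingSyncIterator":
        return self

    def __next__(self) -> bytes:
        try:
            chunk = next(self._chunks)
        except StopIteration:
            # Invoke the logging route once before re-raising, so sync
            # callers receive callbacks as well.
            if not self._logged:
                self._logged = True
                self._on_stream_end(
                    endpoint_type=SketchEndpointType.GOOGLE_GENAI,
                    url_route=f"/models/{self._model}:streamGenerateContent",
                    model=self._model,
                    raw_bytes=self._collected,
                )
            raise
        self._collected.append(chunk)
        return chunk


events = []
stream = LoggingSyncIterator(
    iter([b"chunk-1", b"chunk-2"]),
    model="gemini-2.5-pro",
    on_stream_end=lambda **kw: events.append(kw),
)
received = list(stream)
print(received)                 # [b'chunk-1', b'chunk-2']
print(events[0]["url_route"])   # /models/gemini-2.5-pro:streamGenerateContent
```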

Tests

tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_google_genai_streaming_callbacks.py adds 7 regression tests, all runtime-behavior based:

| Test | What it covers |
| --- | --- |
| TestEndpointTypeEnum::test_google_genai_member_exists | Enum has GOOGLE_GENAI = "google-genai" |
| TestEndpointTypeEnum::test_existing_endpoint_types_preserved | Sanity guard for VERTEX_AI/ANTHROPIC/OPENAI/GENERIC |
| test_async_iterator_routes_with_google_genai_endpoint_type | Async iterator routes with right enum + URL + model |
| test_sync_iterator_routes_with_google_genai_endpoint_type | New: sync iterator does the same (closes #24114 gap) |
| test_streaming_handler_routes_google_genai_to_gemini_handler | Handler dispatches GOOGLE_GENAI to Gemini handler with model kwarg |
| test_streaming_handler_does_not_route_vertex_ai_to_gemini_handler | Regression guard: VERTEX_AI still goes to Vertex handler |
| test_callbacks_actually_fire_for_google_genai_endpoint | End-to-end: uses a real Logging instance and a real CustomLogger subclass; verifies async_log_success_event actually fires (not just that the right code path was taken) |

Verified locally: 6 of 7 fail on main without this commit, all 7 pass with it.

============================== 7 passed in 0.54s ===============================

Differences from PR #24114 (Greptile feedback addressed)

| Greptile concern | How this revival addresses it |
| --- | --- |
| P0: pre-setting async_complete_streaming_response triggered early-return guard | Not pre-set; relies on async_success_handler's own pass-through branch to set it. |
| P1: end-to-end test mocked async_success_handler, hiding the regression | New test_callbacks_actually_fire_for_google_genai_endpoint uses a real Logging instance and a real CustomLogger subclass, so it actually exercises the dispatch. |
| P2: dead-code assertion message tuple mock.assert_called_once(), ("msg") | Replaced with positional assert mock.call_count == 1, "msg". |
| P2: inspect.getsource would falsely fail on a comment containing VERTEX_AI | Replaced with runtime-behavior tests that drive iteration end-to-end and inspect captured kwargs. |
| P2: sys.path.insert working-directory hack | Removed. The standard pytest config makes litellm importable. |
| P2: raw_bytes type mismatch (bytes vs List[bytes]) | Test stubs use List[bytes] matching the real signature. |
| P2: missing model kwarg in iterator → handler call | Iterator now passes model=self.model explicitly; handler test asserts forwarding. |
| P2: hardcoded url_route="/v1/generateContent" did not match /models/{model}:streamGenerateContent | Iterator now uses f"/models/{self.model}:streamGenerateContent". |
| Functional gap (not raised in #24114, but real): sync iterator never invoked logging on StopIteration | Fixed. New _handle_sync_streaming_logging mirrors the async path. |
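
The "assertion message tuple" concern is easy to miss, so here is a short standalone demonstration of why the custom message in that pattern is dead code. It uses only unittest.mock; the message text is illustrative.

```python
from unittest.mock import MagicMock

mock = MagicMock()
mock()
mock()  # called twice, so a "called exactly once" check should fail

tuple_form_message = ""
assert_form_message = ""

# Flagged pattern: this line is a tuple expression. assert_called_once()
# raises its own AssertionError and the string is never used as a message.
try:
    mock.assert_called_once(), ("handler should be called once")
except AssertionError as exc:
    tuple_form_message = str(exc)

# Replacement pattern: a real assert statement with an attached message.
try:
    assert mock.call_count == 1, "handler should be called once"
except AssertionError as exc:
    assert_form_message = str(exc)

print("handler should be called once" in tuple_form_message)  # False
print(assert_form_message)  # handler should be called once
```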

Also dropped unrelated formatting-only changes that were carried in the original PR (team_endpoints.py, litellm_logging.py) to keep this PR's scope focused.

Files changed

  • litellm/types/passthrough_endpoints/pass_through_endpoints.py (+1 enum value)
  • litellm/google_genai/streaming_iterator.py (refactored async logging into _build_logging_kwargs; new sync logging path; switched to GOOGLE_GENAI)
  • litellm/proxy/pass_through_endpoints/streaming_handler.py (added GOOGLE_GENAI routing branch + import)
  • tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_google_genai_streaming_callbacks.py (new file, 7 tests)

Credits

Original structural approach by @awais786 in #24114. Sync iterator fix, test rewrite, and end-to-end callback-fires test added in this revival.

Tracking

Part of the Google-native correlation cluster — see #25956 for the unified writeup. Companion PRs:

  • #25500: outbound x-litellm-* response headers (in flight)
  • #25952: x-litellm-call-id precedence in spend log request_id (Fix #1 in tracking issue #25956)
  • #25955: user propagation in agenerate_content logging (Fix #3 in tracking issue #25956)

Changed files

  • litellm/google_genai/streaming_iterator.py (modified, +58/-12)
  • litellm/proxy/pass_through_endpoints/streaming_handler.py (modified, +20/-0)
  • litellm/types/passthrough_endpoints/pass_through_endpoints.py (modified, +1/-0)
  • tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_google_genai_streaming_callbacks.py (added, +397/-0)


Affected version

Reproduced on v1.83.3. Code inspection confirms the bugs exist in main as of this filing.

Reproduction

# Send a Google-native call with full correlation metadata
curl -X POST "https://<proxy>/v1beta/models/gemini-2.5-pro:generateContent" \
  -H "x-goog-api-key: sk-..." \
  -H "x-litellm-call-id: my-correlation-uuid-123" \
  -H "x-litellm-tags: scan_id=abc,run=42" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "hello"}], "role": "user"}],
    "user": "my-user-uuid-456"
  }'

# Try to find the spend record by any correlation field
curl "https://<proxy>/spend/logs/v2?request_id=my-correlation-uuid-123"
# → data: []  (empty)

# Find it by api_key + time window — record exists
curl "https://<proxy>/spend/logs/v2?api_key=<hashed>&start_date=...&end_date=..."
# → record found, but with:
#   request_id: "QanhadapC_val7oP15PyuQM"   ← Gemini's auto-id, not ours
#   request_tags: []                          ← dropped
#   user: ""                                  ← dropped
#   end_user: ""                              ← dropped
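
For callers scripting the reproduction, the same request can be assembled in Python. This sketch only builds the URL, headers and body and sends nothing; build_generate_content_request is a hypothetical helper, https://proxy.example is a placeholder base URL, and joining tags with commas is an assumption inferred from the header value in the curl command.

```python
import json
from typing import Dict, List, Tuple


def build_generate_content_request(
    proxy_base: str,
    model: str,
    api_key: str,
    call_id: str,
    tags: List[str],
    user: str,
    prompt: str,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) mirroring the curl reproduction above."""
    url = f"{proxy_base}/v1beta/models/{model}:generateContent"
    headers = {
        "x-goog-api-key": api_key,
        "x-litellm-call-id": call_id,
        "x-litellm-tags": ",".join(tags),  # assumed comma-separated
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {
            "contents": [{"parts": [{"text": prompt}], "role": "user"}],
            "user": user,
        }
    ).encode("utf-8")
    return url, headers, body


url, headers, body = build_generate_content_request(
    proxy_base="https://proxy.example",  # placeholder
    model="gemini-2.5-pro",
    api_key="sk-placeholder",
    call_id="my-correlation-uuid-123",
    tags=["scan_id=abc", "run=42"],
    user="my-user-uuid-456",
    prompt="hello",
)
print(url)
print(headers["x-litellm-tags"])
```

The returned triple can be handed to whichever HTTP client the caller already uses.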

Root cause: Google-native bypass of standard wiring

All three failures share one root cause: the Google-native handlers in litellm/proxy/google_endpoints/endpoints.py bypass the base_process_llm_request codepath that every other proxy route uses. That codepath is where LiteLLM normally:

  1. Wires response headers (x-litellm-*) so clients can read back what call_id was used
  2. Sets up streaming logging hooks (async_complete_streaming_response)
  3. Builds the StandardLoggingPayload from data["metadata"] / data["user"]
  4. Resolves the spend log request_id from litellm_call_id (with response.id as fallback)

Because the Google-native handler does its own thing, each of those wirings has to be re-implemented manually — and the original implementation is incomplete.

                 ┌──────────────────────────────────────────────┐
                 │ google_endpoints/endpoints.py bypasses        │
                 │ base_process_llm_request                      │
                 └───────────────┬──────────────────────────────┘
        ┌────────────────────────┼────────────────────────┬───────────────────────┐
        ▼                        ▼                        ▼                       ▼
 outbound headers         streaming callbacks       inbound metadata→        call_id precedence
 (x-litellm-*)            silently skipped          spend log dropped        in spend log
                                                    (tags/user/etc.)         (Gemini id wins)
        │                        │                        │                       │
        ▼                        ▼                        ▼                       ▼
   PR #25500                PR #25960                   PR #25955          PR #25952
   (open, ready             (open, supersedes           (open)             (open)
   for re-review)           closed #24114,
                            closes #24097)

The four related defects

1. Outbound x-litellm-* response headers missing — TRACKED

Already addressed by PR #25500 ("feat(proxy): LiteLLM headers on Google native generateContent routes"). Adds build_litellm_proxy_success_headers_from_llm_response and wires it into both streaming and non-streaming Google-native routes. Greptile rated it 5/5; just awaiting maintainer re-review.

2. Streaming success callbacks silently skipped — TRACKED (Fix #2)

Documented as Issue #24097 with a thorough root-cause writeup: model_call_details["async_complete_streaming_response"] is never set on the Google-native streaming path, so function-based success_callback entries are silently skipped inside async_success_handler.
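
The "silently skipped" behavior can be reduced to a guard on a missing key. The sketch below is a deliberately simplified model, not the real async_success_handler: run_success_callbacks is hypothetical, and only the key name async_complete_streaming_response comes from the writeup.

```python
from typing import Any, Callable, Dict, List


def run_success_callbacks(
    model_call_details: Dict[str, Any],
    callbacks: List[Callable[[Any], None]],
) -> int:
    """Hypothetical reduction of the guard; returns how many callbacks ran."""
    complete = model_call_details.get("async_complete_streaming_response")
    if complete is None:
        # No assembled response recorded: skip quietly, no error is raised.
        return 0
    for callback in callbacks:
        callback(complete)
    return len(callbacks)


seen: List[Any] = []

# Key never set (the Google-native streaming path as described): nothing runs.
skipped = run_success_callbacks({}, [seen.append])

# Key set: the callback receives the assembled response.
ran = run_success_callbacks(
    {"async_complete_streaming_response": {"text": "hello"}}, [seen.append]
)

print(skipped, ran, seen)  # 0 1 [{'text': 'hello'}]
```

Because the skip path neither raises nor logs, the only visible symptom is the absence of callback side effects, which is why the defect went unnoticed.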

An initial fix attempt was made in PR #24114 ("fix(google-genai): route streaming chunks to GeminiPassthroughLoggingHandler") by @awais786 — closed by the author the same day (2026-03-19) without merging, with several Greptile review concerns outstanding.

Fix proposed in PR #25960 — supersedes #24114 with the structural approach preserved (EndpointType.GOOGLE_GENAI enum, streaming_iterator routing change, GOOGLE_GENAI branch in _route_streaming_logging_to_handler) and incorporates Greptile's outstanding feedback:

  • Removed inspect.getsource source-text tests in favor of runtime-behavior tests
  • Removed sys.path manipulation hack
  • Fixed dead-code assertion-message-as-tuple pattern
  • Iterator now passes model explicitly + uses real streamGenerateContent URL
  • Added end-to-end test using a real Logging instance + real CustomLogger subclass (the original mocked async_success_handler itself, masking the regression)
  • Plus a functional gap PR #24114 left open: sync iterator's __next__ never invoked the logging route on StopIteration, so sync callers were silently broken even with the async-side fix. PR #25960 mirrors the async path on the sync side.

3. Inbound metadata / user / tags not propagated to spend log — NEW (Fix #3)

add_litellm_data_to_request correctly populates data["metadata"]["tags"] (from x-litellm-tags header) and data["user"] (from request body) — verified by reading the function. But these fields don't survive into the StandardLoggingPayload that the spend-log writer consumes. The agenerate_content codepath re-parses kwargs through GenericLiteLLMParams and routes through a different logging path than acompletion, where the propagation never happens.
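
As a rough picture of the request-side state described above, the sketch below shows tags and user sitting in the request data before any logging payload is built. Both helpers are hypothetical and are not litellm's add_litellm_data_to_request; splitting the header on commas is an assumption based on the example header value in the reproduction.

```python
from typing import Any, Dict, List, Mapping


def tags_from_headers(headers: Mapping[str, str]) -> List[str]:
    """Hypothetical helper: split a comma-separated x-litellm-tags header."""
    raw = headers.get("x-litellm-tags", "")
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def attach_correlation_fields(
    headers: Mapping[str, str], body: Dict[str, Any]
) -> Dict[str, Any]:
    """Copy the body and record header tags under data["metadata"]["tags"]."""
    data = dict(body)
    metadata = dict(data.get("metadata") or {})
    tags = tags_from_headers(headers)
    if tags:
        metadata["tags"] = tags
    data["metadata"] = metadata
    return data


data = attach_correlation_fields(
    {"x-litellm-tags": "scan_id=abc,run=42"},
    {"contents": [], "user": "my-user-uuid-456"},
)
print(data["metadata"]["tags"])  # ['scan_id=abc', 'run=42']
print(data["user"])              # my-user-uuid-456
```

The defect is downstream of this point: the values exist in the request data but are not carried into the payload the spend-log writer reads.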

Fix proposed in PR #25955.

4. litellm_call_id overridden by response_obj.id in spend log — NEW (Fix #1)

google_endpoints/endpoints.py:61-63 correctly reads x-litellm-call-id into data["litellm_call_id"]. But get_spend_logs_id in litellm/proxy/spend_tracking/spend_tracking_utils.py:172-174 does:

id = response_obj.get("id") or kwargs.get("litellm_call_id")

Gemini's response always includes its own id field, so it always wins and the client's call_id is silently discarded. This is technically a cross-route bug (any provider that returns its own id would override a client-supplied call_id), but it surfaces most painfully on Google-native because the fix for defect #1 (PR #25500) isn't yet merged: clients can't read the call_id back from the response either, so they can't recover it.

Fix proposed in PR #25952.

Workaround until fixes land

If the proxy operator can dedicate one virtual API key per scan/worker/run, then api_key + time_window becomes a reliable correlation pair on the Google-native route:

GET /spend/logs/v2?api_key=<hashed-per-scan-key>&start_date=<scan_start>&end_date=<scan_end>

This is the only pattern we found that survives the Google-native logging path today.
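
A small helper can build that query per scan. This is a sketch under assumptions: spend_logs_url is a hypothetical name, https://proxy.example is a placeholder, and the date strings are illustrative only, since the format the endpoint expects is not stated here.

```python
from urllib.parse import urlencode


def spend_logs_url(
    proxy_base: str, hashed_key: str, start_date: str, end_date: str
) -> str:
    """Build the api_key + time-window query used by the workaround."""
    query = urlencode(
        {"api_key": hashed_key, "start_date": start_date, "end_date": end_date}
    )
    return f"{proxy_base}/spend/logs/v2?{query}"


# Placeholder values; the date format is an assumption.
example_url = spend_logs_url(
    "https://proxy.example", "hashed-key-for-scan-42", "2026-04-17", "2026-04-18"
)
print(example_url)
```

The correlation only holds if each key is used by exactly one scan, worker or run within the queried window.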

Cross-references

  • PR #25500 — fixes defect #1 (outbound response headers)
  • Issue #24097 — documents defect #2 (streaming callbacks)
  • PR #25960 — fixes defect #2; supersedes #24114, closes #24097
  • PR #24114 — earlier fix attempt for #24097 by @awais786; closed by author same day, never merged. Superseded by #25960.
  • Issue #24945 — adjacent: litellm_metadata overridden on /v1/messages (same family of "non-OpenAI route loses metadata")
  • PR #24964 — earlier version of #25500 merged into a now-stale staging branch; superseded
  • PR #25955 — fixes defect #3 (this issue)
  • PR #25952 — fixes defect #4 (this issue)

extent analysis

TL;DR

  • The most likely fix involves implementing the missing correlation metadata propagation in the Google-native handlers, as proposed in PR #25955 and PR #25952.

Guidance

  • Review and merge PR #25955 to fix the inbound metadata propagation issue.
  • Review and merge PR #25952 to fix the litellm_call_id override issue.
  • Verify that the fixes resolve the correlation metadata issues by testing the Google-native routes with the updated code.
  • Consider implementing a temporary workaround using dedicated virtual API keys per scan/worker/run, as described in the issue, until the fixes are fully deployed.

Example

  • No code snippet is provided, as the issue is complex and requires a thorough review of the proposed PRs.

Notes

  • Three of the four defects are specific to the Google-native handlers. The request_id precedence defect (PR #25952) is cross-route, but it surfaces most painfully on Google-native.
  • The proposed fixes are still in the review process, and additional testing may be necessary to ensure their correctness.

Recommendation

  • Apply the workaround using dedicated virtual API keys per scan/worker/run until the fixes are fully deployed and verified, as this provides a reliable correlation pair on the Google-native route.
