litellm - ✅(Solved) Fix SpendLogs not recorded for /v1/messages?beta=true streaming requests [3 pull requests, 9 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#23150Fetched 2026-04-08 00:38:21
View on GitHub
Comments
9
Participants
3
Timeline
29
Reactions
1
Author
Timeline (top)
commented ×9subscribed ×7referenced ×5cross-referenced ×3

Fix Action

Workaround

None known. CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 does not remove the ?beta=true query parameter when Claude Code detects the provider as non-Bedrock/non-Vertex (e.g., through a proxy).

PR fix notes

PR #23748: fix: wire streaming SpendLog logging for /v1/messages beta endpoint

Description (problem / solution / changelog)

Summary

Fixes #23150 — /v1/messages beta endpoint streaming doesn't log SpendLogs when websearch_interception callback is enabled.

Root Cause

When websearch_interception converts a streaming request to non-streaming (stream=Truestream=False) and no agentic loop runs (i.e., the LLM doesn't use the web search tool), the response is wrapped in FakeAnthropicMessagesStreamIterator. This iterator yields pre-built SSE chunks to the client but never triggers the streaming logging handler, so async_success_handler is never called and SpendLogs are missing.

Fix

Wrap FakeAnthropicMessagesStreamIterator with BaseAnthropicMessagesStreamingIterator.async_sse_wrapper(), which:

  1. Iterates through all chunks from the fake stream
  2. Collects them for logging
  3. After the stream completes, calls _handle_streaming_logging_route_streaming_logging_to_handlerasync_success_handler

This reuses the exact same logging machinery that real streaming responses use (e.g., Bedrock's bedrock_sse_wrapper), ensuring SpendLogs are created with correct spend, tokens, model, and user information.

Context

  • @cafonseca isolated the issue to the websearch_interception callback — removing it from callbacks restored SpendLog entries
  • @JiwaniZakir was going to fix the main streaming path but stepped back
  • The non-websearch streaming path already works correctly (the underlying chunk_processor/async_sse_wrapper handles logging)

Test plan

  • Added tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py with 3 tests:
    • test_fake_stream_wrapped_with_logging_handler — verifies the fake stream is wrapped as an async generator (not raw FakeAnthropicMessagesStreamIterator)
    • test_fake_stream_logging_handler_called — verifies _handle_streaming_logging is called after stream consumption
    • test_no_websearch_conversion_returns_none — verifies no wrapping when websearch conversion didn't happen
  • All existing test_llm_http_handler.py tests pass (4/4)
  • All existing websearch_interception tests pass (37/37, 1 skipped, 1 unrelated arch failure)

Changed files

  • litellm/llms/custom_httpx/llm_http_handler.py (modified, +15/-1)
  • tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py (added, +173/-0)

PR #24135: fix(proxy): defer logging until post-call guardrails complete

Description (problem / solution / changelog)

guardrail_information is None in StandardLoggingPayload because logging fires before post-call guardrails write to metadata.

Non-streaming: wrapper_async stores a closure instead of calling create_task immediately. The proxy fires it in a try/finally after post_call_success_hook so the SLP is built with guardrail info.

Streaming: a closure on logging_obj is called by CSW.anext at stream end. The closure runs only guardrail hooks (not all callbacks) on the assembled response, then fires both logging handlers. This avoids behavioral changes for non-guardrail callbacks on streaming.

Relevant issues

Replaces #23929

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

Problem

guardrail_information is always None in StandardLoggingPayload when post-call guardrails (e.g. OpenAI Moderation) are configured. This happens because:

  • Non-streaming: asyncio.create_task in wrapper_async (utils.py) fires the logging task before post_call_success_hook runs in base_process_llm_request, so the SLP is built before guardrails write to metadata.
  • Streaming: logging fires at stream exhaustion in CustomStreamWrapper.__anext__ without any guardrail data from the assembled response — post_call_success_hook is never called for streaming early-return routes.

Fix

Two deferral mechanisms — same concept (store a closure, call it at the right time), different execution points.

Non-streaming (utils.py + common_request_processing.py):

  1. _has_post_call_guardrails() checks if any CustomGuardrail with post_call event hook is registered
  2. If true and non-streaming: set logging_obj._defer_async_logging = True
  3. wrapper_async sees the flag → stores closure on logging_obj._enqueue_deferred_logging instead of asyncio.create_task. Sync callbacks fire immediately (unchanged).
  4. base_process_llm_request runs post_call_success_hook (guardrails write to metadata)
  5. finally block calls the stored closure → create_task fires → SLP built with guardrail info

Streaming (common_request_processing.py + streaming_handler.py):

  1. If _has_post_call_guardrails and response is CustomStreamWrapper: attach _on_deferred_stream_complete closure to logging_obj
  2. The closure runs only guardrail hooks — iterates litellm.callbacks, filters for CustomGuardrail instances with post_call event hook, calls their async_post_call_success_hook. This is the same pattern ProxyLogging.post_call_success_hook uses internally, but filtered to guardrails only. Non-guardrail callbacks are not called (avoids behavioral changes for streaming).
  3. CSW.__anext__ at stream end: checks for closure. If set, clears it and calls it via asyncio.create_task. If not set, fires logging directly (original behavior preserved).
  4. Fallthrough safety: if code reaches the inline post_call_success_hook (no early return), the closure is cleared first to prevent double invocation.

Files

FileChange
litellm/utils.py_defer_async_logging flag → store closure instead of create_task
litellm/proxy/common_request_processing.py_has_post_call_guardrails() static method, deferral flag, streaming closure (guardrail-only), try/finally
litellm/litellm_core_utils/streaming_handler.pyCSW.__anext__ checks for _on_deferred_stream_complete, calls it instead of logging directly
tests/test_litellm/proxy/guardrails/test_deferred_guardrail_logging.py17 tests for both paths
docs/my-website/docs/proxy/guardrails/custom_guardrail.mdDocument streaming post_call guardrails as audit-only

Tests (17 total)

Detection (7): _has_post_call_guardrails returns correct result for post_call, pre_call, event_hook=None, list event hooks, non-guardrail callbacks, empty callbacks

Non-streaming (3): deferred flag stores and executes closure, sync callbacks fire immediately, regression test without flag

Non-streaming exception (1): deferred logging fires even if guardrail raises HTTPException (try/finally)

Streaming (6): closure defers logging, regression without closure, closure runs only guardrail hooks (not all callbacks), guardrail-modified response flows to logging, exception resilience with guardrail_blocked, transient errors don't set guardrail_blocked, production closure integration test

Changed files

  • docs/my-website/docs/proxy/guardrails/custom_guardrail.md (modified, +10/-2)
  • docs/my-website/sidebars.js (modified, +14/-0)
  • litellm/litellm_core_utils/litellm_logging.py (modified, +14/-0)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +25/-11)
  • litellm/proxy/_new_secret_config.yaml (modified, +25/-34)
  • litellm/proxy/common_request_processing.py (modified, +356/-104)
  • litellm/utils.py (modified, +31/-8)
  • poetry.lock (modified, +1/-1)
  • tests/test_litellm/litellm_core_utils/test_litellm_logging.py (modified, +76/-0)
  • tests/test_litellm/proxy/guardrails/test_deferred_guardrail_logging.py (added, +944/-0)

PR #26000: fix(proxy): handle non-standard SSE frames in Anthropic passthrough logging

Description (problem / solution / changelog)

Summary

Some third-party Anthropic-compatible API providers (e.g. AWS Bedrock proxies, custom gateways) emit non-standard SSE frames in their streaming responses:

  1. OpenAI-style [DONE] sentinel frames mixed into Anthropic SSE streams
  2. Non-JSON SSE lines (comments, keep-alive pings, debug output)

These cause json.JSONDecodeError in AnthropicPassthroughLoggingHandler._build_complete_streaming_response(), which breaks the entire logging pipeline — resulting in missing spend logs, incorrect token/cost accounting, and silent failures in the web UI.

Root Cause

In _build_complete_streaming_response(), the SSE event loop only catches StopIteration/StopAsyncIteration. When a non-standard frame arrives that isn't valid JSON, the unhandled JSONDecodeError propagates up and aborts logging for the entire streaming response.

Changes

  • Skip SSE events containing [DONE] control frames before attempting JSON parse
  • Catch json.JSONDecodeError for malformed SSE lines and continue to the next event

Both are minimal, defensive changes — valid Anthropic SSE events are unaffected.

Related Issues

  • Partially addresses #17476 (pass-through streaming callback failures)
  • Related to #23150 (spend logs not recorded for streaming requests)

Test Plan

  • Added unit tests for [DONE] frame handling
  • Added unit tests for non-JSON SSE line handling
  • Added test for mixed valid/invalid frames to verify valid events still get processed
  • All existing tests in test_anthropic_passthrough_logging_handler.py still pass

Changed files

  • litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py (modified, +13/-3)
  • tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py (modified, +150/-65)

Code Example

/v1/messages?beta=true6,230 requests (99%)
/v1/messages            →    59 requests (1%)

---

response_cost: 0.0044264
Async Wrapper: Completed Call, calling async_success_handler
Enters prisma db call, response_cost: 0.0044264, token: 51f67709...
Adding update to queue...
RAW_BUFFERClick to expand / collapse

Bug Description

Streaming requests to /v1/messages?beta=true complete successfully (200 OK) but never trigger the async_success_handler callback, resulting in no SpendLog entries, no Prometheus metrics, and no spend tracking for these requests.

Environment

  • LiteLLM Version: v1.81.14-stable (Docker image ghcr.io/berriai/litellm:v1.81.14-stable)
  • Client: Claude Code CLI v2.1.71 (sends ?beta=true on all /v1/messages requests)
  • Backend Provider: AWS Bedrock (anthropic models)
  • Deployment: Kubernetes/OpenShift with 2 LiteLLM pods

Evidence

1. Access logs vs SpendLogs gap (4-hour window)

Source/v1/messages/chat/completions
Access logs11,90412,827
SpendLogs~3,519 (30%)~13,442 (105%)

~70% of /v1/messages requests have no SpendLog entry.

2. 99% of /v1/messages requests have ?beta=true

/v1/messages?beta=true  → 6,230 requests (99%)
/v1/messages            →    59 requests (1%)

The 1% without ?beta=true are logged correctly with full model, tokens, spend, and call_type.

3. Debug trace confirms async_success_handler never fires

With LITELLM_LOG=DEBUG enabled, we traced a single ?beta=true request:

  • 12:49:15 - Request enters routing pipeline, auth passes, model alias resolved (claude-opus-4-6aws/claude-opus-4-6)
  • 12:49:15 - litellm_pre_call_utils.py processes request correctly
  • 12:49:16 - response_cost: 0.0 logged (initial pre-streaming entry)
  • Access log shows "POST /v1/messages?beta=true HTTP/1.1" 200 OK (streaming completed)
  • async_success_handler NEVER fires for this request
  • Enters prisma db call NEVER called for this request's token (e6a2217d...)

Meanwhile, other users' requests (without ?beta=true) on the same pod during the same time window show the complete lifecycle:

response_cost: 0.0044264
Async Wrapper: Completed Call, calling async_success_handler
Enters prisma db call, response_cost: 0.0044264, token: 51f67709...
Adding update to queue...

4. The few SpendLog entries that DO exist for ?beta=true are incomplete

The rare entries that make it through have:

  • model = '' or model = 'claude-opus-4-6' (alias not resolved to bedrock model)
  • call_type = '' (empty)
  • spend = 0, total_tokens = 0, completion_tokens = 0, prompt_tokens = 0

Expected Behavior

/v1/messages?beta=true requests should trigger the same async_success_handlerupdate_database → SpendLog pipeline as /v1/messages requests without ?beta=true.

Steps to Reproduce

  1. Configure LiteLLM with prometheus callback and Bedrock models
  2. Send a streaming request to /v1/messages?beta=true with a valid model
  3. Check LiteLLM_SpendLogs table — no entry created
  4. Check Prometheus metrics — no increment for the request
  5. Compare with /v1/messages (no ?beta=true) — works correctly

Impact

  • No cost tracking for Claude Code CLI users (which always sends ?beta=true)
  • No Prometheus metrics for these requests
  • Spend budgets not enforced since spend is never recorded
  • Admin UI dashboards incomplete — missing majority of anthropic messages traffic

Workaround

None known. CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 does not remove the ?beta=true query parameter when Claude Code detects the provider as non-Bedrock/non-Vertex (e.g., through a proxy).

Related Issues

  • anthropics/claude-code#20031 — CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS not removing ?beta=true through proxies
  • anthropics/claude-code#30926 — Beta flags regression in Claude Code v2.1.69+
  • https://docs.litellm.ai/blog/claude-code-beta-headers-incident — Beta headers incident (fixed header forwarding, but not this logging gap)

extent analysis

Fix Plan

To fix the issue of async_success_handler not being triggered for /v1/messages?beta=true requests, we need to modify the code to handle the beta=true query parameter correctly. Here are the steps:

  • Modify the litellm_pre_call_utils.py file to handle the beta=true query parameter.
  • Add a check for the beta=true query parameter in the async_success_handler function to ensure it is triggered correctly.
  • Update the update_database function to correctly log the SpendLog entries for requests with beta=true.

Code Changes

# litellm_pre_call_utils.py
def process_request(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Handle beta=true query parameter
        request.model_alias = resolve_model_alias(request.model_alias)
    # ... existing code ...

# async_success_handler.py
def async_success_handler(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Trigger async_success_handler for beta=true requests
        update_database(request)
    # ... existing code ...

# update_database.py
def update_database(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Correctly log SpendLog entries for beta=true requests
        spend_log = SpendLog(
            model=request.model_alias,
            call_type='streaming',
            spend=request.response_cost,
            total_tokens=request.total_tokens,
            completion_tokens=request.completion_tokens,
            prompt_tokens=request.prompt_tokens
        )
        db.session.add(spend_log)
        db.session.commit()
    # ... existing code ...

Verification

To verify that the fix worked, send a streaming request to /v1/messages?beta=true and check the LiteLLM_SpendLogs table for a new entry. Also, check Prometheus metrics to ensure that the request is being tracked correctly.

Extra Tips

  • Make sure to test the fix thoroughly to ensure that it works correctly for all scenarios.
  • Consider adding logging statements to track the flow of the request and ensure that the async_success_handler is being triggered correctly.
  • Review the related issues and ensure that the fix does not introduce any new bugs or regressions.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - ✅(Solved) Fix SpendLogs not recorded for /v1/messages?beta=true streaming requests [3 pull requests, 9 comments, 3 participants]