litellm - ✅(Solved) Fix SpendLogs not recorded for /v1/messages?beta=true streaming requests [3 pull requests, 9 comments, 3 participants]

cafonseca · 2026-03-09T12:58:36Z

[litellm] PR 23748: fix: wire streaming SpendLog logging for /v1/messages beta endpoint - Repository: BerriAI/litellm - Author: weiguangli-io - State: open | m… # PR #23748: fix: wire streaming SpendLog logging for /v1/messages beta endpoint - Repository: BerriAI/litellm - Author: weiguangli-io - State: open | merged: False - Link: https://github.com/BerriAI/litellm/pull/23748 ## Description (problem / solution / changelog) ## Summary Fixes #23150 — `/v1/messages` beta endpoint streaming doesn't log SpendLogs when `websearch_interception` callback is enabled. ### Root Cause When `websearch_interception` converts a streaming request to non-streaming (`stream=True` → `stream=False`) and no agentic loop runs (i.e., the LLM doesn't use the web search tool), the response is wrapped in `FakeAnthropicMessagesStreamIterator`. This iterator yields pre-built SSE chunks to the client but **never triggers the streaming logging handler**, so `async_success_handler` is never called and SpendLogs are missing. ### Fix Wrap `FakeAnthropicMessagesStreamIterator` with `BaseAnthropicMessagesStreamingIterator.async_sse_wrapper()`, which: 1. Iterates through all chunks from the fake stream 2. Collects them for logging 3. After the stream completes, calls `_handle_streaming_logging` → `_route_streaming_logging_to_handler` → `async_success_handler` This reuses the exact same logging machinery that real streaming responses use (e.g., Bedrock's `bedrock_sse_wrapper`), ensuring SpendLogs are created with correct spend, tokens, model, and user information. ### Context - @cafonseca isolated the issue to the `websearch_interception` callback — removing it from callbacks restored SpendLog entries - @JiwaniZakir was going to fix the main streaming path but [stepped back](https://github.com/BerriAI/litellm/issues/23150#issuecomment-2843131529) - The non-websearch streaming path already works correctly (the underlying `chunk_processor`/`async_sse_wrapper` handles logging) ## Test plan - [x] Added `tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py` with 3 tests: - `test_fake_stream_wrapped_with_logging_handler` — verifies the fake stream is wrapped as an async generator (not raw `FakeAnthropicMessagesStreamIterator`) - `test_fake_stream_logging_handler_called` — verifies `_handle_streaming_logging` is called after stream consumption - `test_no_websearch_conversion_returns_none` — verifies no wrapping when websearch conversion didn't happen - [x] All existing `test_llm_http_handler.py` tests pass (4/4) - [x] All existing `websearch_interception` tests pass (37/37, 1 skipped, 1 unrelated arch failure) ## Changed files - `litellm/llms/custom_httpx/llm_http_handler.py` (modified, +15/-1) - `tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py` (added, +173/-0) --- # PR #24135: fix(proxy): defer logging until post-call guardrails complete - Repository: BerriAI/litellm - Author: michelligabriele - State: closed | merged: True - Link: https://github.com/BerriAI/litellm/pull/24135 ## Description (problem / solution / changelog) guardrail_information is None in StandardLoggingPayload because logging fires before post-call guardrails write to metadata. Non-streaming: wrapper_async stores a closure instead of calling create_task immediately. The proxy fires it in a try/finally after post_call_success_hook so the SLP is built with guardrail info. Streaming: a closure on logging_obj is called by CSW.__anext__ at stream end. The closure runs only guardrail hooks (not all callbacks) on the assembled response, then fires both logging handlers. This avoids behavioral changes for non-guardrail callbacks on streaming. ## Relevant issues Replaces #23929 ## Pre-Submission checklist **Please complete all items before asking a LiteLLM maintainer to review your PR** - [x] I have Added testing in the [`tests/test_litellm/`](https://github.com/BerriAI/litellm/tree/main/tests/test_litellm) directory, **Adding at least 1 test is a hard requirement** - [see details](https://docs.litellm.ai/docs/extras/contributing_code) - [ ] My PR passes all unit tests on [`make test-unit`](https://docs.litellm.ai/docs/extras/contributing_code) - [x] My PR's scope is as isolated as possible, it only solves 1 specific problem - [ ] I have requested a Greptile review by commenting `@greptileai` and received a **Confidence Score of at least 4/5** before requesting a maintainer review ## Delays in PR merge? If you're seeing a delay in your PR being merged, ping the LiteLLM Team on [Slack (#pr-review)](https://join.slack.com/t/litellmossslack/shared_invite/zt-3o7nkuyfr-p_kbNJj8taRfXGgQI1~YyA). ## CI (LiteLLM team) > **CI status guideline:** > > - 50-55 passing tests: main is stable with minor issues. > - 45-49 passing tests: acceptable but needs attention > - <= 40 passing tests: unstable; be careful with your merges and assess the risk. - [ ] **Branch creation CI run** Link: - [ ] **CI run for the last commit** Link: - [ ] **Merge / cherry-pick CI run** Links: ##

litellm2026-03-09 12:58:36

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#23150•Fetched 2026-04-08 00:38:21

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×9subscribed ×7referenced ×5cross-referenced ×3

Fix Action

Workaround

None known. CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 does not remove the ?beta=true query parameter when Claude Code detects the provider as non-Bedrock/non-Vertex (e.g., through a proxy).

PR fix notes

PR #23748: fix: wire streaming SpendLog logging for /v1/messages beta endpoint

Repository: BerriAI/litellm
Author: weiguangli-io
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/23748

Description (problem / solution / changelog)

Summary

Fixes #23150 — /v1/messages beta endpoint streaming doesn't log SpendLogs when websearch_interception callback is enabled.

Root Cause

When websearch_interception converts a streaming request to non-streaming (stream=True → stream=False) and no agentic loop runs (i.e., the LLM doesn't use the web search tool), the response is wrapped in FakeAnthropicMessagesStreamIterator. This iterator yields pre-built SSE chunks to the client but never triggers the streaming logging handler, so async_success_handler is never called and SpendLogs are missing.

Fix

Wrap FakeAnthropicMessagesStreamIterator with BaseAnthropicMessagesStreamingIterator.async_sse_wrapper(), which:

Iterates through all chunks from the fake stream
Collects them for logging
After the stream completes, calls _handle_streaming_logging → _route_streaming_logging_to_handler → async_success_handler

This reuses the exact same logging machinery that real streaming responses use (e.g., Bedrock's bedrock_sse_wrapper), ensuring SpendLogs are created with correct spend, tokens, model, and user information.

Context

@cafonseca isolated the issue to the websearch_interception callback — removing it from callbacks restored SpendLog entries
@JiwaniZakir was going to fix the main streaming path but stepped back
The non-websearch streaming path already works correctly (the underlying chunk_processor/async_sse_wrapper handles logging)

Test plan

Added tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py with 3 tests:
- test_fake_stream_wrapped_with_logging_handler — verifies the fake stream is wrapped as an async generator (not raw FakeAnthropicMessagesStreamIterator)
- test_fake_stream_logging_handler_called — verifies _handle_streaming_logging is called after stream consumption
- test_no_websearch_conversion_returns_none — verifies no wrapping when websearch conversion didn't happen
All existing test_llm_http_handler.py tests pass (4/4)
All existing websearch_interception tests pass (37/37, 1 skipped, 1 unrelated arch failure)

Changed files

litellm/llms/custom_httpx/llm_http_handler.py (modified, +15/-1)
tests/test_litellm/llms/custom_httpx/test_fake_stream_logging.py (added, +173/-0)

PR #24135: fix(proxy): defer logging until post-call guardrails complete

Repository: BerriAI/litellm
Author: michelligabriele
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/24135

Description (problem / solution / changelog)

guardrail_information is None in StandardLoggingPayload because logging fires before post-call guardrails write to metadata.

Non-streaming: wrapper_async stores a closure instead of calling create_task immediately. The proxy fires it in a try/finally after post_call_success_hook so the SLP is built with guardrail info.

Streaming: a closure on logging_obj is called by CSW.anext at stream end. The closure runs only guardrail hooks (not all callbacks) on the assembled response, then fires both logging handlers. This avoids behavioral changes for non-guardrail callbacks on streaming.

Relevant issues

Replaces #23929

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

🐛 Bug Fix

Changes

Problem

guardrail_information is always None in StandardLoggingPayload when post-call guardrails (e.g. OpenAI Moderation) are configured. This happens because:

Non-streaming: asyncio.create_task in wrapper_async (utils.py) fires the logging task before post_call_success_hook runs in base_process_llm_request, so the SLP is built before guardrails write to metadata.
Streaming: logging fires at stream exhaustion in CustomStreamWrapper.__anext__ without any guardrail data from the assembled response — post_call_success_hook is never called for streaming early-return routes.

Fix

Two deferral mechanisms — same concept (store a closure, call it at the right time), different execution points.

Non-streaming (utils.py + common_request_processing.py):

_has_post_call_guardrails() checks if any CustomGuardrail with post_call event hook is registered
If true and non-streaming: set logging_obj._defer_async_logging = True
wrapper_async sees the flag → stores closure on logging_obj._enqueue_deferred_logging instead of asyncio.create_task. Sync callbacks fire immediately (unchanged).
base_process_llm_request runs post_call_success_hook (guardrails write to metadata)
finally block calls the stored closure → create_task fires → SLP built with guardrail info

Streaming (common_request_processing.py + streaming_handler.py):

If _has_post_call_guardrails and response is CustomStreamWrapper: attach _on_deferred_stream_complete closure to logging_obj
The closure runs only guardrail hooks — iterates litellm.callbacks, filters for CustomGuardrail instances with post_call event hook, calls their async_post_call_success_hook. This is the same pattern ProxyLogging.post_call_success_hook uses internally, but filtered to guardrails only. Non-guardrail callbacks are not called (avoids behavioral changes for streaming).
CSW.__anext__ at stream end: checks for closure. If set, clears it and calls it via asyncio.create_task. If not set, fires logging directly (original behavior preserved).
Fallthrough safety: if code reaches the inline post_call_success_hook (no early return), the closure is cleared first to prevent double invocation.

Files

File	Change
`litellm/utils.py`	`_defer_async_logging` flag → store closure instead of `create_task`
`litellm/proxy/common_request_processing.py`	`_has_post_call_guardrails()` static method, deferral flag, streaming closure (guardrail-only), `try/finally`
`litellm/litellm_core_utils/streaming_handler.py`	`CSW.__anext__` checks for `_on_deferred_stream_complete`, calls it instead of logging directly
`tests/test_litellm/proxy/guardrails/test_deferred_guardrail_logging.py`	17 tests for both paths
`docs/my-website/docs/proxy/guardrails/custom_guardrail.md`	Document streaming `post_call` guardrails as audit-only

Tests (17 total)

Detection (7): _has_post_call_guardrails returns correct result for post_call, pre_call, event_hook=None, list event hooks, non-guardrail callbacks, empty callbacks

Non-streaming (3): deferred flag stores and executes closure, sync callbacks fire immediately, regression test without flag

Non-streaming exception (1): deferred logging fires even if guardrail raises HTTPException (try/finally)

Streaming (6): closure defers logging, regression without closure, closure runs only guardrail hooks (not all callbacks), guardrail-modified response flows to logging, exception resilience with guardrail_blocked, transient errors don't set guardrail_blocked, production closure integration test

Changed files

docs/my-website/docs/proxy/guardrails/custom_guardrail.md (modified, +10/-2)
docs/my-website/sidebars.js (modified, +14/-0)
litellm/litellm_core_utils/litellm_logging.py (modified, +14/-0)
litellm/litellm_core_utils/streaming_handler.py (modified, +25/-11)
litellm/proxy/_new_secret_config.yaml (modified, +25/-34)
litellm/proxy/common_request_processing.py (modified, +356/-104)
litellm/utils.py (modified, +31/-8)
poetry.lock (modified, +1/-1)
tests/test_litellm/litellm_core_utils/test_litellm_logging.py (modified, +76/-0)
tests/test_litellm/proxy/guardrails/test_deferred_guardrail_logging.py (added, +944/-0)

PR #26000: fix(proxy): handle non-standard SSE frames in Anthropic passthrough logging

Repository: BerriAI/litellm
Author: leikaiwei
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26000

Description (problem / solution / changelog)

Summary

Some third-party Anthropic-compatible API providers (e.g. AWS Bedrock proxies, custom gateways) emit non-standard SSE frames in their streaming responses:

OpenAI-style [DONE] sentinel frames mixed into Anthropic SSE streams
Non-JSON SSE lines (comments, keep-alive pings, debug output)

These cause json.JSONDecodeError in AnthropicPassthroughLoggingHandler._build_complete_streaming_response(), which breaks the entire logging pipeline — resulting in missing spend logs, incorrect token/cost accounting, and silent failures in the web UI.

Root Cause

In _build_complete_streaming_response(), the SSE event loop only catches StopIteration/StopAsyncIteration. When a non-standard frame arrives that isn't valid JSON, the unhandled JSONDecodeError propagates up and aborts logging for the entire streaming response.

Changes

Skip SSE events containing [DONE] control frames before attempting JSON parse
Catch json.JSONDecodeError for malformed SSE lines and continue to the next event

Both are minimal, defensive changes — valid Anthropic SSE events are unaffected.

Related Issues

Partially addresses #17476 (pass-through streaming callback failures)
Related to #23150 (spend logs not recorded for streaming requests)

Test Plan

Added unit tests for [DONE] frame handling
Added unit tests for non-JSON SSE line handling
Added test for mixed valid/invalid frames to verify valid events still get processed
All existing tests in test_anthropic_passthrough_logging_handler.py still pass

Changed files

litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py (modified, +13/-3)
tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py (modified, +150/-65)

Code Example

/v1/messages?beta=true  → 6,230 requests (99%)
/v1/messages            →    59 requests (1%)

---

response_cost: 0.0044264
Async Wrapper: Completed Call, calling async_success_handler
Enters prisma db call, response_cost: 0.0044264, token: 51f67709...
Adding update to queue...

RAW_BUFFERClick to expand / collapse

Bug Description

Streaming requests to /v1/messages?beta=true complete successfully (200 OK) but never trigger the async_success_handler callback, resulting in no SpendLog entries, no Prometheus metrics, and no spend tracking for these requests.

Environment

LiteLLM Version: v1.81.14-stable (Docker image ghcr.io/berriai/litellm:v1.81.14-stable)
Client: Claude Code CLI v2.1.71 (sends ?beta=true on all /v1/messages requests)
Backend Provider: AWS Bedrock (anthropic models)
Deployment: Kubernetes/OpenShift with 2 LiteLLM pods

Evidence

1. Access logs vs SpendLogs gap (4-hour window)

Source	`/v1/messages`	`/chat/completions`
Access logs	11,904	12,827
SpendLogs	~3,519 (30%)	~13,442 (105%)

~70% of /v1/messages requests have no SpendLog entry.

2. 99% of `/v1/messages` requests have `?beta=true`

/v1/messages?beta=true  → 6,230 requests (99%)
/v1/messages            →    59 requests (1%)

The 1% without ?beta=true are logged correctly with full model, tokens, spend, and call_type.

3. Debug trace confirms `async_success_handler` never fires

With LITELLM_LOG=DEBUG enabled, we traced a single ?beta=true request:

12:49:15 - Request enters routing pipeline, auth passes, model alias resolved (claude-opus-4-6 → aws/claude-opus-4-6)
12:49:15 - litellm_pre_call_utils.py processes request correctly
12:49:16 - response_cost: 0.0 logged (initial pre-streaming entry)
Access log shows "POST /v1/messages?beta=true HTTP/1.1" 200 OK (streaming completed)
async_success_handler NEVER fires for this request
Enters prisma db call NEVER called for this request's token (e6a2217d...)

Meanwhile, other users' requests (without ?beta=true) on the same pod during the same time window show the complete lifecycle:

response_cost: 0.0044264
Async Wrapper: Completed Call, calling async_success_handler
Enters prisma db call, response_cost: 0.0044264, token: 51f67709...
Adding update to queue...

4. The few SpendLog entries that DO exist for `?beta=true` are incomplete

The rare entries that make it through have:

model = '' or model = 'claude-opus-4-6' (alias not resolved to bedrock model)
call_type = '' (empty)
spend = 0, total_tokens = 0, completion_tokens = 0, prompt_tokens = 0

Expected Behavior

/v1/messages?beta=true requests should trigger the same async_success_handler → update_database → SpendLog pipeline as /v1/messages requests without ?beta=true.

Steps to Reproduce

Configure LiteLLM with prometheus callback and Bedrock models
Send a streaming request to /v1/messages?beta=true with a valid model
Check LiteLLM_SpendLogs table — no entry created
Check Prometheus metrics — no increment for the request
Compare with /v1/messages (no ?beta=true) — works correctly

Impact

No cost tracking for Claude Code CLI users (which always sends ?beta=true)
No Prometheus metrics for these requests
Spend budgets not enforced since spend is never recorded
Admin UI dashboards incomplete — missing majority of anthropic messages traffic

Workaround

None known. CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 does not remove the ?beta=true query parameter when Claude Code detects the provider as non-Bedrock/non-Vertex (e.g., through a proxy).

Related Issues

anthropics/claude-code#20031 — CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS not removing ?beta=true through proxies
anthropics/claude-code#30926 — Beta flags regression in Claude Code v2.1.69+
https://docs.litellm.ai/blog/claude-code-beta-headers-incident — Beta headers incident (fixed header forwarding, but not this logging gap)

extent analysis

Fix Plan

To fix the issue of async_success_handler not being triggered for /v1/messages?beta=true requests, we need to modify the code to handle the beta=true query parameter correctly. Here are the steps:

Modify the litellm_pre_call_utils.py file to handle the beta=true query parameter.
Add a check for the beta=true query parameter in the async_success_handler function to ensure it is triggered correctly.
Update the update_database function to correctly log the SpendLog entries for requests with beta=true.

Code Changes

# litellm_pre_call_utils.py
def process_request(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Handle beta=true query parameter
        request.model_alias = resolve_model_alias(request.model_alias)
    # ... existing code ...

# async_success_handler.py
def async_success_handler(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Trigger async_success_handler for beta=true requests
        update_database(request)
    # ... existing code ...

# update_database.py
def update_database(request):
    # ... existing code ...
    if 'beta' in request.query_params and request.query_params['beta'] == 'true':
        # Correctly log SpendLog entries for beta=true requests
        spend_log = SpendLog(
            model=request.model_alias,
            call_type='streaming',
            spend=request.response_cost,
            total_tokens=request.total_tokens,
            completion_tokens=request.completion_tokens,
            prompt_tokens=request.prompt_tokens
        )
        db.session.add(spend_log)
        db.session.commit()
    # ... existing code ...

Verification

To verify that the fix worked, send a streaming request to /v1/messages?beta=true and check the LiteLLM_SpendLogs table for a new entry. Also, check Prometheus metrics to ensure that the request is being tracked correctly.

Extra Tips

Make sure to test the fix thoroughly to ensure that it works correctly for all scenarios.
Consider adding logging statements to track the flow of the request and ensure that the async_success_handler is being triggered correctly.
Review the related issues and ensure that the fix does not introduce any new bugs or regressions.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #generation error #database connection #vector store #embedding generation #cache error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - ✅(Solved) Fix SpendLogs not recorded for /v1/messages?beta=true streaming requests [3 pull requests, 9 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Workaround

PR fix notes

PR #23748: fix: wire streaming SpendLog logging for /v1/messages beta endpoint

Description (problem / solution / changelog)

Summary

Root Cause

Fix

Context

Test plan

Changed files

PR #24135: fix(proxy): defer logging until post-call guardrails complete

Description (problem / solution / changelog)

Relevant issues

Pre-Submission checklist

Delays in PR merge?

CI (LiteLLM team)

Type

Changes

Problem

Fix

Files

Tests (17 total)

Changed files

PR #26000: fix(proxy): handle non-standard SSE frames in Anthropic passthrough logging

Description (problem / solution / changelog)

Summary

Root Cause

Changes

Related Issues

Test Plan

Changed files

Code Example

Bug Description

Environment

Evidence

1. Access logs vs SpendLogs gap (4-hour window)

2. 99% of /v1/messages requests have ?beta=true

3. Debug trace confirms async_success_handler never fires

4. The few SpendLog entries that DO exist for ?beta=true are incomplete

Expected Behavior

Steps to Reproduce

Impact

Workaround

Related Issues

extent analysis

Fix Plan

Code Changes

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING

2. 99% of `/v1/messages` requests have `?beta=true`

3. Debug trace confirms `async_success_handler` never fires

4. The few SpendLog entries that DO exist for `?beta=true` are incomplete