litellm [Bug]: /v1/responses endpoint returns x-litellm-response-cost: 0 while spend_logs records correct cost

GitHub stats
BerriAI/litellm#26475 · Fetched 2026-04-26 05:06:55
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The /v1/responses (OpenAI Responses API) endpoint returns x-litellm-response-cost: 0 in the response header, while the /v1/chat/completions endpoint returns the correct cost for the same model. The spend is still recorded correctly in LiteLLM_SpendLogs; only the response header reports the wrong value.

This is similar to #23981 (which reports the same issue for /v1/messages), but affects the OpenAI Responses API path.

Environment

  • LiteLLM version: v1.82.3-stable (also reproduced on v1.83.7)
  • Edition: Community (Docker, ghcr.io/berriai/litellm:main-stable)
  • Deployment: Self-hosted Docker
  • Backend: OpenAI-compatible provider (Codex gateway)
  • Database: PostgreSQL

Reproduction

1. Call /v1/chat/completions (works correctly)

curl -s -D- http://litellm:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"Say hello"}]}'

Response headers (correct):

x-litellm-response-cost: 0.0011335    ✅ present & correct

2. Call /v1/responses (broken)

curl -s -D- http://litellm:4000/v1/responses \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4","input":"Say hello"}'

Response headers (broken):

x-litellm-response-cost: 0            ❌ always 0

3. Spend logs DO record the correct cost

Querying /spend/logs confirms the cost was correctly computed and stored:

{
  "request_id": "resp_089ebd3558db3572...",
  "spend": 0.0399475,
  "model": "gpt-5.4"
}
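
The comparison in steps 2 and 3 can be checked mechanically. The following is a minimal Python sketch, not part of LiteLLM: `parse_cost_header` and `cost_mismatch` are hypothetical helper names, and the sample inputs are the header value and spend-log entry reported above.

```python
from typing import Mapping, Optional

COST_HEADER = "x-litellm-response-cost"  # header name taken from this issue


def parse_cost_header(headers: Mapping[str, str]) -> Optional[float]:
    """Return the cost carried in the response headers, or None if absent/unparsable."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get(COST_HEADER)
    if raw is None:
        return None
    try:
        return float(raw.strip())
    except ValueError:
        return None


def cost_mismatch(headers: Mapping[str, str], spend_log: Mapping[str, object],
                  tolerance: float = 1e-9) -> bool:
    """True when the header cost disagrees with the 'spend' field of a spend-log entry."""
    header_cost = parse_cost_header(headers)
    if header_cost is None:
        return True  # a missing header counts as a mismatch
    return abs(header_cost - float(spend_log["spend"])) > tolerance


# Values as reported in this issue (the request_id is truncated in the original).
responses_headers = {"x-litellm-response-cost": "0"}
spend_entry = {"request_id": "resp_089ebd3558db3572...", "spend": 0.0399475, "model": "gpt-5.4"}
print(cost_mismatch(responses_headers, spend_entry))
```

With the values from this report the check flags a mismatch, since the header carries 0 while the log carries 0.0399475.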

Expected behavior

x-litellm-response-cost should return the actual computed cost for /v1/responses, just as it does for /v1/chat/completions.

Root cause hypothesis

The Responses API appears to go through a different (pass-through) code path that does not run the cost computation used by the Chat Completions handler before the response headers are constructed. The cost is still computed asynchronously, which is why it appears in spend_logs, but it is not available at header-write time.
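
To illustrate the hypothesis (and only the hypothesis), here is a toy asyncio model of the ordering problem. It is not LiteLLM code: `handle`, `compute_cost`, and the `state` dict are invented for illustration, and the cost value is the spend figure reported in this issue. When the cost is computed in a background task, the header is built from the default value while the later log write sees the real cost.

```python
import asyncio

COMPUTED_COST = 0.0399475  # spend value reported in this issue's spend log


async def handle(await_cost_before_headers: bool):
    """Toy handler: returns (header_value, cost_seen_by_the_spend_logger)."""
    state = {"response_cost": 0.0}  # default before any cost has been computed

    async def compute_cost():
        # Stand-in for a cost calculation that runs as part of async logging.
        await asyncio.sleep(0)
        state["response_cost"] = COMPUTED_COST

    if await_cost_before_headers:
        # Hypothesised Chat Completions ordering: cost first, headers second.
        await compute_cost()
        header_value = str(state["response_cost"])
    else:
        # Hypothesised Responses ordering: the task is scheduled but has not
        # run yet when the headers are built, so they see the default.
        task = asyncio.create_task(compute_cost())
        header_value = str(state["response_cost"])
        await task  # the spend log is written later, once the cost exists
    return header_value, state["response_cost"]


def run(await_cost_before_headers: bool):
    return asyncio.run(handle(await_cost_before_headers))
```

In this model `run(False)` produces a zero header alongside a correct logged cost, which matches the symptom described above; whether LiteLLM's real handlers are ordered this way is unverified.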

Additional note: request_id format mismatch

A secondary issue: for /v1/responses, the request_id stored in LiteLLM_SpendLogs uses a hex-encoded short format of the provider's response_id (e.g., resp_089ebd3558db3572...), while the actual provider response contains a longer Base64-like ID (e.g., resp_0Ci7hJsa-_bXKDXne0WxR...). This makes it impossible for downstream applications to look up spend logs by the provider's original response_id. For Chat Completions, the request_id is a standard UUID that's also returned in the x-litellm-call-id header, making lookups straightforward.

Relevant log/trace info

  • Tested on both v1.82.3-stable and v1.83.7 — same behavior
  • Non-streaming mode
  • store_prompts_in_spend_logs: true enabled in config

Are you a LiteLLM Enterprise user?

No

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The x-litellm-response-cost header is not being populated correctly for the /v1/responses endpoint due to a potential code path difference in cost computation logic.

Guidance

  • Verify that the cost computation logic is being called for the /v1/responses endpoint and that the result is available when constructing the response headers.
  • Check if there are any differences in the code paths or handlers between the /v1/chat/completions and /v1/responses endpoints that could be causing the discrepancy.
  • Consider adding logging or debugging statements to track the flow of cost computation and header construction for both endpoints.
  • Review the LiteLLM_SpendLogs database table to ensure that the correct cost is being stored for /v1/responses requests.

Example

No code snippet is provided as the issue is more related to the logic and code flow rather than a specific code block.

Notes

The issue seems to be related to the asynchronous computation of the cost and the availability of the result when constructing the response headers. The fact that the cost is correctly stored in LiteLLM_SpendLogs suggests that the computation is working, but the result is not being propagated to the response headers.

Recommendation

Apply a workaround to ensure that the cost computation result is available when constructing the response headers for the /v1/responses endpoint, potentially by synchronizing the computation or caching the result.
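
Until the header is corrected server-side, a client can treat a zero header as "unknown" and fall back to another source. The sketch below is a client-side assumption, not a LiteLLM API: `effective_cost` and the `lookup_spend` callable are hypothetical, and how `lookup_spend` finds the matching entry (for example via /spend/logs) is left to the caller, since the request_id mismatch noted in this issue may complicate that lookup.

```python
from typing import Callable, Mapping, Optional


def effective_cost(headers: Mapping[str, str],
                   lookup_spend: Callable[[], Optional[float]]) -> Optional[float]:
    """Prefer a non-zero header cost; otherwise defer to a caller-supplied lookup.

    Assumes header names in `headers` are already lower-cased.
    """
    raw = headers.get("x-litellm-response-cost")
    try:
        header_cost = float(raw) if raw is not None else None
    except ValueError:
        header_cost = None
    if header_cost:  # non-zero and not None
        return header_cost
    # Zero or missing: the header cannot be trusted for /v1/responses here.
    return lookup_spend()
```

One trade-off of this design is that a genuinely free request (true cost 0) also triggers the fallback lookup, which costs an extra call but does not produce a wrong answer.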
