litellm: [Bug]: /v1/responses endpoint returns x-litellm-response-cost: 0 while spend_logs records correct cost

Q: Expected behavior

`x-litellm-response-cost` should return the actual computed cost for `/v1/responses`, just as it does for `/v1/chat/completions`.

litellm · 2026-04-25 03:07:26

GitHub stats

BerriAI/litellm#26475 • Fetched 2026-04-26 05:06:55

Author

Diesel-Chen

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The /v1/responses (OpenAI Responses API) endpoint returns x-litellm-response-cost: 0 in the response header, while the /v1/chat/completions endpoint returns the correct cost for the same model. Interestingly, the spend is correctly recorded in LiteLLM_SpendLogs — it's only the response header that's missing the cost.

This is similar to #23981 (which reports the same issue for /v1/messages), but affects the OpenAI Responses API path.

Environment

LiteLLM version: v1.82.3-stable (also reproduced on v1.83.7)
Edition: Community (Docker, ghcr.io/berriai/litellm:main-stable)
Deployment: Self-hosted Docker
Backend: OpenAI-compatible provider (Codex gateway)
Database: PostgreSQL

Reproduction

1. Call `/v1/chat/completions` (works correctly)

curl -s -D- http://litellm:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4","messages":[{"role":"user","content":"Say hello"}]}'

Response headers (correct):

x-litellm-response-cost: 0.0011335    ✅ present & correct

2. Call `/v1/responses` (broken)

curl -s -D- http://litellm:4000/v1/responses \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4","input":"Say hello"}'

Response headers (broken):

x-litellm-response-cost: 0            ❌ always 0

3. Spend logs DO record the correct cost

Querying /spend/logs confirms the cost was correctly computed and stored:

{
  "request_id": "resp_089ebd3558db3572...",
  "spend": 0.0399475,
  "model": "gpt-5.4"
}
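Steps 1 and 2 above can be scripted so the two headers are compared side by side. This is a minimal sketch using only the Python standard library; the helper names `parse_cost` and `post_and_get_cost` are made up for illustration, and the base URL, key, and model are the placeholders from the curl commands.

```python
import json
import urllib.request

COST_HEADER = "x-litellm-response-cost"


def parse_cost(headers):
    """Return the cost header as a float, or None if it is missing or not numeric."""
    for name, value in headers.items():
        if name.lower() == COST_HEADER:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def post_and_get_cost(base_url, path, api_key, payload):
    """POST a JSON payload to the proxy and return the parsed cost header."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_cost(dict(resp.headers.items()))


# Example calls (placeholders taken from the curl commands above):
# chat = post_and_get_cost("http://litellm:4000", "/v1/chat/completions", "sk-...",
#                          {"model": "gpt-5.4",
#                           "messages": [{"role": "user", "content": "Say hello"}]})
# resp = post_and_get_cost("http://litellm:4000", "/v1/responses", "sk-...",
#                          {"model": "gpt-5.4", "input": "Say hello"})
# print("chat:", chat, "responses:", resp)
```

With the behavior reported here, the first call would yield a positive float and the second would yield `0.0`.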

Expected behavior

x-litellm-response-cost should return the actual computed cost for /v1/responses, just as it does for /v1/chat/completions.

Root cause hypothesis

The Responses API appears to go through a different (possibly pass-through) code path that does not run the cost computation used by the Chat Completions handler before the response headers are constructed. The cost is computed asynchronously, which is why it appears in spend_logs, but it is not available at header-write time.
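To make the hypothesis concrete, here is a toy asyncio model of the suspected ordering problem. It is not LiteLLM code: every function name is invented for illustration, and the cost value is just the one from the spend log above. It only shows how a header built before a background cost task finishes ends up as zero while the log still gets the right number.

```python
import asyncio

HEADER = "x-litellm-response-cost"


async def compute_cost():
    # Stand-in for a cost calculation that takes a moment to complete.
    await asyncio.sleep(0.01)
    return 0.0399475


async def handle_deferred(spend_log):
    """Hypothesized broken path: headers are built before the cost exists."""
    cost = 0.0  # default used because nothing has been computed yet

    async def log_spend():
        spend_log.append(await compute_cost())

    task = asyncio.create_task(log_spend())  # cost is computed in the background
    headers = {HEADER: str(cost)}            # header is written with the default
    await task  # awaited here only so the toy example finishes cleanly
    return headers


async def handle_awaited(spend_log):
    """Expected path: the cost is known before the headers are built."""
    cost = await compute_cost()
    spend_log.append(cost)
    return {HEADER: str(cost)}


def demo():
    async def run():
        log_a, log_b = [], []
        headers_a = await handle_deferred(log_a)
        headers_b = await handle_awaited(log_b)
        return headers_a, log_a, headers_b, log_b

    return asyncio.run(run())
```

In the deferred variant the log receives the correct cost but the header stays at its default, which matches the symptom reported for `/v1/responses`. Whether the real handler is structured this way is still unconfirmed.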

Additional note: request_id format mismatch

A secondary issue: for /v1/responses, the request_id stored in LiteLLM_SpendLogs uses a hex-encoded short format of the provider's response_id (e.g., resp_089ebd3558db3572...), while the actual provider response contains a longer Base64-like ID (e.g., resp_0Ci7hJsa-_bXKDXne0WxR...). This makes it impossible for downstream applications to look up spend logs by the provider's original response_id. For Chat Completions, the request_id is a standard UUID that's also returned in the x-litellm-call-id header, making lookups straightforward.

Relevant log/trace info

Tested on both v1.82.3-stable and v1.83.7 — same behavior
Non-streaming mode
store_prompts_in_spend_logs: true enabled in config

extent analysis

TL;DR

The x-litellm-response-cost header is always 0 for the /v1/responses endpoint, most likely because that endpoint's code path builds the response headers before the cost has been computed.

Guidance

Verify that the cost computation logic is being called for the /v1/responses endpoint and that the result is available when constructing the response headers.
Check if there are any differences in the code paths or handlers between the /v1/chat/completions and /v1/responses endpoints that could be causing the discrepancy.
Consider adding logging or debugging statements to track the flow of cost computation and header construction for both endpoints.
Review the LiteLLM_SpendLogs database table to ensure that the correct cost is being stored for /v1/responses requests.

Example

No fix snippet is provided, because the problem lies in the logic and code flow rather than in a single identifiable code block.

Notes

The issue seems to be related to the asynchronous computation of the cost and the availability of the result when constructing the response headers. The fact that the cost is correctly stored in LiteLLM_SpendLogs suggests that the computation is working, but the result is not being propagated to the response headers.

Recommendation

Apply a workaround to ensure that the cost computation result is available when constructing the response headers for the /v1/responses endpoint, potentially by synchronizing the computation or caching the result.
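Until the header is fixed server-side, a client can work around it by falling back to the spend log whenever the header is missing or zero. The sketch below is a client-side stopgap under some assumptions: the function names are hypothetical, `/spend/logs` is assumed to accept a `request_id` query parameter, and its response is assumed to be either a single row or a list of rows shaped like the JSON shown in the reproduction. Because of the request_id mismatch described above, finding the right id to pass for `/v1/responses` calls remains an open problem.

```python
import json
import urllib.parse
import urllib.request


def extract_spend(body):
    """Pull `spend` out of a /spend/logs payload (a single row or a list of rows)."""
    rows = body if isinstance(body, list) else [body]
    for row in rows:
        if isinstance(row, dict) and "spend" in row:
            return float(row["spend"])
    return None


def fetch_logged_spend(base_url, api_key, request_id):
    """Query the spend log for one request; returns None if no row is found."""
    # Assumption: /spend/logs accepts a `request_id` query parameter.
    query = urllib.parse.urlencode({"request_id": request_id})
    req = urllib.request.Request(
        f"{base_url}/spend/logs?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_spend(json.load(resp))


def resolve_cost(header_value, fetch_spend):
    """Prefer the header; fall back to the spend log when it is missing or zero."""
    try:
        cost = float(header_value) if header_value is not None else 0.0
    except ValueError:
        cost = 0.0
    if cost > 0:
        return cost
    logged = fetch_spend()  # only called when the header is unusable
    return logged if logged is not None else cost
```

Passing the lookup as a callable keeps `resolve_cost` independent of how the log is queried, for example `resolve_cost(header, lambda: fetch_logged_spend(base, key, rid))`. Since the spend log is reportedly written asynchronously, a short delay or retry before the lookup may be needed.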
