litellm - ✅(Solved) Fix [Bug]: Polling Adds Duplicate Costs to Key Budget [1 pull request, 1 comment, 2 participants]

GitHub stats
BerriAI/litellm#25015 (fetched 2026-04-08 02:35:06)
Comments: 1
Participants: 2
Timeline events: 7
Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, commented ×1

Root Cause

Every successful GET /v1/responses/{resp_id} poll runs the full cost-tracking pipeline, so the cost of the original request is added to the key spend again on each poll. In the reported reproduction, the spend is $0.01 after the initial POST and $0.04 after 3 polls, 4 times the actual cost.

Fix Action

Fixed

PR fix notes

PR #25061: fix(proxy): make responses polling retrieval non-billable (#25015)

Description (problem / solution / changelog)

Relevant issues

Fixes #25015

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR.

  • I have added testing in the tests/test_litellm/ directory, adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a confidence score of at least 4/5 before requesting a maintainer review

Type

Bug Fix
Test

Changes

  • Fixes duplicate spend tracking for OpenAI Responses polling retrieval calls.
  • Treats retrieval call types get_responses and aget_responses as non-billable in proxy cost tracking.
  • Forces response_cost to 0.0 for these retrieval calls before spend counters and database updates are applied.
  • Adds regression coverage in proxy hook tests to verify retrieval polling does not increase spend.
  • Keeps the PR scope focused on Issue #25015 only.
  • Local validation: ruff check passed for changed files and the targeted proxy hook test file passed.
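
The behavior described in the list above can be sketched as follows. This is a minimal illustration, not the code from PR #25061: only the call type names get_responses and aget_responses and the 0.0 cost come from the PR description, while billable_cost, SpendTracker, and the "aresponses" label used for the initial request are hypothetical names chosen for this example.

```python
from typing import Dict, Optional

# Call types the PR description lists as retrieval (polling) calls.
NON_BILLABLE_CALL_TYPES = frozenset({"get_responses", "aget_responses"})


def billable_cost(call_type: Optional[str], response_cost: Optional[float]) -> float:
    """Return the cost that should be added to spend for one call.

    Hypothetical helper: retrieval call types are forced to 0.0 so that
    polling an existing response never increases spend.
    """
    if call_type in NON_BILLABLE_CALL_TYPES:
        return 0.0
    return response_cost or 0.0


class SpendTracker:
    """Hypothetical stand-in for the proxy's per-key spend counters."""

    def __init__(self) -> None:
        self.spend_by_key: Dict[str, float] = {}

    def record(self, key: str, call_type: str, response_cost: Optional[float]) -> float:
        # Zero out non-billable calls *before* touching the counter,
        # mirroring the ordering described in the PR notes.
        cost = billable_cost(call_type, response_cost)
        self.spend_by_key[key] = self.spend_by_key.get(key, 0.0) + cost
        return self.spend_by_key[key]


tracker = SpendTracker()
# Initial request is billed once ("aresponses" is an assumed label).
tracker.record("sk-test-key", "aresponses", 0.01)
# Three polls report the same cost but must not be billed.
for _ in range(3):
    tracker.record("sk-test-key", "aget_responses", 0.01)
final_spend = tracker.spend_by_key["sk-test-key"]
```

With this ordering the spend for sk-test-key stays at the cost of the initial request no matter how many polls follow, which is the property the regression test in the PR is described as checking.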

Changed files

  • litellm/proxy/hooks/proxy_track_cost_callback.py (modified, +27/-5)
  • tests/test_litellm/proxy/hooks/test_proxy_track_cost_callback.py (modified, +57/-5)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When you use the OpenAI Responses API through LiteLLM's proxy (e.g., POST /v1/responses) and then poll for results via GET /v1/responses/{resp_id}, every successful GET poll triggers the full cost-tracking pipeline and increments the key/user/team/org spend each time, even though OpenAI doesn't charge for retrieval GETs.

Steps to Reproduce

  1. Check initial key spend:
curl -X GET "http://localhost:4000/key/info?key=sk-test-key" \
  -H "Authorization: Bearer sk-master-1234"

Note the spend value, e.g. $0.0.

  2. Create a response using the Responses API:
curl -X POST http://localhost:4000/v1/responses \
  -H "Authorization: Bearer sk-test-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-deep-research",
    "input": "What are the latest advances in quantum computing?"
  }'

Save the returned id (e.g. resp_abc123...).

  3. Check key spend after the initial request:
curl -X GET "http://localhost:4000/key/info?key=sk-test-key" \
  -H "Authorization: Bearer sk-master-1234"

Note the spend value (e.g. $0.01)

  4. Poll the same response multiple times via GET:
# Poll 1
curl -X GET http://localhost:4000/v1/responses/resp_abc123 \
  -H "Authorization: Bearer sk-test-key"

# Poll 2
curl -X GET http://localhost:4000/v1/responses/resp_abc123 \
  -H "Authorization: Bearer sk-test-key"

# Poll 3
curl -X GET http://localhost:4000/v1/responses/resp_abc123 \
  -H "Authorization: Bearer sk-test-key"

  5. Check key spend again:
curl -X GET "http://localhost:4000/key/info?key=sk-test-key" \
  -H "Authorization: Bearer sk-master-1234"

The spend value is now $0.04, 4 times the original cost, because each of the 3 extra polls was billed again.
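
The same steps can be scripted. The sketch below uses only the Python standard library and the URLs, keys, and request body from the curl commands; the helper names are made up for this example, and reading the spend from an "info" object in the /key/info payload is an assumption about the response shape that may need adjusting.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4000"  # proxy address used in the steps
MASTER_KEY = "sk-master-1234"       # admin key used for /key/info
TEST_KEY = "sk-test-key"            # key whose spend is being checked


def build_request(method, path, api_key, body=None):
    """Build (but do not send) an authenticated request to the proxy."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_spend(key_info):
    # Assumption: spend is nested under "info"; returns None otherwise.
    return key_info.get("info", {}).get("spend")


def key_spend():
    """Read the current spend of TEST_KEY via the admin endpoint."""
    request = build_request("GET", f"/key/info?key={TEST_KEY}", MASTER_KEY)
    return extract_spend(send(request))


def reproduce(polls=3):
    """Run the reproduction and return (before, after_create, after_polls)."""
    before = key_spend()
    created = send(build_request("POST", "/v1/responses", TEST_KEY, {
        "model": "o3-deep-research",
        "input": "What are the latest advances in quantum computing?",
    }))
    after_create = key_spend()
    for _ in range(polls):
        send(build_request("GET", f"/v1/responses/{created['id']}", TEST_KEY))
    after_polls = key_spend()
    return before, after_create, after_polls
```

Calling reproduce() against a running proxy returns the three spend readings; on an affected version the last value would be expected to exceed the second, while on a fixed version the two should match.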

Expected behavior

The spend should remain at $0.01. OpenAI does not charge for GET /v1/responses/{id} — it's a simple retrieval of an already-completed response. The key spend should only reflect the original POST request.
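
The gap between observed and expected spend is simple arithmetic. The sketch below assumes, as the report suggests, that each poll re-bills the full cost of the original request; spend_after_polling is a hypothetical helper, and Decimal is used only to keep the dollar amounts exact.

```python
from decimal import Decimal


def spend_after_polling(initial_cost, polls, bill_retrievals):
    """Total key spend after one create request and `polls` retrievals."""
    # Assumption: a billed poll costs the same as the original request.
    per_poll = initial_cost if bill_retrievals else Decimal("0")
    return initial_cost + per_poll * polls


cost = Decimal("0.01")
observed = spend_after_polling(cost, polls=3, bill_retrievals=True)
expected = spend_after_polling(cost, polls=3, bill_retrievals=False)
print(observed, expected)  # 0.04 0.01
```

The overcharge grows linearly with the number of polls, so long-running requests that are polled frequently are affected the most.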

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.81.14-stable

extent analysis

TL;DR

Modify the cost-tracking pipeline to exclude GET requests for retrieving existing responses.

Guidance

  • Identify the specific part of the cost-tracking pipeline that increments the key/user/team/org spend for GET requests and modify it to only increment for POST requests or other chargeable actions.
  • Verify that the modification does not affect other parts of the system that rely on accurate spend tracking.
  • Consider adding a conditional check in the cost-tracking pipeline to exclude GET requests for retrieving existing responses, based on the API endpoint or request method.
  • Review the OpenAI API documentation to ensure that the cost-tracking pipeline aligns with their pricing model.

Example

# Pseudo-code example of a conditional check (names are illustrative)
is_retrieval = request_method == 'GET' and endpoint.startswith('/v1/responses/')
if not is_retrieval:
    # Only billable requests increment spend
    update_spend()

Notes

The exact implementation details may vary depending on the underlying architecture and technology stack of the LiteLLM proxy.

Recommendation

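Exclude retrieval requests for existing responses from the cost-tracking pipeline. This is a targeted change that addresses the reported issue without altering how billable requests are tracked. PR #25061 takes this approach by treating the get_responses and aget_responses call types as non-billable.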
