litellm - ✅(Solved) Fix [Bug]: background Responses API stream resume on retrieve is not handled correctly via the proxy [1 pull requests, 1 participants]

GitHub stats
BerriAI/litellm#26762 · Fetched 2026-04-30 06:20:17
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #26750: feat(responses): support cursor-based stream resume on retrieve

Description (problem / solution / changelog)

Relevant issues

Related bug report: #26762

Follow-up to #26671, retargeted to the public OSS staging branch and updated to fix the synchronous retrieve streaming gap identified during review.

This PR adds support for OpenAI-style client.responses.retrieve(response_id, stream=True, starting_after=N) across LiteLLM's retrieve path.
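To make the cursor semantics concrete, here is a minimal, network-free sketch of a client that tracks the last `sequence_number` it saw and resumes with `starting_after` after a dropped connection. `fake_retrieve_stream`, `STORED_EVENTS`, and `consume_with_resume` are hypothetical names used only for illustration; the sketch assumes that `starting_after=N` yields only events whose `sequence_number` is greater than N, and it does not call LiteLLM or the OpenAI SDK.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for the events of a stored background response.
# Each event carries a monotonically increasing sequence_number.
STORED_EVENTS: List[Dict] = [
    {"type": "response.created", "sequence_number": 0},
    {"type": "response.output_text.delta", "sequence_number": 1, "delta": "Hel"},
    {"type": "response.output_text.delta", "sequence_number": 2, "delta": "lo"},
    {"type": "response.completed", "sequence_number": 3},
]


def fake_retrieve_stream(
    starting_after: Optional[int] = None,
    drop_after: Optional[int] = None,
) -> Iterator[Dict]:
    """Simulate retrieve(response_id, stream=True, starting_after=N).

    Yields only events with sequence_number > starting_after (assumed
    semantics), and raises ConnectionError after the event numbered
    `drop_after` to mimic a dropped connection.
    """
    for event in STORED_EVENTS:
        if starting_after is not None and event["sequence_number"] <= starting_after:
            continue
        yield event
        if drop_after is not None and event["sequence_number"] == drop_after:
            raise ConnectionError("simulated disconnect")


def consume_with_resume() -> List[int]:
    """Read the stream to completion, resuming from the last seen cursor."""
    seen: List[int] = []
    cursor: Optional[int] = None
    drop_after: Optional[int] = 1  # the first attempt drops after event 1
    while True:
        try:
            for event in fake_retrieve_stream(cursor, drop_after):
                cursor = event["sequence_number"]  # remember the resume point
                seen.append(cursor)
            return seen
        except ConnectionError:
            drop_after = None  # the retry is allowed to run to completion


if __name__ == "__main__":
    print(consume_with_resume())
```

Because the cursor is updated on every event, the retry picks up at the first unseen event rather than replaying the stream from the start.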

Before this change:

  • the proxy retrieve endpoint forwarded stream / starting_after query params for provider-backed GET requests
  • the async retrieve path could open an SSE stream and return a ResponsesAPIStreamingIterator
  • the sync get_responses(stream=True) path still issued a normal GET and then treated the SSE body like a non-streaming JSON response

This PR closes that gap and adds regression coverage for the sync and proxy paths.
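The sync-path gap in the list above can be illustrated without LiteLLM at all: an SSE body is not a JSON document, so parsing it as one fails, while reading it event by event works. The following is a simplified sketch, not the PR's implementation; `SSE_BODY`, `parse_as_json`, and `iter_sse_events` are hypothetical names, and the parser deliberately ignores SSE details such as multi-line `data:` fields.

```python
import json
from typing import Dict, Iterator

# A small, made-up SSE body in the shape a streaming retrieve would return.
SSE_BODY = (
    "event: response.created\n"
    'data: {"type": "response.created", "sequence_number": 0}\n'
    "\n"
    "event: response.completed\n"
    'data: {"type": "response.completed", "sequence_number": 1}\n'
    "\n"
)


def parse_as_json(body: str) -> Dict:
    """What a non-streaming code path effectively does with the body."""
    return json.loads(body)


def iter_sse_events(body: str) -> Iterator[Dict]:
    """Yield the JSON payload of each single-line `data:` field."""
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":
                yield json.loads(payload)


if __name__ == "__main__":
    try:
        parse_as_json(SSE_BODY)
    except json.JSONDecodeError as exc:
        print("treating SSE as JSON fails:", exc)
    for event in iter_sse_events(SSE_BODY):
        print(event["sequence_number"], event["type"])
```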

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory; adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link: after PR creation

  • CI run for the last commit
    Link: after final push

  • Merge / cherry-pick CI run
    Links: N/A (maintainer-owned)

Screenshots / Proof of Fix

Added regression coverage for:

  • async retrieve streaming returning a ResponsesAPIStreamingIterator
  • sync retrieve streaming returning a SyncResponsesAPIStreamingIterator
  • sync HTTPHandler.get(..., stream=True) opening an actual streaming GET request
  • proxy forwarding of stream / starting_after query params on GET /v1/responses/{response_id}
  • proxy validation for non-integer starting_after
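As a rough illustration of the last two items, the sketch below shows one way a handler could validate `starting_after` and build the query params to forward upstream. It is an assumption-laden example, not the code in `endpoints.py`: `parse_starting_after` and `build_forward_params` are hypothetical helpers, and how the real proxy reports the error to the client is not stated in the PR description.

```python
from typing import Dict, Optional


def parse_starting_after(raw: Optional[str]) -> Optional[int]:
    """Convert the raw query-string value to an int, or reject it."""
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            f"starting_after must be an integer, got {raw!r}"
        ) from None


def build_forward_params(
    stream: bool, starting_after: Optional[str]
) -> Dict[str, str]:
    """Build the query params to forward on GET /v1/responses/{response_id}."""
    params: Dict[str, str] = {}
    if stream:
        params["stream"] = "true"
    cursor = parse_starting_after(starting_after)
    if cursor is not None:
        params["starting_after"] = str(cursor)
    return params


if __name__ == "__main__":
    print(build_forward_params(True, "7"))
    try:
        build_forward_params(True, "abc")
    except ValueError as exc:
        print("rejected:", exc)
```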

Local targeted verification:

PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 .venv/bin/pytest -p pytest_asyncio.plugin \
  tests/test_litellm/responses/test_get_responses_stream_resume.py \
  tests/test_litellm/proxy/response_api_endpoints/test_endpoints.py \
  -x -vv

Result: 13 passed

Type

🆕 New Feature 🐛 Bug Fix

Changes

  • add sync streaming GET support to HTTPHandler.get()
  • return SyncResponsesAPIStreamingIterator from get_responses(..., stream=True) instead of treating the SSE body as a non-streaming response
  • preserve the existing async retrieve streaming behavior
  • add sync-path regression tests for responses retrieve stream resume
  • add proxy endpoint tests for query-param forwarding and invalid starting_after handling

Changed files

  • litellm/llms/azure/responses/transformation.py (modified, +15/-4)
  • litellm/llms/base_llm/responses/transformation.py (modified, +11/-0)
  • litellm/llms/custom_httpx/http_handler.py (modified, +81/-8)
  • litellm/llms/custom_httpx/llm_http_handler.py (modified, +119/-5)
  • litellm/llms/manus/responses/transformation.py (modified, +6/-0)
  • litellm/llms/openai/responses/transformation.py (modified, +11/-1)
  • litellm/llms/volcengine/responses/transformation.py (modified, +5/-0)
  • litellm/proxy/response_api_endpoints/endpoints.py (modified, +22/-0)
  • litellm/responses/main.py (modified, +35/-7)
  • tests/test_litellm/llms/custom_httpx/test_credential_leak_prevention.py (modified, +64/-0)
  • tests/test_litellm/proxy/response_api_endpoints/test_endpoints.py (modified, +81/-10)
  • tests/test_litellm/responses/test_get_responses_stream_resume.py (added, +279/-0)

What happened?

I found this while using the OpenAI Responses API background mode through the LiteLLM proxy.

When resuming a background response via:

GET /v1/responses/{response_id}?stream=true&starting_after=<sequence_number>

LiteLLM did not correctly handle the retrieve streaming path.

I expected the proxy to support OpenAI-style cursor-based stream resume on the retrieve endpoint and return a valid SSE stream for background responses.

Steps to Reproduce

  1. Send a Responses API request through the LiteLLM proxy using OpenAI background mode.
  2. Wait until the response can be retrieved by response_id.
  3. Call GET /v1/responses/{response_id}?stream=true&starting_after=<sequence_number>.
  4. Observe that the retrieve streaming path is not handled correctly.
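For step 3, the resume URL can be assembled as in the sketch below. `build_resume_url` is a hypothetical helper, and the base URL `http://localhost:4000` and the ID `resp_123` are placeholder assumptions, not values from the report.

```python
from urllib.parse import quote, urlencode


def build_resume_url(base_url: str, response_id: str, starting_after: int) -> str:
    """Build GET /v1/responses/{response_id}?stream=true&starting_after=N."""
    query = urlencode({"stream": "true", "starting_after": starting_after})
    path = f"/v1/responses/{quote(response_id, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    # Placeholder proxy address and response ID, for illustration only.
    print(build_resume_url("http://localhost:4000", "resp_123", 5))
```

The resulting URL can then be requested with any SSE-capable client, passing whatever authorization header your proxy deployment expects.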

Relevant log output

No additional production logs attached.

I was able to reproduce this consistently through the proxy retrieve path and then narrowed it down to the retrieve streaming flow.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.83.14

Additional context

This was discovered while trying to use background mode together with streaming resume, which is a valid OpenAI Responses API workflow.

A fix has been proposed in PR #26750.

extent analysis

TL;DR

The issue can likely be resolved by applying the fix proposed in PR #26750 to handle the retrieve streaming path correctly in the LiteLLM proxy.

Guidance

  • The problem seems to stem from the LiteLLM proxy not correctly handling the streaming path when resuming a background response, which is a valid workflow in the OpenAI Responses API.
  • To verify the issue, follow the steps to reproduce provided, focusing on the GET /v1/responses/{response_id}?stream=true&starting_after=<sequence_number> call.
  • Applying the fix from PR #26750 should address the issue by ensuring the proxy supports OpenAI-style cursor-based stream resume on the retrieve endpoint.
  • The issue was reported against LiteLLM v1.83.14. Before applying the fix, check which version you are running and whether it already includes the changes from PR #26750.

Example

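The issue itself does not include a code example. According to the PR description, PR #26750 adds sync streaming GET support to HTTPHandler.get(), makes get_responses(..., stream=True) return a SyncResponsesAPIStreamingIterator instead of treating the SSE body as a non-streaming response, and adds proxy tests for forwarding stream / starting_after and for handling a non-integer starting_after.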

Notes

The fix proposed in PR #26750 is specific to handling the retrieve streaming path in the context of OpenAI background mode through the LiteLLM proxy. This solution assumes that the issue is isolated to this particular workflow and may not address similar issues in other parts of the system.

Recommendation

Integrate the fix from PR #26750. It is a fix rather than a workaround, and it directly addresses the retrieve streaming path in the LiteLLM proxy.
