litellm - ✅(Solved) Fix [Bug]: Session continuation only works after 10 secs on non OpenAI providers [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#25289Fetched 2026-04-08 03:02:06
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
labeled ×3

Root Cause

  1. Make a Responses request with for example a Gemini model
  2. Keep that response ID and before 10 secs try to keep the conversation
  3. Fail
  4. Repeat with OpenAI
  5. Works (because OpenAI store messages in her side)

PR fix notes

PR #13339: Fix: Responses API Redis session timing for conversation context

Description (problem / solution / changelog)

Fix: Responses API Redis session timing for conversation context

Description

This PR fixes a critical timing issue in the Responses API where conversation context was not being maintained between requests when using Redis cache. The issue was caused by batch processing delays in storing session data.

Problem

When using the Responses API with previous_response_id, the conversation context was not available immediately for subsequent requests. This was particularly problematic for:

  • Multi-turn conversations requiring context
  • High-throughput applications needing immediate session availability
  • ChatGPT-like interfaces requiring seamless conversation flow

Solution

Added a Redis-first session storage mechanism that:

  1. Stores session data immediately after response generation
  2. Retrieves session data directly from Redis on subsequent requests
  3. Falls back gracefully to existing enterprise/database logic if Redis is unavailable

Changes

  • Added _patch_store_session_in_redis() method for immediate session storage
  • Added _patch_get_session_from_redis() method for fast session retrieval
  • Modified pre_completion_batch_processing() to check Redis first
  • All changes are backward compatible and fail gracefully

Testing

Tested with multiple providers:

  • ✅ Anthropic Claude 3.5 Sonnet
  • ✅ DeepSeek Chat
  • ✅ Google Gemini 2.0 Flash

Test Script

# First request
response1 = requests.post("/v1/responses", json={
    "model": "claude-3-5-sonnet",
    "input": "My name is Alice",
    "max_tokens": 100
})
response_id = response1.json()["id"]

# Second request with context
response2 = requests.post("/v1/responses", json={
    "model": "claude-3-5-sonnet",
    "input": "What's my name?",
    "previous_response_id": response_id,
    "max_tokens": 100
})
# Now correctly returns "Alice" instead of "I don't know"

Configuration Required

litellm_settings:
  cache: true  # Required
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379

Related Issues

  • Fixes #12364 - Responses API conversation context timing issue
  • Related to #12640 - Native OpenAI Responses API support

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change
  • Documentation update

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have tested with multiple providers
  • Changes are backward compatible
  • Fails gracefully when Redis is not available

Additional Notes

This is implemented as a patch with clear comments indicating it's a temporary fix. The upstream team may want to integrate this more deeply into the core session handling logic. The patch:

  • Only activates when Redis cache is configured
  • Does not interfere with existing enterprise session handlers
  • Adds minimal overhead (single Redis get/set operation)
  • Uses 24-hour TTL for session data

Changed files

  • REDIS_SESSION_PATCH.md (added, +255/-0)
  • RESPONSES_API_TEST_README.md (added, +100/-0)
  • litellm/responses/litellm_completion_transformation/handler.py (modified, +18/-0)
  • litellm/responses/litellm_completion_transformation/streaming_iterator.py (modified, +80/-7)
  • litellm/responses/litellm_completion_transformation/transformation.py (modified, +82/-0)
  • responses_api_config.yaml (added, +36/-0)
  • test_redis_session_patch.py (added, +92/-0)
  • test_responses_api.py (added, +229/-0)
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

This PR was closed, but it's exactly the same problem: https://github.com/BerriAI/litellm/pull/13339

For non OpenAI providers (which lacks of a native response API endpoint) you have to wait about 10 secs (the time that LiteLLM waits to add logs to S3) to be able to retrieve old messages and keep the conversations.

This makes unusable for chat apps, or some fast processing apps.

Steps to Reproduce

  1. Make a Responses request with for example a Gemini model
  2. Keep that response ID and before 10 secs try to keep the conversation
  3. Fail
  4. Repeat with OpenAI
  5. Works (because OpenAI store messages in her side)

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.6

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Implement a caching mechanism or modify the LiteLLM proxy to reduce the waiting time for non-OpenAI providers to retrieve old messages.

Guidance

  • Investigate the possibility of implementing a caching layer to store response IDs and their corresponding messages, allowing for faster retrieval.
  • Consider modifying the LiteLLM proxy to handle non-OpenAI providers differently, potentially by using a different API endpoint or reducing the waiting time.
  • Review the LiteLLM documentation to see if there are any configuration options or workarounds for non-OpenAI providers.
  • Test the implementation with different providers, including Gemini and OpenAI, to ensure compatibility.

Example

No code example is provided due to the lack of specific implementation details in the issue.

Notes

The solution may vary depending on the specific requirements of the chat apps or fast processing apps using LiteLLM. Additionally, the waiting time of 10 seconds may be a fixed value or configurable, which could impact the chosen solution.

Recommendation

Apply workaround: Implement a caching mechanism or modify the LiteLLM proxy to reduce the waiting time, as upgrading to a fixed version is not mentioned in the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING