litellm - ✅(Solved) Fix [Bug]: Session continuation only works after 10 secs on non OpenAI providers [1 pull requests, 1 participants]

litellm2026-04-07 13:05:23

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#25289•Fetched 2026-04-08 03:02:06

View on GitHub

Comments

Participants

Timeline

Reactions

Author

rodriciru

Participants

rodriciru

Timeline (top)

labeled ×3

Root Cause

Make a Responses request with for example a Gemini model
Keep that response ID and before 10 secs try to keep the conversation
Fail
Repeat with OpenAI
Works (because OpenAI store messages in her side)

PR fix notes

PR #13339: Fix: Responses API Redis session timing for conversation context

Repository: BerriAI/litellm
Author: jatorre
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/13339

Description (problem / solution / changelog)

Fix: Responses API Redis session timing for conversation context

Description

This PR fixes a critical timing issue in the Responses API where conversation context was not being maintained between requests when using Redis cache. The issue was caused by batch processing delays in storing session data.

Problem

When using the Responses API with previous_response_id, the conversation context was not available immediately for subsequent requests. This was particularly problematic for:

Multi-turn conversations requiring context
High-throughput applications needing immediate session availability
ChatGPT-like interfaces requiring seamless conversation flow

Solution

Added a Redis-first session storage mechanism that:

Stores session data immediately after response generation
Retrieves session data directly from Redis on subsequent requests
Falls back gracefully to existing enterprise/database logic if Redis is unavailable

Changes

Added _patch_store_session_in_redis() method for immediate session storage
Added _patch_get_session_from_redis() method for fast session retrieval
Modified pre_completion_batch_processing() to check Redis first
All changes are backward compatible and fail gracefully

Testing

Tested with multiple providers:

✅ Anthropic Claude 3.5 Sonnet
✅ DeepSeek Chat
✅ Google Gemini 2.0 Flash

Test Script

# First request
response1 = requests.post("/v1/responses", json={
    "model": "claude-3-5-sonnet",
    "input": "My name is Alice",
    "max_tokens": 100
})
response_id = response1.json()["id"]

# Second request with context
response2 = requests.post("/v1/responses", json={
    "model": "claude-3-5-sonnet",
    "input": "What's my name?",
    "previous_response_id": response_id,
    "max_tokens": 100
})
# Now correctly returns "Alice" instead of "I don't know"

Configuration Required

litellm_settings:
  cache: true  # Required
  cache_params:
    type: "redis"
    host: "localhost"
    port: 6379

Related Issues

Fixes #12364 - Responses API conversation context timing issue
Related to #12640 - Native OpenAI Responses API support

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature
Breaking change
Documentation update

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my code
I have tested with multiple providers
Changes are backward compatible
Fails gracefully when Redis is not available

Additional Notes

This is implemented as a patch with clear comments indicating it's a temporary fix. The upstream team may want to integrate this more deeply into the core session handling logic. The patch:

Only activates when Redis cache is configured
Does not interfere with existing enterprise session handlers
Adds minimal overhead (single Redis get/set operation)
Uses 24-hour TTL for session data

Changed files

REDIS_SESSION_PATCH.md (added, +255/-0)
RESPONSES_API_TEST_README.md (added, +100/-0)
litellm/responses/litellm_completion_transformation/handler.py (modified, +18/-0)
litellm/responses/litellm_completion_transformation/streaming_iterator.py (modified, +80/-7)
litellm/responses/litellm_completion_transformation/transformation.py (modified, +82/-0)
responses_api_config.yaml (added, +36/-0)
test_redis_session_patch.py (added, +92/-0)
test_responses_api.py (added, +229/-0)

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

This PR was closed, but it's exactly the same problem: https://github.com/BerriAI/litellm/pull/13339

For non OpenAI providers (which lacks of a native response API endpoint) you have to wait about 10 secs (the time that LiteLLM waits to add logs to S3) to be able to retrieve old messages and keep the conversations.

This makes unusable for chat apps, or some fast processing apps.

Steps to Reproduce

Make a Responses request with for example a Gemini model
Keep that response ID and before 10 secs try to keep the conversation
Fail
Repeat with OpenAI
Works (because OpenAI store messages in her side)

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.6

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Implement a caching mechanism or modify the LiteLLM proxy to reduce the waiting time for non-OpenAI providers to retrieve old messages.

Guidance

Investigate the possibility of implementing a caching layer to store response IDs and their corresponding messages, allowing for faster retrieval.
Consider modifying the LiteLLM proxy to handle non-OpenAI providers differently, potentially by using a different API endpoint or reducing the waiting time.
Review the LiteLLM documentation to see if there are any configuration options or workarounds for non-OpenAI providers.
Test the implementation with different providers, including Gemini and OpenAI, to ensure compatibility.

Example

No code example is provided due to the lack of specific implementation details in the issue.

Notes

The solution may vary depending on the specific requirements of the chat apps or fast processing apps using LiteLLM. Additionally, the waiting time of 10 seconds may be a fixed value or configurable, which could impact the chosen solution.

Recommendation

Apply workaround: Implement a caching mechanism or modify the LiteLLM proxy to reduce the waiting time, as upgrading to a fixed version is not mentioned in the issue.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #authentication setup #request error #file not found #serialization error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.