litellm - ✅(Solved) Fix Redis Semantic Cache skips Responses API input, leaving RediSearch empty [1 pull requests, 1 participants]

BerriAI/litellm#28272 (fetched 2026-05-20 03:40:21)

Redis Semantic Cache currently appears to skip true semantic/vector caching for the Responses API (/v1/responses). Exact caching can work once responses / aresponses are in supported_call_types, but the Redis semantic backend extracts prompt text only from messages. The Responses API uses input, so semantic store/lookup returns before calling RedisVL store / check, and the RediSearch index remains empty.

This was observed with RediSearch-enabled Redis and Azure OpenAI embeddings. The local code-level repro below does not require a live Redis instance because it shows the early return before RedisVL is called.

Root Cause

The Redis semantic backend extracts prompt text only from messages. Responses API calls pass input instead, so the semantic store and lookup paths return early, before RedisVL store / check is called, and the RediSearch index stays empty.

Fix Action

Fix / Workaround

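PR #28274 adds prompt extraction for Responses API input to RedisSemanticCache and updates the sync cache lookup to pass the original kwargs through to backend caches. The PR notes follow.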

PR fix notes

PR #28274: fix: support Responses input in Redis semantic cache

Description (problem / solution / changelog)

Relevant issues

Fixes #28272

Linear ticket

N/A

Pre-Submission checklist

  • I have added testing in the tests/test_litellm directory
  • My PR passes all unit tests on make test-unit
  • My PR scope is isolated and solves one specific problem
  • I have requested a Greptile review

Screenshots / Proof of Fix

Before this change, RedisSemanticCache only extracted prompts from messages. Responses API calls pass input instead, so RedisVL store/check were skipped and the RediSearch index stayed empty.

Validation run locally:

uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q
...................                                                      [100%]
19 passed in 0.27s

uv run ruff check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py
All checks passed!

uv run black --check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py
All done!
3 files would be left unchanged.

I also attempted the broader caching test directory, but the local environment is missing optional dependencies for unrelated Azure/S3 cache tests (azure, boto3).

Type

Bug Fix, Test

Changes

  • Added RedisSemanticCache prompt extraction for Responses API input, while preserving existing chat messages extraction.
  • Supported string input and structured Responses input content lists.
  • Updated sync cache lookup to pass original kwargs through to backend caches so input and metadata are available to semantic caches.
  • Added regression tests for sync set/get, async set/get, structured Responses input extraction, and wrapper passthrough.
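
The extraction described in the changes above can be illustrated with a minimal sketch. This is not the code from the PR: the names extract_prompt and _content_to_text are hypothetical, and the handling of structured items assumes each content part carries its text in a "text" field (as with Responses "input_text" parts).

```python
from typing import Any, Optional


def _content_to_text(content: Any) -> str:
    """Flatten a content value (a string or a list of typed parts) into plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for part in content:
            if isinstance(part, str):
                parts.append(part)
            elif isinstance(part, dict) and isinstance(part.get("text"), str):
                # Assumption: text-bearing parts expose their text under "text".
                parts.append(part["text"])
        return " ".join(parts)
    return ""


def extract_prompt(kwargs: dict) -> Optional[str]:
    """Hypothetical helper: prefer chat `messages`, fall back to Responses `input`."""
    messages = kwargs.get("messages")
    if messages:
        text = " ".join(
            _content_to_text(m.get("content")) for m in messages if isinstance(m, dict)
        )
        return text.strip() or None

    raw_input = kwargs.get("input")
    if isinstance(raw_input, str):
        return raw_input.strip() or None
    if isinstance(raw_input, list):
        texts = []
        for item in raw_input:
            if isinstance(item, str):
                texts.append(item)
            elif isinstance(item, dict):
                texts.append(_content_to_text(item.get("content")))
        return " ".join(t for t in texts if t).strip() or None

    # Neither field yields text: the caller should skip semantic caching.
    return None
```

Returning None when no text is found lets the caller keep an explicit "nothing to embed" branch, while string and structured input both produce a prompt that can be handed to the vector store.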

Changed files

  • litellm/caching/caching.py (modified, +2/-5)
  • litellm/caching/redis_semantic_cache.py (modified, +84/-21)
  • tests/test_litellm/caching/test_redis_semantic_cache.py (modified, +245/-0)
  • tests/test_litellm/interactions/test_openapi_compliance.py (modified, +1/-0)


Evidence from current upstream checkout

Checked on BerriAI/litellm default branch litellm_internal_staging at commit c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3 on 2026-05-19.

Minimal repro

Run from the repo root:

from unittest.mock import Mock
from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache.__new__(RedisSemanticCache)
cache.llmcache = Mock()
cache._get_cache_filters = Mock(return_value={})
cache._get_ttl = Mock(return_value=None)

cache.set_cache("responses-key", {"ok": True}, input="hello responses")
print("store_calls", cache.llmcache.store.call_count)

cache.llmcache.check = Mock(return_value=[])
metadata = {}
result = cache.get_cache("responses-key", input="hello responses", metadata=metadata)
print("lookup_result", result)
print("check_calls", cache.llmcache.check.call_count)
print("metadata", metadata)

Observed output:

store_calls 0
lookup_result None
check_calls 0
metadata {'semantic-similarity': 0.0}

This shows the Responses-style input payload never reaches RedisVL store/check. In a live RediSearch-enabled Redis deployment, that matches the observed empty semantic index.

Current test coverage

The existing Redis semantic cache unit tests pass, but they do not cover Responses input extraction:

uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q
..............                                                           [100%]
14 passed in 0.20s

The existing tests mock RedisVL and exercise message-based semantic caching, cache-key filtering, and semantic similarity metadata. They do not assert that /responses input is converted into a semantic prompt.

Expected behavior

For responses / aresponses calls using Redis Semantic Cache:

  1. The semantic cache should extract prompt text from Responses input.
  2. Store path should call RedisVL store/astore and populate the RediSearch index.
  3. Lookup path should call RedisVL check/acheck and return semantic hits when similarity threshold is met.
  4. Exact cache and semantic cache behavior should remain distinct and observable.
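
The store and lookup behavior listed above can be sketched with a toy in-memory cache. Everything here is an assumption for illustration: ToySemanticCache is hypothetical, the bag-of-words embed function stands in for a real embedding model, a Python list stands in for the RediSearch index, and only string input is handled.

```python
import math
from collections import Counter
from typing import Any, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToySemanticCache:
    """In-memory stand-in for a vector index; not RedisVL or litellm code."""

    def __init__(self, similarity_threshold: float = 0.8) -> None:
        self.similarity_threshold = similarity_threshold
        self.index: List[Tuple[Counter, Any]] = []

    def set_cache(self, value: Any, **kwargs: Any) -> None:
        prompt = kwargs.get("input") or ""
        if not prompt:
            return  # nothing to embed, so nothing is stored
        self.index.append((embed(prompt), value))

    def get_cache(self, **kwargs: Any) -> Optional[Any]:
        prompt = kwargs.get("input") or ""
        metadata = kwargs.get("metadata")
        query = embed(prompt)
        best_score, best_value = 0.0, None
        for vector, value in self.index:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_value = score, value
        if metadata is not None:
            # Record the best similarity so callers can observe near misses.
            metadata["semantic-similarity"] = best_score
        return best_value if best_score >= self.similarity_threshold else None
```

With this sketch, storing under input="hello responses api" and looking up input="hello responses" scores roughly 0.82, which clears the 0.8 threshold, while an unrelated prompt scores 0.0 and returns None.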

Suggested fix scope

A focused patch would likely include:

  • Add a shared prompt extraction helper for RedisSemanticCache that supports both:
    • chat-style messages
    • Responses-style input, including string input and structured ResponseInputParam list items
  • Update sync and async Redis semantic set/get paths to use that helper.
  • Update sync Cache.get_cache() to pass the original kwargs through to backend caches, or otherwise pass input / metadata along with messages.
  • Add unit tests in tests/test_litellm/caching/test_redis_semantic_cache.py for:
    • set_cache(..., input="...") calls RedisVL store
    • get_cache(..., input="...") calls RedisVL check
    • async async_set_cache / async_get_cache behavior for Responses input
    • structured Responses input list extraction
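
The kwargs passthrough item in the scope above can be sketched as follows. CacheWrapper is a hypothetical stand-in, not litellm's Cache class, and the backend is a Mock so the example runs without Redis.

```python
from typing import Any, Optional
from unittest.mock import Mock


class CacheWrapper:
    """Hypothetical front end that delegates lookups to a backend cache."""

    def __init__(self, backend: Any) -> None:
        self.backend = backend

    def get_cache(self, cache_key: str, **kwargs: Any) -> Optional[Any]:
        # Forward the caller's kwargs unchanged so the backend can read
        # `input` and `metadata`, not only `messages`.
        return self.backend.get_cache(cache_key, **kwargs)


backend = Mock()
backend.get_cache.return_value = {"ok": True}
wrapper = CacheWrapper(backend)

metadata: dict = {}
result = wrapper.get_cache("responses-key", input="hello responses", metadata=metadata)

# Regression-style check: the backend saw the Responses-style kwargs.
backend.get_cache.assert_called_once_with(
    "responses-key", input="hello responses", metadata=metadata
)
```

A test of this shape fails if the wrapper only forwards messages, which is the gap the passthrough item is meant to close.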

Observability and preflight gaps seen while investigating

Two related improvements would make this easier to operate:

Related issue search

I searched open issues for RedisSemanticCache responses semantic cache RediSearch input and redis semantic cache responses input and did not find an exact duplicate. The closest related open issue I found is #23441, but that is focused on Qdrant semantic cache failures rather than Redis Semantic Cache and Responses input extraction.
