For `responses` / `aresponses` calls using Redis Semantic Cache: 1. The semantic cache should extract prompt text from Responses `input`. 2. Store path should call RedisVL store/astore and populate the RediSearch index. 3. Lookup path should call RedisVL check/acheck and return semantic hits when similarity threshold is met. 4. Exact cache and semantic cache behavior should remain distinct and observable.

litellm - ✅(Solved) Fix Redis Semantic Cache skips Responses API input, leaving RediSearch empty [1 pull requests, 1 participants]

balcsida · 2026-05-19T18:48:58Z

[litellm] Redis Semantic Cache currently appears to skip true semantic/vector caching for the Responses API /v1/responses . Exact caching can work once respons… Redis Semantic Cache currently appears to skip true semantic/vector caching for the Responses API (`/v1/responses`). Exact caching can work once `responses` / `aresponses` are in `supported_call_types`, but the Redis semantic backend extracts prompt text only from `messages`. The Responses API uses `input`, so semantic store/lookup returns before calling RedisVL `store` / `check`, and the RediSearch index remains empty. This was observed with RediSearch-enabled Redis and Azure OpenAI embeddings. The local code-level repro below does not require a live Redis instance because it shows the early return before RedisVL is called. # PR #28274: fix: support Responses input in Redis semantic cache - Repository: BerriAI/litellm - Author: balcsida - State: open | merged: False - Link: https://github.com/BerriAI/litellm/pull/28274 ## Description (problem / solution / changelog) ## Relevant issues Fixes #28272 ## Linear ticket N/A ## Pre-Submission checklist - [x] I have added testing in the tests/test_litellm directory - [ ] My PR passes all unit tests on make test-unit - [x] My PR scope is isolated and solves one specific problem - [ ] I have requested a Greptile review ## Screenshots / Proof of Fix Before this change, RedisSemanticCache only extracted prompts from messages. Responses API calls pass input instead, so RedisVL store/check were skipped and the RediSearch index stayed empty. Validation run locally: ```text uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q ................... [100%] 19 passed in 0.27s uv run ruff check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py All checks passed! uv run black --check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py All done! 3 files would be left unchanged. ``` I also attempted the broader caching test directory, but the local environment is missing optional dependencies for unrelated Azure/S3 cache tests (`azure`, `boto3`). ## Type Bug Fix Test ## Changes - Added RedisSemanticCache prompt extraction for Responses API input, while preserving existing chat messages extraction. - Supported string input and structured Responses input content lists. - Updated sync cache lookup to pass original kwargs through to backend caches so input and metadata are available to semantic caches. - Added regression tests for sync set/get, async set/get, structured Responses input extraction, and wrapper passthrough. ## Changed files - `litellm/caching/caching.py` (modified, +2/-5) - `litellm/caching/redis_semantic_cache.py` (modified, +84/-21) - `tests/test_litellm/caching/test_redis_semantic_cache.py` (modified, +245/-0) - `tests/test_litellm/interactions/test_openapi_compliance.py` (modified, +1/-0) ## Fix / Workaround A focused patch would likely include: ## Summary Redis Semantic Cache currently appears to skip true semantic/vector caching for the Responses API (`/v1/responses`). Exact caching can work once `responses` / `aresponses` are in `supported_call_types`, but the Redis semantic backend extracts prompt text only from `messages`. The Responses API uses `input`, so semantic store/lookup returns before calling RedisVL `store` / `check`, and the RediSearch index remains empty. This was observed with RediSearch-enabled Redis and Azure OpenAI embeddings. The local code-level repro below does not require a live Redis instance because it shows the early return before RedisVL is called. ## Evidence from current upstream checkout Checked on `BerriAI/litellm` default branch `litellm_internal_staging` at commit `c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3` on 2026-05-19. - `Cache.__init__` default `supported_call_types` already includes `responses` and `aresponses`: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/caching.py#L70-L83 - `responses` and `aresponses` take `input`, not `messages`: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/responses/main.py#L416-L418 https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/responses/main.py#L901-L903 - `RedisSemanticCache.set_cache()` only reads `kwargs["messages"]` and returns when it is absent, before `self.llmcache.store(...)`: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L281-L298 - `RedisSemanticCache.get_cache()` also only reads `kwargs["messages"]` and returns `None` when it is absent, before `self.llmcache.check(...)`: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L317-L332 - The async Redis semantic paths have the same `messages`-only ex

litellm2026-05-19 18:48:58

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#28272•Fetched 2026-05-20 03:40:21

View on GitHub

Comments

Participants

Timeline

Reactions

Author

balcsida

Participants

balcsida

Timeline (top)

cross-referenced ×1labeled ×1

Redis Semantic Cache currently appears to skip true semantic/vector caching for the Responses API (/v1/responses). Exact caching can work once responses / aresponses are in supported_call_types, but the Redis semantic backend extracts prompt text only from messages. The Responses API uses input, so semantic store/lookup returns before calling RedisVL store / check, and the RediSearch index remains empty.

This was observed with RediSearch-enabled Redis and Azure OpenAI embeddings. The local code-level repro below does not require a live Redis instance because it shows the early return before RedisVL is called.

Root Cause

Fix Action

Fix / Workaround

A focused patch would likely include:

PR fix notes

PR #28274: fix: support Responses input in Redis semantic cache

Repository: BerriAI/litellm
Author: balcsida
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/28274

Description (problem / solution / changelog)

Relevant issues

Fixes #28272

Linear ticket

N/A

Pre-Submission checklist

I have added testing in the tests/test_litellm directory
My PR passes all unit tests on make test-unit
My PR scope is isolated and solves one specific problem
I have requested a Greptile review

Screenshots / Proof of Fix

Before this change, RedisSemanticCache only extracted prompts from messages. Responses API calls pass input instead, so RedisVL store/check were skipped and the RediSearch index stayed empty.

Validation run locally:

uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q
...................                                                      [100%]
19 passed in 0.27s

uv run ruff check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py
All checks passed!

uv run black --check litellm/caching/caching.py litellm/caching/redis_semantic_cache.py tests/test_litellm/caching/test_redis_semantic_cache.py
All done!
3 files would be left unchanged.

I also attempted the broader caching test directory, but the local environment is missing optional dependencies for unrelated Azure/S3 cache tests (azure, boto3).

Type

Bug Fix Test

Changes

Added RedisSemanticCache prompt extraction for Responses API input, while preserving existing chat messages extraction.
Supported string input and structured Responses input content lists.
Updated sync cache lookup to pass original kwargs through to backend caches so input and metadata are available to semantic caches.
Added regression tests for sync set/get, async set/get, structured Responses input extraction, and wrapper passthrough.

Changed files

litellm/caching/caching.py (modified, +2/-5)
litellm/caching/redis_semantic_cache.py (modified, +84/-21)
tests/test_litellm/caching/test_redis_semantic_cache.py (modified, +245/-0)
tests/test_litellm/interactions/test_openapi_compliance.py (modified, +1/-0)

Code Example

from unittest.mock import Mock
from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache.__new__(RedisSemanticCache)
cache.llmcache = Mock()
cache._get_cache_filters = Mock(return_value={})
cache._get_ttl = Mock(return_value=None)

cache.set_cache("responses-key", {"ok": True}, input="hello responses")
print("store_calls", cache.llmcache.store.call_count)

cache.llmcache.check = Mock(return_value=[])
metadata = {}
result = cache.get_cache("responses-key", input="hello responses", metadata=metadata)
print("lookup_result", result)
print("check_calls", cache.llmcache.check.call_count)
print("metadata", metadata)

---

store_calls 0
lookup_result None
check_calls 0
metadata {'semantic-similarity': 0.0}

---

uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q
..............                                                           [100%]
14 passed in 0.20s

RAW_BUFFERClick to expand / collapse

Summary

Evidence from current upstream checkout

Checked on BerriAI/litellm default branch litellm_internal_staging at commit c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3 on 2026-05-19.

Cache.__init__ default supported_call_types already includes responses and aresponses: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/caching.py#L70-L83
responses and aresponses take input, not messages: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/responses/main.py#L416-L418 https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/responses/main.py#L901-L903
RedisSemanticCache.set_cache() only reads kwargs["messages"] and returns when it is absent, before self.llmcache.store(...): https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L281-L298
RedisSemanticCache.get_cache() also only reads kwargs["messages"] and returns None when it is absent, before self.llmcache.check(...): https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L317-L332
The async Redis semantic paths have the same messages-only extraction: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L430-L456 https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/redis_semantic_cache.py#L473-L493
The sync wrapper lookup path also drops most original kwargs and passes only messages=messages to the backend cache, so even if RedisSemanticCache.get_cache() learned input, sync lookup would still need passthrough: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/caching.py#L526-L531
Async wrapper lookup already passes **kwargs, but RedisSemanticCache still ignores input: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/caching/caching.py#L562-L569

Minimal repro

Run from the repo root:

from unittest.mock import Mock
from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache.__new__(RedisSemanticCache)
cache.llmcache = Mock()
cache._get_cache_filters = Mock(return_value={})
cache._get_ttl = Mock(return_value=None)

cache.set_cache("responses-key", {"ok": True}, input="hello responses")
print("store_calls", cache.llmcache.store.call_count)

cache.llmcache.check = Mock(return_value=[])
metadata = {}
result = cache.get_cache("responses-key", input="hello responses", metadata=metadata)
print("lookup_result", result)
print("check_calls", cache.llmcache.check.call_count)
print("metadata", metadata)

Observed output:

store_calls 0
lookup_result None
check_calls 0
metadata {'semantic-similarity': 0.0}

This shows the Responses-style input payload never reaches RedisVL store/check. In a live RediSearch-enabled Redis deployment, that matches the observed empty semantic index.

Current test coverage

The existing Redis semantic cache unit tests pass, but they do not cover Responses input extraction:

uv run pytest tests/test_litellm/caching/test_redis_semantic_cache.py -q
..............                                                           [100%]
14 passed in 0.20s

The existing tests mock RedisVL and exercise message-based semantic caching, cache-key filtering, and semantic similarity metadata. They do not assert that /responses input is converted into a semantic prompt.

Expected behavior

For responses / aresponses calls using Redis Semantic Cache:

The semantic cache should extract prompt text from Responses input.
Store path should call RedisVL store/astore and populate the RediSearch index.
Lookup path should call RedisVL check/acheck and return semantic hits when similarity threshold is met.
Exact cache and semantic cache behavior should remain distinct and observable.

Suggested fix scope

A focused patch would likely include:

Add a shared prompt extraction helper for RedisSemanticCache that supports both:
- chat-style messages
- Responses-style input, including string input and structured ResponseInputParam list items
Update sync and async Redis semantic set/get paths to use that helper.
Update sync Cache.get_cache() to pass the original kwargs through to backend caches, or otherwise pass input / metadata along with messages.
Add unit tests in tests/test_litellm/caching/test_redis_semantic_cache.py for:
- set_cache(..., input="...") calls RedisVL store
- get_cache(..., input="...") calls RedisVL check
- async async_set_cache / async_get_cache behavior for Responses input
- structured Responses input list extraction

Observability and preflight gaps seen while investigating

Two related improvements would make this easier to operate:

Observability: today the proxy exposes generic cache hit headers and x-litellm-semantic-similarity only when metadata contains semantic-similarity: https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/proxy/common_utils/callback_utils.py#L419-L420

It would be useful to distinguish exact cache hits from semantic cache hits explicitly in headers/log metadata, especially when both exact and semantic cache paths are configured.
Redis Semantic Cache preflight: readiness can report Redis semantic index info through _index_info(): https://github.com/BerriAI/litellm/blob/c2efe9e422b6ce62f0001d847d578d1e7d7ea6e3/litellm/proxy/health_endpoints/_health_endpoints.py#L1497-L1505

But the Helm chart path does not appear to provide a clear preflight guard that the configured Redis actually has RediSearch/Redis Stack module support. A safer preflight / Helm value check would prevent deploying redis-semantic against a plain Redis instance where vector indexing cannot work.

Related issue search

I searched open issues for RedisSemanticCache responses semantic cache RediSearch input and redis semantic cache responses input and did not find an exact duplicate. The closest related open issue I found is #23441, but that is focused on Qdrant semantic cache failures rather than Redis Semantic Cache and Responses input extraction.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

For responses / aresponses calls using Redis Semantic Cache:

The semantic cache should extract prompt text from Responses input.
Store path should call RedisVL store/astore and populate the RediSearch index.
Lookup path should call RedisVL check/acheck and return semantic hits when similarity threshold is met.
Exact cache and semantic cache behavior should remain distinct and observable.

#api #serialization error #model compatibility #GPU setup #container setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - ✅(Solved) Fix Redis Semantic Cache skips Responses API input, leaving RediSearch empty [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #28274: fix: support Responses input in Redis semantic cache

Description (problem / solution / changelog)

Relevant issues

Linear ticket

Pre-Submission checklist

Screenshots / Proof of Fix

Type

Changes

Changed files

Code Example

Summary

Evidence from current upstream checkout

Minimal repro

Current test coverage

Expected behavior

Suggested fix scope

Observability and preflight gaps seen while investigating

Related issue search

FAQ

Expected behavior

Still need to ship something?

TRENDING

litellm - ✅(Solved) Fix Redis Semantic Cache skips Responses API input, leaving RediSearch empty [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #28274: fix: support Responses input in Redis semantic cache

Description (problem / solution / changelog)

Relevant issues

Linear ticket

Pre-Submission checklist

Screenshots / Proof of Fix

Type

Changes

Changed files

Code Example

Summary

Evidence from current upstream checkout

Minimal repro

Current test coverage

Expected behavior

Suggested fix scope

Observability and preflight gaps seen while investigating

Related issue search

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING