litellm - ✅(Solved) Fix AttributeError: 'Cache' object has no attribute 'cache' — Qdrant semantic cache completely non-functional due to multiple bugs [1 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#23441Fetched 2026-04-08 00:36:48
View on GitHub
Comments
1
Participants
2
Timeline
2
Reactions
2
Author
Participants
Timeline (top)
commented ×1labeled ×1

When configuring LiteLLM proxy with type: qdrant-semantic cache, the proxy crashes on startup and silently fails on every subsequent request due to AttributeError: 'Cache' object has no attribute 'cache' in multiple source files. This is because the code assumes all cache backends expose an inner .cache attribute (which Redis-based backends do), but QdrantSemanticCache does not — it is the cache object itself, stored as self.cache on the outer Cache wrapper.

There are also related configuration and embedding routing issues that make Qdrant semantic cache impossible to configure correctly without reading source code.

These bugs are confirmed present in the current main branch as of March 2026. Issue #19163 (filed January 2026) independently confirms the litellm.cache.cache pattern is still in proxy_server.py at ~line 1987, and issue #14889 (filed September 2025) confirms the embedding provider routing failure is a known open issue.


Error Message

AttributeError: 'Cache' object has no attribute 'cache'

File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache litellm.cache.cache, (RedisCache, RedisClusterCache) ^^^^^^^^^^^^^^^^^^^

Root Cause

Cache.__init__ sets self.cache = QdrantSemanticCache(...) for the qdrant-semantic type (line 204 of caching.py). However, multiple code paths outside caching.py access litellm.cache.cache directly, assuming it always exists. For Redis this works because Cache.__init__ always sets self.cache. For Qdrant, if init fails at any point due to Bug 1 crashing before line 204 is reached, self.cache is never set, causing cascading failures across all downstream cache paths.


Fix Action

Fix / Workaround

  1. Proxy crashes on startup with AttributeError: 'Cache' object has no attribute 'cache'
  2. After patching the startup crash, every chat completion request fails with the same error at request time
  3. After patching that, cache reads silently return None with caching=False logged on every request
  4. After fixing the caching=False issue, cache writes fail with LLM Provider NOT provided because the embedding model name is passed without its provider prefix to litellm.aembedding()

Workaround / Patch Applied

The following script patches all three AttributeError locations in place:

PR fix notes

PR #25556: fix(caching): fix AttributeError crashes and embedding fallback for Qdrant semantic cache

Description (problem / solution / changelog)

Summary

Qdrant semantic cache is completely non-functional due to 4 cascading bugs. Multiple code paths access litellm.cache.cache directly, which only exists for Redis-based backends. For Qdrant, this raises AttributeError: 'Cache' object has no attribute 'cache' at startup and on every request.

Changes

  1. proxy_server.py: Use getattr(litellm.cache, "cache", None) instead of litellm.cache.cache when checking for Redis usage cache
  2. caching_handler.py: Same fix for RedisCache and S3Cache isinstance checks
  3. caching.py + qdrant_semantic_cache.py: Add embed_api_base parameter so the fallback litellm.aembedding() call can pass api_base for non-OpenAI embedding models (prevents "LLM Provider NOT provided" error)

Tests

3 new tests in tests/test_litellm/caching/test_qdrant_semantic_cache.py:

  • test_proxy_init_cache_does_not_crash_on_non_redis_cache
  • test_caching_handler_does_not_crash_on_non_redis_cache
  • test_qdrant_semantic_cache_embed_api_base

All 3 pass. The 3 pre-existing async test failures (test_qdrant_semantic_cache_async_*) are unrelated and fail on main as well due to missing proxy dependencies in the unit test environment.

Closes #23441 Related: #19163, #14889

Changed files

  • litellm/caching/caching.py (modified, +2/-0)
  • litellm/caching/caching_handler.py (modified, +5/-4)
  • litellm/caching/qdrant_semantic_cache.py (modified, +36/-22)
  • litellm/proxy/proxy_server.py (modified, +2/-2)
  • tests/test_litellm/caching/test_qdrant_semantic_cache.py (modified, +74/-0)

Code Example

litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

---

litellm --config config.yaml

---

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache
    litellm.cache.cache, (RedisCache, RedisClusterCache)
    ^^^^^^^^^^^^^^^^^^^

---

# BROKEN — crashes when cache type is qdrant-semantic
if litellm.cache is not None and isinstance(
    litellm.cache.cache, (RedisCache, RedisClusterCache)
):
    redis_usage_cache = litellm.cache.cache

# FIX
if litellm.cache is not None and isinstance(
    getattr(litellm.cache, "cache", None), (RedisCache, RedisClusterCache)
):
    redis_usage_cache = getattr(litellm.cache, "cache", None)

---

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching_handler.py", line 103, in __init__
    if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
                                                ^^^^^^^^^^^^^^^^^^^

---

# BROKEN
if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
    redis_cache=litellm.cache.cache,

# FIX
if litellm.cache is not None and isinstance(getattr(litellm.cache, "cache", None), RedisCache):
    redis_cache=getattr(litellm.cache, "cache", None),

---

LiteLLM Cache: Exception add_cache: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching.py", line 637, in async_add_cache
    await self.cache.async_set_cache(cache_key, cached_data, **kwargs)
          ^^^^^^^^^^

---

# BROKEN — self.cache not set if __init__ failed mid-way
await self.cache.async_set_cache(cache_key, cached_data, **kwargs)

# FIX — same getattr pattern to safely handle missing attribute
inner = getattr(self, "cache", None)
if inner is not None:
    await inner.async_set_cache(cache_key, cached_data, **kwargs)

---

else:
    # convert to embedding
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,   # e.g. "nomic-embed-text" — no provider prefix
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        # no api_base passed!
    )

---

litellm.BadRequestError: LLM Provider NOT provided.
You passed model=nomic-embed-text
Pass model as e.g. completion(model='huggingface/starcoder', ...)

---

else:
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        api_base=getattr(self, "embed_api_base", None),  # add this
    )

---

import re

patches = [
    (
        "/path/to/litellm/proxy/proxy_server.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching_handler.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching.py",
        r'\bself\.cache\.cache\b',
        'getattr(self.cache, "cache", None)'
    ),
]

for path, pattern, replacement in patches:
    with open(path, 'r') as f:
        content = f.read()
    new_content = re.sub(pattern, replacement, content)
    if new_content != content:
        with open(path, 'w') as f:
            f.write(new_content)
        print(f"✅ Patched: {path}")
    else:
        print(f"⚠️  No changes needed: {path}")

---

litellm_settings:
  drop_params: true
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434
RAW_BUFFERClick to expand / collapse

Bug Report: AttributeError: 'Cache' object has no attribute 'cache' — Qdrant Semantic Cache Completely Non-Functional

Summary

When configuring LiteLLM proxy with type: qdrant-semantic cache, the proxy crashes on startup and silently fails on every subsequent request due to AttributeError: 'Cache' object has no attribute 'cache' in multiple source files. This is because the code assumes all cache backends expose an inner .cache attribute (which Redis-based backends do), but QdrantSemanticCache does not — it is the cache object itself, stored as self.cache on the outer Cache wrapper.

There are also related configuration and embedding routing issues that make Qdrant semantic cache impossible to configure correctly without reading source code.

These bugs are confirmed present in the current main branch as of March 2026. Issue #19163 (filed January 2026) independently confirms the litellm.cache.cache pattern is still in proxy_server.py at ~line 1987, and issue #14889 (filed September 2025) confirms the embedding provider routing failure is a known open issue.


Environment

FieldValue
LiteLLM Version1.81.16 (bugs confirmed present in main branch)
Python Version3.13.12
OSOperating System: Debian GNU/Linux 13 (trixie)
Install methoduv venv at /opt/litellm
QdrantSelf-hosted Qdrant v1.17.0
Embedding modelSelf-hosted nomic-embed-text via Ollama
Chat modelnvidia/nemotron-3-nano-30b-a3b via OpenRouter

To Reproduce

Configure LiteLLM proxy config.yaml:

litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

Start the proxy:

litellm --config config.yaml

Observed failure sequence:

  1. Proxy crashes on startup with AttributeError: 'Cache' object has no attribute 'cache'
  2. After patching the startup crash, every chat completion request fails with the same error at request time
  3. After patching that, cache reads silently return None with caching=False logged on every request
  4. After fixing the caching=False issue, cache writes fail with LLM Provider NOT provided because the embedding model name is passed without its provider prefix to litellm.aembedding()

Expected Behavior

Proxy starts successfully, chat completions are written to Qdrant on first call, and subsequent semantically similar calls are served from cache with x-litellm-cache-key response header and sub-100ms response times (vs ~1600ms without cache).


Actual Behavior

Four separate bugs prevent Qdrant semantic cache from working at all, each silently blocking the next.


Bug Details

Bug 1 — Startup Crash

File: litellm/proxy/proxy_server.py ~line 1987 Confirmed in main branch via issue #19163

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache
    litellm.cache.cache, (RedisCache, RedisClusterCache)
    ^^^^^^^^^^^^^^^^^^^
# BROKEN — crashes when cache type is qdrant-semantic
if litellm.cache is not None and isinstance(
    litellm.cache.cache, (RedisCache, RedisClusterCache)
):
    redis_usage_cache = litellm.cache.cache

# FIX
if litellm.cache is not None and isinstance(
    getattr(litellm.cache, "cache", None), (RedisCache, RedisClusterCache)
):
    redis_usage_cache = getattr(litellm.cache, "cache", None)

Bug 2 — Request-Time Crash

File: litellm/caching/caching_handler.py ~line 103

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching_handler.py", line 103, in __init__
    if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
                                                ^^^^^^^^^^^^^^^^^^^
# BROKEN
if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
    redis_cache=litellm.cache.cache,

# FIX
if litellm.cache is not None and isinstance(getattr(litellm.cache, "cache", None), RedisCache):
    redis_cache=getattr(litellm.cache, "cache", None),

Bug 3 — Silent Cache Skip on Every Request

File: litellm/caching/caching.py ~line 637

LiteLLM Cache: Exception add_cache: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching.py", line 637, in async_add_cache
    await self.cache.async_set_cache(cache_key, cached_data, **kwargs)
          ^^^^^^^^^^

The async_add_cache method calls self.cache.async_set_cache(...). For Qdrant, Cache.__init__ sets self.cache = QdrantSemanticCache(...) at line 204 — but if this was never reached due to Bug 1 crashing earlier, self.cache is never set. The error is caught and swallowed internally, so the proxy continues serving requests while silently never writing to cache.

# BROKEN — self.cache not set if __init__ failed mid-way
await self.cache.async_set_cache(cache_key, cached_data, **kwargs)

# FIX — same getattr pattern to safely handle missing attribute
inner = getattr(self, "cache", None)
if inner is not None:
    await inner.async_set_cache(cache_key, cached_data, **kwargs)

Bug 4 — Embedding Provider Not Resolved for Cache Vectorization

File: litellm/caching/qdrant_semantic_cache.py ~line 315 Related open issue: #14889

When the proxy's llm_router is None or the embedding model name does not match router_model_names, the code falls through to a bare litellm.aembedding() call without an api_base:

else:
    # convert to embedding
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,   # e.g. "nomic-embed-text" — no provider prefix
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        # no api_base passed!
    )

This raises:

litellm.BadRequestError: LLM Provider NOT provided.
You passed model=nomic-embed-text
Pass model as e.g. completion(model='huggingface/starcoder', ...)

The router path at line 298 is the correct path and works when llm_router is initialized and the model name exactly matches a model_name in the model_list. The fallback else branch has no api_base and will always fail for non-OpenAI embedding models.

Fix: The else branch should pass api_base when available:

else:
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        api_base=getattr(self, "embed_api_base", None),  # add this
    )

Root Cause

Cache.__init__ sets self.cache = QdrantSemanticCache(...) for the qdrant-semantic type (line 204 of caching.py). However, multiple code paths outside caching.py access litellm.cache.cache directly, assuming it always exists. For Redis this works because Cache.__init__ always sets self.cache. For Qdrant, if init fails at any point due to Bug 1 crashing before line 204 is reached, self.cache is never set, causing cascading failures across all downstream cache paths.


Workaround / Patch Applied

The following script patches all three AttributeError locations in place:

import re

patches = [
    (
        "/path/to/litellm/proxy/proxy_server.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching_handler.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching.py",
        r'\bself\.cache\.cache\b',
        'getattr(self.cache, "cache", None)'
    ),
]

for path, pattern, replacement in patches:
    with open(path, 'r') as f:
        content = f.read()
    new_content = re.sub(pattern, replacement, content)
    if new_content != content:
        with open(path, 'w') as f:
            f.write(new_content)
        print(f"✅ Patched: {path}")
    else:
        print(f"⚠️  No changes needed: {path}")

Working config after all patches applied:

litellm_settings:
  drop_params: true
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

Configuration Pitfalls / Documentation Gaps

The following parameter names are undocumented or incorrectly named in docs/examples, discoverable only by reading source code:

Wrong / UndocumentedCorrect
type: qdranttype: qdrant-semantic
qdrant_urlqdrant_api_base
qdrant_vector_sizeqdrant_semantic_cache_vector_size
cache: true at top-level of configMust be nested under litellm_settings to set caching=True on router requests — top-level placement results in caching=False logged on every request

The last point is particularly subtle: placing cache: true at the top level of config.yaml is shown in some documentation examples and appears to be parsed correctly (confirmed via yaml.safe_load), but it does not trigger cache_responses=True on the router. This causes every request to log ASYNC kwargs[caching]: False and skip the cache entirely regardless of what litellm.cache is set to.


Performance Results After Fix

CallPromptResultDuration
1"What is the capital of France?"MISS — sent to LLM1658ms
2"What is the capital of France?"HIT — served from Qdrant48ms
3"What is the capital of France?"HIT — served from Qdrant74ms
4"Which city is the capital of France?"Semantic HIT46ms

Qdrant collection auto-created and populated with vectors on first call. ✅


Related Issues

  • #19163 — independently confirms litellm.cache.cache pattern still present in proxy_server.py main branch (January 2026, open)
  • #14889 — same LLM Provider NOT provided error for semantic cache embedding models (September 2025, open)
  • #4963 — original Qdrant semantic cache feature request

extent analysis

Fix Plan

To fix the Qdrant semantic cache issues in LiteLLM, follow these steps:

  1. Patch AttributeError locations:

    • In litellm/proxy/proxy_server.py, replace litellm.cache.cache with getattr(litellm.cache, "cache", None) to safely handle the case where cache attribute does not exist.
    • In litellm/caching/caching_handler.py, make the same replacement as above.
    • In litellm/caching/caching.py, replace self.cache.cache with getattr(self.cache, "cache", None).
  2. Fix embedding provider resolution:

    • In litellm/caching/qdrant_semantic_cache.py, modify the else branch of the embedding model resolution to pass api_base when available:

else: embedding_response = await litellm.aembedding( model=self.embedding_model, input=prompt, cache={"no-store": True, "no-cache": True}, api_base=getattr(self, "embed_api_base", None), )


3. **Correct configuration**:
   - Ensure `cache` is nested under `litellm_settings` in `config.yaml`:
     ```yaml
litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    # ... other cache params
  • Use the correct parameter names:
    • type: qdrant-semantic instead of type: qdrant
    • qdrant_api_base instead of qdrant_url
    • qdrant_semantic_cache_vector_size instead of qdrant_vector_size

Verification

After applying these fixes, verify that:

  • The LiteLLM proxy starts without crashing.
  • Cache reads and writes are successful for semantically similar requests.
  • The response time for subsequent similar requests is significantly reduced (sub-100ms).

Extra Tips

  • Regularly review the LiteLLM documentation and source code for updates and corrections to configuration parameters and caching behavior.
  • Monitor the performance of the Qdrant semantic cache and adjust parameters as needed to optimize response times and cache hit rates.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING