litellm - ✅(Solved) Fix AttributeError: 'Cache' object has no attribute 'cache' — Qdrant semantic cache completely non-functional due to multiple bugs [1 pull requests, 1 comments, 2 participants]

litellm2026-03-12 09:34:37

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#23441•Fetched 2026-04-08 00:36:48

View on GitHub

Comments

Participants

Timeline

Reactions

Author

TyroneNel

Participants

Drolla

TyroneNel

Timeline (top)

commented ×1labeled ×1

When configuring LiteLLM proxy with type: qdrant-semantic cache, the proxy crashes on startup and silently fails on every subsequent request due to AttributeError: 'Cache' object has no attribute 'cache' in multiple source files. This is because the code assumes all cache backends expose an inner .cache attribute (which Redis-based backends do), but QdrantSemanticCache does not — it is the cache object itself, stored as self.cache on the outer Cache wrapper.

There are also related configuration and embedding routing issues that make Qdrant semantic cache impossible to configure correctly without reading source code.

These bugs are confirmed present in the current main branch as of March 2026. Issue #19163 (filed January 2026) independently confirms the litellm.cache.cache pattern is still in proxy_server.py at ~line 1987, and issue #14889 (filed September 2025) confirms the embedding provider routing failure is a known open issue.

Error Message

AttributeError: 'Cache' object has no attribute 'cache'

File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache litellm.cache.cache, (RedisCache, RedisClusterCache) ^^^^^^^^^^^^^^^^^^^

Root Cause

Cache.__init__ sets self.cache = QdrantSemanticCache(...) for the qdrant-semantic type (line 204 of caching.py). However, multiple code paths outside caching.py access litellm.cache.cache directly, assuming it always exists. For Redis this works because Cache.__init__ always sets self.cache. For Qdrant, if init fails at any point due to Bug 1 crashing before line 204 is reached, self.cache is never set, causing cascading failures across all downstream cache paths.

Fix Action

Fix / Workaround

Proxy crashes on startup with AttributeError: 'Cache' object has no attribute 'cache'
After patching the startup crash, every chat completion request fails with the same error at request time
After patching that, cache reads silently return None with caching=False logged on every request
After fixing the caching=False issue, cache writes fail with LLM Provider NOT provided because the embedding model name is passed without its provider prefix to litellm.aembedding()

Workaround / Patch Applied

The following script patches all three AttributeError locations in place:

PR fix notes

PR #25556: fix(caching): fix AttributeError crashes and embedding fallback for Qdrant semantic cache

Repository: BerriAI/litellm
Author: vedaant00
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25556

Description (problem / solution / changelog)

Summary

Qdrant semantic cache is completely non-functional due to 4 cascading bugs. Multiple code paths access litellm.cache.cache directly, which only exists for Redis-based backends. For Qdrant, this raises AttributeError: 'Cache' object has no attribute 'cache' at startup and on every request.

Changes

proxy_server.py: Use getattr(litellm.cache, "cache", None) instead of litellm.cache.cache when checking for Redis usage cache
caching_handler.py: Same fix for RedisCache and S3Cache isinstance checks
caching.py + qdrant_semantic_cache.py: Add embed_api_base parameter so the fallback litellm.aembedding() call can pass api_base for non-OpenAI embedding models (prevents "LLM Provider NOT provided" error)

Tests

3 new tests in tests/test_litellm/caching/test_qdrant_semantic_cache.py:

test_proxy_init_cache_does_not_crash_on_non_redis_cache
test_caching_handler_does_not_crash_on_non_redis_cache
test_qdrant_semantic_cache_embed_api_base

All 3 pass. The 3 pre-existing async test failures (test_qdrant_semantic_cache_async_*) are unrelated and fail on main as well due to missing proxy dependencies in the unit test environment.

Closes #23441 Related: #19163, #14889

Changed files

litellm/caching/caching.py (modified, +2/-0)
litellm/caching/caching_handler.py (modified, +5/-4)
litellm/caching/qdrant_semantic_cache.py (modified, +36/-22)
litellm/proxy/proxy_server.py (modified, +2/-2)
tests/test_litellm/caching/test_qdrant_semantic_cache.py (modified, +74/-0)

Code Example

litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

---

litellm --config config.yaml

---

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache
    litellm.cache.cache, (RedisCache, RedisClusterCache)
    ^^^^^^^^^^^^^^^^^^^

---

# BROKEN — crashes when cache type is qdrant-semantic
if litellm.cache is not None and isinstance(
    litellm.cache.cache, (RedisCache, RedisClusterCache)
):
    redis_usage_cache = litellm.cache.cache

# FIX
if litellm.cache is not None and isinstance(
    getattr(litellm.cache, "cache", None), (RedisCache, RedisClusterCache)
):
    redis_usage_cache = getattr(litellm.cache, "cache", None)

---

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching_handler.py", line 103, in __init__
    if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
                                                ^^^^^^^^^^^^^^^^^^^

---

# BROKEN
if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
    redis_cache=litellm.cache.cache,

# FIX
if litellm.cache is not None and isinstance(getattr(litellm.cache, "cache", None), RedisCache):
    redis_cache=getattr(litellm.cache, "cache", None),

---

LiteLLM Cache: Exception add_cache: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching.py", line 637, in async_add_cache
    await self.cache.async_set_cache(cache_key, cached_data, **kwargs)
          ^^^^^^^^^^

---

# BROKEN — self.cache not set if __init__ failed mid-way
await self.cache.async_set_cache(cache_key, cached_data, **kwargs)

# FIX — same getattr pattern to safely handle missing attribute
inner = getattr(self, "cache", None)
if inner is not None:
    await inner.async_set_cache(cache_key, cached_data, **kwargs)

---

else:
    # convert to embedding
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,   # e.g. "nomic-embed-text" — no provider prefix
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        # no api_base passed!
    )

---

litellm.BadRequestError: LLM Provider NOT provided.
You passed model=nomic-embed-text
Pass model as e.g. completion(model='huggingface/starcoder', ...)

---

else:
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        api_base=getattr(self, "embed_api_base", None),  # add this
    )

---

import re

patches = [
    (
        "/path/to/litellm/proxy/proxy_server.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching_handler.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching.py",
        r'\bself\.cache\.cache\b',
        'getattr(self.cache, "cache", None)'
    ),
]

for path, pattern, replacement in patches:
    with open(path, 'r') as f:
        content = f.read()
    new_content = re.sub(pattern, replacement, content)
    if new_content != content:
        with open(path, 'w') as f:
            f.write(new_content)
        print(f"✅ Patched: {path}")
    else:
        print(f"⚠️  No changes needed: {path}")

---

litellm_settings:
  drop_params: true
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

RAW_BUFFERClick to expand / collapse

Bug Report: `AttributeError: 'Cache' object has no attribute 'cache'` — Qdrant Semantic Cache Completely Non-Functional

Summary

There are also related configuration and embedding routing issues that make Qdrant semantic cache impossible to configure correctly without reading source code.

Environment

Field	Value
LiteLLM Version	`1.81.16` (bugs confirmed present in `main` branch)
Python Version	`3.13.12`
OS	Operating System: Debian GNU/Linux 13 (trixie)
Install method	`uv` venv at `/opt/litellm`
Qdrant	Self-hosted Qdrant v1.17.0
Embedding model	Self-hosted `nomic-embed-text` via Ollama
Chat model	`nvidia/nemotron-3-nano-30b-a3b` via OpenRouter

To Reproduce

Configure LiteLLM proxy config.yaml:

litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

Start the proxy:

litellm --config config.yaml

Observed failure sequence:

Proxy crashes on startup with AttributeError: 'Cache' object has no attribute 'cache'
After patching the startup crash, every chat completion request fails with the same error at request time
After patching that, cache reads silently return None with caching=False logged on every request
After fixing the caching=False issue, cache writes fail with LLM Provider NOT provided because the embedding model name is passed without its provider prefix to litellm.aembedding()

Expected Behavior

Proxy starts successfully, chat completions are written to Qdrant on first call, and subsequent semantically similar calls are served from cache with x-litellm-cache-key response header and sub-100ms response times (vs ~1600ms without cache).

Actual Behavior

Four separate bugs prevent Qdrant semantic cache from working at all, each silently blocking the next.

Bug Details

Bug 1 — Startup Crash

File: litellm/proxy/proxy_server.py ~line 1987 Confirmed in main branch via issue #19163

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/proxy/proxy_server.py", line 2458, in _init_cache
    litellm.cache.cache, (RedisCache, RedisClusterCache)
    ^^^^^^^^^^^^^^^^^^^

# BROKEN — crashes when cache type is qdrant-semantic
if litellm.cache is not None and isinstance(
    litellm.cache.cache, (RedisCache, RedisClusterCache)
):
    redis_usage_cache = litellm.cache.cache

# FIX
if litellm.cache is not None and isinstance(
    getattr(litellm.cache, "cache", None), (RedisCache, RedisClusterCache)
):
    redis_usage_cache = getattr(litellm.cache, "cache", None)

Bug 2 — Request-Time Crash

File: litellm/caching/caching_handler.py ~line 103

AttributeError: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching_handler.py", line 103, in __init__
    if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
                                                ^^^^^^^^^^^^^^^^^^^

# BROKEN
if litellm.cache is not None and isinstance(litellm.cache.cache, RedisCache):
    redis_cache=litellm.cache.cache,

# FIX
if litellm.cache is not None and isinstance(getattr(litellm.cache, "cache", None), RedisCache):
    redis_cache=getattr(litellm.cache, "cache", None),

Bug 3 — Silent Cache Skip on Every Request

File: litellm/caching/caching.py ~line 637

LiteLLM Cache: Exception add_cache: 'Cache' object has no attribute 'cache'

  File ".../litellm/caching/caching.py", line 637, in async_add_cache
    await self.cache.async_set_cache(cache_key, cached_data, **kwargs)
          ^^^^^^^^^^

The async_add_cache method calls self.cache.async_set_cache(...). For Qdrant, Cache.__init__ sets self.cache = QdrantSemanticCache(...) at line 204 — but if this was never reached due to Bug 1 crashing earlier, self.cache is never set. The error is caught and swallowed internally, so the proxy continues serving requests while silently never writing to cache.

# BROKEN — self.cache not set if __init__ failed mid-way
await self.cache.async_set_cache(cache_key, cached_data, **kwargs)

# FIX — same getattr pattern to safely handle missing attribute
inner = getattr(self, "cache", None)
if inner is not None:
    await inner.async_set_cache(cache_key, cached_data, **kwargs)

Bug 4 — Embedding Provider Not Resolved for Cache Vectorization

File: litellm/caching/qdrant_semantic_cache.py ~line 315 Related open issue: #14889

When the proxy's llm_router is None or the embedding model name does not match router_model_names, the code falls through to a bare litellm.aembedding() call without an api_base:

else:
    # convert to embedding
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,   # e.g. "nomic-embed-text" — no provider prefix
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        # no api_base passed!
    )

This raises:

litellm.BadRequestError: LLM Provider NOT provided.
You passed model=nomic-embed-text
Pass model as e.g. completion(model='huggingface/starcoder', ...)

The router path at line 298 is the correct path and works when llm_router is initialized and the model name exactly matches a model_name in the model_list. The fallback else branch has no api_base and will always fail for non-OpenAI embedding models.

Fix: The else branch should pass api_base when available:

else:
    embedding_response = await litellm.aembedding(
        model=self.embedding_model,
        input=prompt,
        cache={"no-store": True, "no-cache": True},
        api_base=getattr(self, "embed_api_base", None),  # add this
    )

Root Cause

Workaround / Patch Applied

The following script patches all three AttributeError locations in place:

import re

patches = [
    (
        "/path/to/litellm/proxy/proxy_server.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching_handler.py",
        r'\blitellm\.cache\.cache\b',
        'getattr(litellm.cache, "cache", None)'
    ),
    (
        "/path/to/litellm/caching/caching.py",
        r'\bself\.cache\.cache\b',
        'getattr(self.cache, "cache", None)'
    ),
]

for path, pattern, replacement in patches:
    with open(path, 'r') as f:
        content = f.read()
    new_content = re.sub(pattern, replacement, content)
    if new_content != content:
        with open(path, 'w') as f:
            f.write(new_content)
        print(f"✅ Patched: {path}")
    else:
        print(f"⚠️  No changes needed: {path}")

Working config after all patches applied:

litellm_settings:
  drop_params: true
  cache: true
  cache_params:
    type: qdrant-semantic
    qdrant_api_base: "http://<qdrant-host>:6333"
    qdrant_api_key: "<api-key>"
    qdrant_collection_name: "litellm-semantic-cache"
    qdrant_semantic_cache_embedding_model: "nomic-embed-text"
    qdrant_semantic_cache_vector_size: 768
    similarity_threshold: 0.85

model_list:
  - model_name: nomic-embed-text
    litellm_params:
      model: ollama/nomic-embed-text
      api_base: http://<ollama-host>:11434

Configuration Pitfalls / Documentation Gaps

The following parameter names are undocumented or incorrectly named in docs/examples, discoverable only by reading source code:

Wrong / Undocumented	Correct
`type: qdrant`	`type: qdrant-semantic`
`qdrant_url`	`qdrant_api_base`
`qdrant_vector_size`	`qdrant_semantic_cache_vector_size`
`cache: true` at top-level of config	Must be nested under `litellm_settings` to set `caching=True` on router requests — top-level placement results in `caching=False` logged on every request

The last point is particularly subtle: placing cache: true at the top level of config.yaml is shown in some documentation examples and appears to be parsed correctly (confirmed via yaml.safe_load), but it does not trigger cache_responses=True on the router. This causes every request to log ASYNC kwargs[caching]: False and skip the cache entirely regardless of what litellm.cache is set to.

Performance Results After Fix

Call	Prompt	Result	Duration
1	"What is the capital of France?"	MISS — sent to LLM	1658ms
2	"What is the capital of France?"	HIT — served from Qdrant	48ms
3	"What is the capital of France?"	HIT — served from Qdrant	74ms
4	"Which city is the capital of France?"	Semantic HIT	46ms

Qdrant collection auto-created and populated with vectors on first call. ✅

Related Issues

#19163 — independently confirms litellm.cache.cache pattern still present in proxy_server.py main branch (January 2026, open)
#14889 — same LLM Provider NOT provided error for semantic cache embedding models (September 2025, open)
#4963 — original Qdrant semantic cache feature request

extent analysis

Fix Plan

To fix the Qdrant semantic cache issues in LiteLLM, follow these steps:

Patch AttributeError locations:
- In litellm/proxy/proxy_server.py, replace litellm.cache.cache with getattr(litellm.cache, "cache", None) to safely handle the case where cache attribute does not exist.
- In litellm/caching/caching_handler.py, make the same replacement as above.
- In litellm/caching/caching.py, replace self.cache.cache with getattr(self.cache, "cache", None).
Fix embedding provider resolution:
- In litellm/caching/qdrant_semantic_cache.py, modify the else branch of the embedding model resolution to pass api_base when available:

else: embedding_response = await litellm.aembedding( model=self.embedding_model, input=prompt, cache={"no-store": True, "no-cache": True}, api_base=getattr(self, "embed_api_base", None), )


3. **Correct configuration**:
   - Ensure `cache` is nested under `litellm_settings` in `config.yaml`:
     ```yaml
litellm_settings:
  cache: true
  cache_params:
    type: qdrant-semantic
    # ... other cache params

Use the correct parameter names:
- type: qdrant-semantic instead of type: qdrant
- qdrant_api_base instead of qdrant_url
- qdrant_semantic_cache_vector_size instead of qdrant_vector_size

Verification

After applying these fixes, verify that:

The LiteLLM proxy starts without crashing.
Cache reads and writes are successful for semantically similar requests.
The response time for subsequent similar requests is significantly reduced (sub-100ms).

Extra Tips

Regularly review the LiteLLM documentation and source code for updates and corrections to configuration parameters and caching behavior.
Monitor the performance of the Qdrant semantic cache and adjust parameters as needed to optimize response times and cache hit rates.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #autograd error #network issue #logging issue #authentication issue #prompt issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - ✅(Solved) Fix AttributeError: 'Cache' object has no attribute 'cache' — Qdrant semantic cache completely non-functional due to multiple bugs [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Workaround / Patch Applied

PR fix notes

PR #25556: fix(caching): fix AttributeError crashes and embedding fallback for Qdrant semantic cache

Description (problem / solution / changelog)

Summary

Changes

Tests

Changed files

Code Example

Bug Report: AttributeError: 'Cache' object has no attribute 'cache' — Qdrant Semantic Cache Completely Non-Functional

Summary

Environment

To Reproduce

Expected Behavior

Actual Behavior

Bug Details

Bug 1 — Startup Crash

Bug 2 — Request-Time Crash

Bug 3 — Silent Cache Skip on Every Request

Bug 4 — Embedding Provider Not Resolved for Cache Vectorization

Root Cause

Workaround / Patch Applied

Configuration Pitfalls / Documentation Gaps

Performance Results After Fix

Related Issues

extent analysis

Fix Plan

Verification

Extra Tips

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Bug Report: `AttributeError: 'Cache' object has no attribute 'cache'` — Qdrant Semantic Cache Completely Non-Functional