litellm - [Bug]: GCP IAM auth fails with Redis Cluster: sync path uses redis_connect_func, which RedisCluster bootstrap ignores



Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Configuring GCP IAM authentication (via cache_params.gcp_service_account or REDIS_GCP_SERVICE_ACCOUNT env) against a clustered Redis (e.g., Google Memorystore for Valkey, which is always cluster-mode) causes the proxy to crashloop at startup with:

redis.exceptions.RedisClusterException: Redis Cluster cannot be connected. Please provide at least one reachable node: Authentication required.

The same setup works correctly without GCP IAM when a static password is provided via cache_params.password.

Root cause

In litellm/_redis.py::_get_redis_client_logic, the GCP IAM branch installs a custom connection callback via redis_connect_func:

if _gcp_service_account is not None:
    redis_kwargs["redis_connect_func"] = create_gcp_iam_redis_connect_func(
        service_account=_gcp_service_account, ssl_ca_certs=_gcp_ssl_ca_certs
    )
    redis_kwargs["redis_connect_func"]._gcp_service_account = _gcp_service_account

redis_connect_func is honored by redis.Redis connections, but redis.RedisCluster's bootstrap (NodesManager.initialize()) does not invoke it — the bootstrap connection runs CLUSTER SLOTS immediately, before any user-supplied hook can authenticate it. The server (Memorystore Valkey here) replies with Authentication required, which init_redis_cluster surfaces as the exception above.

The async path (get_redis_async_client) already handles this correctly:

if redis_connect_func and hasattr(redis_connect_func, "_gcp_service_account"):
    cluster_kwargs["credential_provider"] = GCPIAMCredentialProvider(
        redis_connect_func._gcp_service_account
    )

It uses credential_provider, which is honored by RedisCluster (redis-py passes it through to every connection in the cluster, including bootstrap ones).
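To illustrate the credential_provider mechanism described above, here is a minimal sketch of a token-caching provider. It is not LiteLLM's GCPIAMCredentialProvider; the class name CachedTokenProvider, the fetch_token callable, the TTL, and the "default" username are all assumptions made for illustration. The only redis-py surface it relies on is the redis.credentials.CredentialProvider base class and its get_credentials() method.

```python
import time
from typing import Callable, Optional, Tuple

try:
    from redis.credentials import CredentialProvider
except ImportError:  # lets the sketch run where redis-py is not installed
    CredentialProvider = object


class CachedTokenProvider(CredentialProvider):
    """Hypothetical provider: returns (username, token) and refetches after a TTL."""

    def __init__(
        self,
        fetch_token: Callable[[], str],   # assumption: returns a fresh IAM access token
        ttl_seconds: float = 300.0,       # assumption: refresh interval, not a GCP constant
        username: str = "default",        # assumption: username expected by the server
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._username = username
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get_credentials(self) -> Tuple[str, str]:
        # redis-py calls this whenever a connection needs to authenticate,
        # so every new connection gets a currently valid token.
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._ttl
        return self._username, self._token
```

An instance would be passed as credential_provider=... when constructing redis.RedisCluster, which is the same keyword the async path above sets in cluster_kwargs.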

Suggested fix

For sync clusters, mirror the async path: when GCP IAM is configured and startup_nodes is set, swap redis_connect_func for credential_provider=GCPIAMCredentialProvider(_gcp_service_account).
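The swap suggested above can be sketched as a pure function over the kwargs dict. This is an illustration under assumptions, not the actual LiteLLM patch: the function name prefer_credential_provider and the make_provider factory are hypothetical, and the "startup_nodes" key is assumed to be how cluster mode shows up in the kwargs.

```python
from typing import Any, Callable, Dict


def prefer_credential_provider(
    redis_kwargs: Dict[str, Any],
    make_provider: Callable[[str], Any],  # hypothetical factory, e.g. GCPIAMCredentialProvider
) -> Dict[str, Any]:
    """Return a copy of redis_kwargs with the GCP IAM connect hook replaced
    by a credential provider, but only when cluster startup nodes are set."""
    kwargs = dict(redis_kwargs)
    connect_func = kwargs.get("redis_connect_func")
    # The GCP IAM branch tags its hook with the service account it was built for.
    service_account = getattr(connect_func, "_gcp_service_account", None)
    if service_account and kwargs.get("startup_nodes"):
        kwargs.pop("redis_connect_func")
        kwargs["credential_provider"] = make_provider(service_account)
    return kwargs
```

Leaving non-cluster configurations untouched keeps the existing redis.Redis behaviour, where redis_connect_func is honored.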

Additionally, _get_redis_cluster_kwargs() builds its allowlist from inspect.getfullargspec(redis.RedisCluster), but redis.RedisCluster.__init__ is defined as def __init__(self, **kwargs), so getfullargspec().args returns an empty list — the only kwargs allowed through are the ones explicitly added to the available_args |= {...} set. credential_provider needs to be added to that set, otherwise the kwarg is silently dropped before reaching redis.RedisCluster(...).
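The allowlist behaviour described above can be reproduced with the standard library alone. The sketch below uses a stand-in class, KwargsOnlyClient, rather than redis.RedisCluster, and the helper names allowed_kwargs and filter_kwargs are hypothetical. It shows the general point: a parameter accepted only through **kwargs never appears in getfullargspec().args, so an allowlist built from it drops that parameter unless it is added explicitly.

```python
import inspect
from typing import Any, Dict, Iterable, Set


class KwargsOnlyClient:
    """Stand-in for a client whose constructor takes options only via **kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.options = kwargs


def allowed_kwargs(cls: type, extra: Iterable[str] = ()) -> Set[str]:
    """Build an allowlist from the constructor's named parameters plus extras."""
    spec = inspect.getfullargspec(cls)
    named = {name for name in spec.args if name != "self"}
    return named | set(extra)


def filter_kwargs(cls: type, candidate: Dict[str, Any], extra: Iterable[str] = ()) -> Dict[str, Any]:
    """Keep only the candidate kwargs that the allowlist permits."""
    allowed = allowed_kwargs(cls, extra)
    return {key: value for key, value in candidate.items() if key in allowed}


candidate = {"credential_provider": object(), "password": "secret"}
dropped = filter_kwargs(KwargsOnlyClient, candidate)
kept = filter_kwargs(KwargsOnlyClient, candidate, extra={"credential_provider"})
```

Here dropped is empty because nothing is named in the signature, while kept contains only credential_provider because it was added to the allowlist by hand.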

Workaround in use

Two-part monkey-patch in sitecustomize.py:

import litellm._redis as r
from litellm._redis_credential_provider import GCPIAMCredentialProvider

# Part 1: let credential_provider through the RedisCluster kwarg allowlist.
_orig_kw = r._get_redis_cluster_kwargs

def _patched_kw(*a, **k):
    s = _orig_kw(*a, **k)
    s.add("credential_provider")
    return s

r._get_redis_cluster_kwargs = _patched_kw

# Part 2: replace the GCP IAM redis_connect_func with a credential_provider,
# which RedisCluster applies to every connection, including bootstrap ones.
_orig_logic = r._get_redis_client_logic

def _patched_logic(**kwargs):
    kw = _orig_logic(**kwargs)
    rcf = kw.get("redis_connect_func")
    # The GCP IAM branch tags its hook with the service account it was built for.
    gsa = getattr(rcf, "_gcp_service_account", None) if rcf else None
    if gsa:
        kw.pop("redis_connect_func", None)
        kw["credential_provider"] = GCPIAMCredentialProvider(gsa)
    return kw

r._get_redis_client_logic = _patched_logic

After this patch, cluster bootstrap, sync cache writes, async cache reads, and the routing-side Redis client all work end-to-end against Memorystore Valkey IAM.

Environment

  • LiteLLM base image: docker.litellm.ai/berriai/litellm-database:main-stable
  • redis-py: 5.x (whatever ships with the base image)
  • Python: 3.13
  • Redis backend: Google Memorystore for Valkey, shardCount=1, replicaCount=0/1, IAM_AUTH + SERVER_AUTHENTICATION

Steps to Reproduce

Setup:

  • GKE Autopilot, Workload Identity binding to a GSA with roles/iam.serviceAccountTokenCreator on itself
  • Memorystore for Valkey (mode: CLUSTER, authorizationMode: IAM_AUTH, transitEncryptionMode: SERVER_AUTHENTICATION)
  • LiteLLM proxy with:
    litellm_settings:
      cache: true
      cache_params:
        type: redis
        redis_startup_nodes:
          - host: <discovery-endpoint-ip>
            port: 6379
        gcp_service_account: <gsa-email>
        ssl: true
        ssl_ca_certs: /path/to/memorystore-ca-bundle.crt

Pod crashloops at boot with RedisClusterException: ...: Authentication required.

Replacing gcp_service_account with password: <static-iam-token> (e.g., from gcloud auth print-access-token for the GSA, refreshed via a sidecar) makes the same setup work — confirming the wire-level AUTH protocol is fine; the issue is purely that redis_connect_func never runs on the bootstrap connection.

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

v.1.85.0

Twitter / LinkedIn details

No response
