dify - ✅ (Solved) Fix: Event Bus silently routes to the wrong Redis when REDIS_USE_SENTINEL=true [2 pull requests, 1 participant]

GitHub stats
langgenius/dify#35480 (fetched 2026-04-23 07:45:35)
Comments: 0
Participants: 1
Timeline: 6
Reactions: 1
Timeline (top): cross-referenced ×3, assigned ×1, closed ×1, labeled ×1

Error Message

In both cases there is no startup error. The misrouting surfaces only at runtime, and in the failover case only during the next Sentinel drill.

Root Cause

  • Helm default renders REDIS_HOST to the Sentinel service host
    (port 26379). The Event Bus client tries to speak Redis protocol
    against the Sentinel port, connection fails or misbehaves, SSE
    streams never flow.
  • Manually pinned REDIS_HOST=<master IP>: Event Bus connects
    successfully, but after a Sentinel failover it keeps talking to
    the old (now replica / offline) master because the client was
    never told about Sentinel. Streaming breaks on every failover
    until pods restart.

PR fix notes

PR #35483: feat(redis): Event Bus follows main Redis mode; guard Celery broker against Cluster URLs

Description (problem / solution / changelog)

Fixes #35480

Summary

Add two config-layer guardrails so Redis-adjacent subsystems stay aligned with the declared deployment mode.

  1. Event Bus follows the main Redis mode.
    _build_default_pubsub_url now branches on REDIS_USE_SENTINEL and REDIS_USE_CLUSTERS:

    • Sentinel mode returns None. ext_redis.init_app keeps its
      default _pubsub_redis_client = client assignment, so the Event
      Bus reuses the main Sentinel.master_for(...) handle and
      inherits failover for free. Previously the builder fell through
      to the standalone branch and silently produced a URL from
      REDIS_HOST:REDIS_PORT — either unreachable (when Helm renders
      REDIS_HOST to the Sentinel service address) or pinned to the
      master IP at startup with no failover thereafter.
    • Cluster mode builds a seed URL from the first
      REDIS_CLUSTERS entry. RedisCluster.from_url uses it only
      for bootstrap discovery and then talks to every shard directly.
      Before this change, Cluster-only deployments without an explicit
      PUBSUB_REDIS_URL raised ValueError on boot.
    • Sentinel + Cluster both enabled is ambiguous — raise
      ValueError telling the operator to set PUBSUB_REDIS_URL
      explicitly.
    • _format_netloc is extracted so the Cluster and standalone
      branches cannot drift on userinfo encoding.
  2. Celery broker rejects Cluster URLs at startup.
    ext_celery._reject_cluster_broker_url scans CELERY_BROKER_URL
    and CELERY_RESULT_BACKEND (lowercased, whitespace-stripped) for
    the redis+cluster:// / rediss+cluster:// prefixes and raises
    with a clear message naming the offending env var and pointing at
    standalone / Sentinel as the supported options. Kombu's Redis
    transport does not support Redis Cluster (BRPOP, ETA ZSets, and
    the unacked Hash all assume a single slot), so catching this at
    startup avoids an opaque failure on the first task enqueue.
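The branching described for the first guardrail can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function names, the plain `env` mapping, the `redis://` scheme for the Cluster seed URL, and the exact error wording are all assumptions made for the example.

```python
from typing import Mapping, Optional
from urllib.parse import quote, urlunparse


def format_netloc(host: str, port: int, password: str = "") -> str:
    """Build `[:password@]host:port`, percent-encoding the password."""
    userinfo = f":{quote(password, safe='')}@" if password else ""
    return f"{userinfo}{host}:{port}"


def _flag(env: Mapping[str, str], name: str) -> bool:
    return env.get(name, "false").strip().lower() == "true"


def build_default_pubsub_url(env: Mapping[str, str]) -> Optional[str]:
    """Return a pub/sub URL, or None meaning 'reuse the main Redis client'."""
    use_sentinel = _flag(env, "REDIS_USE_SENTINEL")
    use_cluster = _flag(env, "REDIS_USE_CLUSTERS")
    password = env.get("REDIS_PASSWORD", "")

    if use_sentinel and use_cluster:
        # Ambiguous topology: make the operator decide explicitly.
        raise ValueError(
            "REDIS_USE_SENTINEL and REDIS_USE_CLUSTERS are both enabled; "
            "set PUBSUB_REDIS_URL explicitly"
        )
    if use_sentinel:
        # No URL: the caller keeps the Sentinel-aware main client.
        return None
    if use_cluster:
        # Seed URL from the first node only (scheme is an assumption here).
        first = env["REDIS_CLUSTERS"].split(",")[0].strip()
        host, _, port = first.rpartition(":")
        netloc = format_netloc(host, int(port), password)
        return urlunparse(("redis", netloc, "", "", "", ""))
    # Standalone: the pre-existing behaviour.
    netloc = format_netloc(
        env.get("REDIS_HOST", "localhost"),
        int(env.get("REDIS_PORT", "6379")),
        password,
    )
    return urlunparse(("redis", netloc, f"/{env.get('REDIS_DB', '0')}", "", "", ""))
```

Sharing one `format_netloc` helper between the Cluster and standalone branches mirrors the stated reason for extracting `_format_netloc`: both branches encode the password the same way.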

Hardening

  • _first_cluster_node's malformed-entry error reports the 1-based
    position instead of echoing the raw entry, so a mistakenly-pasted DSN (with password) never reaches startup logs.
  • IPv6 note: REDIS_CLUSTERS entries must use bracketed form for
    IPv6 literals; added an inline reminder near the rpartition(":").
  • 34 unit tests across
    api/tests/unit_tests/configs/middleware/cache/test_redis_pubsub_config.py
    and
    api/tests/unit_tests/extensions/test_celery_broker_validation.py cover the three Event Bus branches, Sentinel + Cluster rejection,
    REDIS_DB=0 URL shape, whitespace handling in both the
    PUBSUB_REDIS_URL override path and the Celery scheme check,
    and the sanitised REDIS_CLUSTERS malformed-entry error.
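The position-only error reporting could look something like the sketch below. It is an illustration under assumptions: `first_cluster_node` here is a standalone stand-in rather than the PR's `_first_cluster_node`, and the URL-shape check and message text are choices made for the example.

```python
from typing import Tuple


def first_cluster_node(raw: str, env_name: str = "REDIS_CLUSTERS") -> Tuple[str, int]:
    """Return (host, port) for the first entry of a comma-separated node list."""
    entries = [entry.strip() for entry in raw.split(",")]
    for position, entry in enumerate(entries, start=1):
        # rpartition splits on the LAST colon, so IPv6 literals must be
        # bracketed ("[::1]:7000"); a bare "::1:7000" would be misread.
        host, sep, port = entry.rpartition(":")
        looks_like_url = any(ch in host for ch in "/@")
        if not sep or not host or not port.isdigit() or looks_like_url:
            # Report only the position: the entry itself may hold a password.
            raise ValueError(
                f"{env_name} entry #{position} is malformed; expected host:port"
            )
    host, _, port = entries[0].rpartition(":")
    return host, int(port)
```

With this shape, a pasted DSN such as `redis://:secret@host:6379` fails with a message that names only the variable and the entry index.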

Screenshots

N/A (backend / config-layer change).

Checklist

  • This change requires a documentation update
  • I understand that this PR may be closed in case there was no previous discussion
  • I've added a test for each change that was introduced
  • I've updated the documentation accordingly
  • I ran make lint && make type-check — pending

Changed files

  • api/configs/middleware/cache/redis_pubsub_config.py (modified, +104/-11)
  • api/extensions/ext_celery.py (modified, +34/-1)
  • api/extensions/ext_redis.py (modified, +12/-0)
  • api/tests/unit_tests/configs/middleware/cache/test_redis_pubsub_config.py (added, +307/-0)
  • api/tests/unit_tests/extensions/test_celery_broker_validation.py (added, +80/-0)

PR #35515: refactor(redis): unify main + pub/sub client construction via RedisConnectionSpec

Description (problem / solution / changelog)

Fixes #35480 Closes #35516

Summary

Replaces the URL-string Redis config contract with a structured RedisConnectionSpec dataclass so the main client and the Event Bus (pub/sub) client share one construction path across all three topologies. Addresses #35480 at the contract level; see #35483 for the minimal-invasion alternative for the same root cause.

Why the URL contract was the wrong abstraction

A URL cannot encode Sentinel topology (list of sentinel nodes + service name + sentinel credentials). Under Sentinel, pub/sub silently mis-routed to REDIS_HOST; under Cluster, it hit a hard ValueError at startup. Pub/sub could not participate in Sentinel HA under any configuration.

What this PR does

  • New RedisConnectionSpec (api/configs/middleware/cache/redis_connection_spec.py) — frozen dataclass covering all three topologies. Masks passwords in __repr__; hashable so pubsub_spec == main_spec comparison drives the "reuse main client" decision.
  • build_main_redis_spec / build_pubsub_spec factories read env into specs; cross-validate invalid combinations (e.g. REDIS_USE_SENTINEL + REDIS_USE_CLUSTERS) at the config layer.
  • ext_redis._create_*_client now take (spec, transport) — topology from the spec, tuning knobs from a consolidated RedisTransportParamsDict (replaces three parallel dicts).
  • ext_redis.init_app reuses the main client object when pubsub_spec == main_spec (the default "inherit" path), otherwise builds a dedicated client via the same factory.
  • ext_celery._reject_cluster_broker_url — startup guard that fails fast when CELERY_BROKER_URL / CELERY_RESULT_BACKEND points at redis+cluster:// (Kombu limitation).
  • Hardening: malformed entries in REDIS_SENTINELS / REDIS_CLUSTERS / PUBSUB_REDIS_SENTINELS / PUBSUB_REDIS_CLUSTERS report the 1-based position (never the raw content), so a password accidentally pasted into these env vars does not leak into startup logs.
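To make the "frozen, hashable, password-masking spec" idea concrete, here is a small sketch. The class name `ConnectionSpec`, its fields, and the helper `reuse_main_client` are hypothetical and much smaller than the real `RedisConnectionSpec`; the point is only how equality between two specs can drive the reuse decision.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True, repr=False)
class ConnectionSpec:
    """Hypothetical, reduced stand-in for a structured Redis connection spec."""

    mode: str  # "standalone", "sentinel", or "cluster"
    nodes: Tuple[Tuple[str, int], ...]
    service_name: Optional[str] = None
    password: Optional[str] = None

    def __repr__(self) -> str:
        # Never print the real password; show only whether one is set.
        masked = "***" if self.password else None
        return (
            f"ConnectionSpec(mode={self.mode!r}, nodes={self.nodes!r}, "
            f"service_name={self.service_name!r}, password={masked!r})"
        )


def reuse_main_client(main: ConnectionSpec, pubsub: ConnectionSpec) -> bool:
    """True when pub/sub should share the main client object."""
    return pubsub == main
```

Because the dataclass is frozen and all fields are hashable, two specs built from the same settings compare equal, which corresponds to the default "inherit" path described above.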

Unlocked capabilities

  • Pub/sub under Sentinel HA — works by default via main-client reuse, or via PUBSUB_REDIS_MODE=sentinel for an independent Sentinel.
  • Pub/sub under Cluster — works by default via main-client reuse.
  • Startup rejection for invalid combinations, surfaced with clear, env-var-named error messages.
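One of these startup rejections, the Celery Cluster guard, might be sketched as below. The function name and message wording are assumptions; only the env var names, the two URL prefixes, and the lowercase / whitespace-stripped comparison come from the PR descriptions.

```python
from typing import Mapping

# URL prefixes that indicate a Redis Cluster target.
_CLUSTER_SCHEMES = ("redis+cluster://", "rediss+cluster://")


def reject_cluster_broker_url(env: Mapping[str, str]) -> None:
    """Raise at startup if a Celery URL points at Redis Cluster."""
    for name in ("CELERY_BROKER_URL", "CELERY_RESULT_BACKEND"):
        value = env.get(name, "").strip().lower()
        if value.startswith(_CLUSTER_SCHEMES):
            # Name the env var, but do not echo the URL (it may hold a password).
            raise ValueError(
                f"{name} uses a Redis Cluster URL scheme, which is not "
                "supported here; use standalone Redis or Sentinel instead"
            )
```

Calling this once during application startup turns a failure that would otherwise appear on the first task enqueue into an immediate, named configuration error.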

⚠️ Breaking change

PUBSUB_REDIS_URL / EVENT_BUS_REDIS_URL and PUBSUB_REDIS_USE_CLUSTERS / EVENT_BUS_REDIS_USE_CLUSTERS are removed. All three deployment templates shipped these with empty defaults, so most deployments will silently fall back to the inherit path and start working correctly under Sentinel for the first time. Deployments that genuinely want an independent pub/sub Redis should migrate to PUBSUB_REDIS_MODE plus the structured PUBSUB_REDIS_* fields (which, unlike the URL, can express independent Sentinel topologies).

Tests

51 new unit tests across four files covering spec construction, builder cross-validation, password-redaction in __repr__, inheritance semantics, independent Sentinel pub/sub, and the Celery Cluster guard. Full local run: 11723 passed, 4 skipped, 0 failed.

Screenshots

N/A (backend / configuration layer change).

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues.
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint && make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods.

Changed files

  • api/.env.example (modified, +6/-5)
  • api/configs/middleware/cache/redis_connection_spec.py (added, +386/-0)
  • api/configs/middleware/cache/redis_pubsub_config.py (modified, +123/-60)
  • api/extensions/ext_celery.py (modified, +36/-1)
  • api/extensions/ext_redis.py (modified, +148/-157)
  • api/tests/unit_tests/configs/middleware/cache/test_build_main_redis_spec.py (added, +263/-0)
  • api/tests/unit_tests/configs/middleware/cache/test_build_pubsub_spec.py (added, +210/-0)
  • api/tests/unit_tests/configs/middleware/cache/test_redis_connection_spec.py (added, +236/-0)
  • api/tests/unit_tests/configs/test_dify_config.py (modified, +113/-10)
  • api/tests/unit_tests/extensions/test_celery_broker_validation.py (added, +89/-0)
  • api/tests/unit_tests/extensions/test_redis.py (modified, +19/-43)
  • docker/.env.example (modified, +6/-5)
  • docker/docker-compose.yaml (modified, +0/-1)

Code Example

REDIS_USE_SENTINEL=true
REDIS_SENTINELS=sentinel-a:26379,sentinel-b:26379,sentinel-c:26379
REDIS_SENTINEL_SERVICE_NAME=mymaster
REDIS_SENTINEL_PASSWORD=...
REDIS_PASSWORD=...

# Helm renders this unconditionally; it may point at the Sentinel
# service hostname or a placeholder rather than the current master.
REDIS_HOST=redis-sentinel
REDIS_PORT=26379

Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non-English Users] Please submit in English, otherwise the issue will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

latest

Cloud or Self Hosted

Self Hosted (Docker), Self Hosted (Source)

Steps to reproduce

Deploy dify-api with a Sentinel-backed Redis and without an explicit PUBSUB_REDIS_URL / EVENT_BUS_REDIS_URL. A minimal env:

REDIS_USE_SENTINEL=true
REDIS_SENTINELS=sentinel-a:26379,sentinel-b:26379,sentinel-c:26379
REDIS_SENTINEL_SERVICE_NAME=mymaster
REDIS_SENTINEL_PASSWORD=...
REDIS_PASSWORD=...

# Helm renders this unconditionally; it may point at the Sentinel
# service hostname or a placeholder rather than the current master.
REDIS_HOST=redis-sentinel
REDIS_PORT=26379

Boot the API container and invoke any SSE-producing endpoint (chat
streaming, workflow run). Watch the Event Bus client's behavior via
docker logs or a Sentinel failover drill.

✔️ Expected Behavior

With REDIS_USE_SENTINEL=true and no explicit PUBSUB_REDIS_URL,
the Event Bus publisher / subscriber should reuse the main Redis client — which is a redis.Redis handle returned by
Sentinel.master_for(...). That client:

  1. Connects to the current master discovered via Sentinel, not to
    REDIS_HOST as a plain Redis address.
  2. Re-discovers the master after a Sentinel failover.

❌ Actual Behavior

api/configs/middleware/cache/redis_pubsub_config.py::RedisPubSubConfig._build_default_pubsub_url
does not branch on REDIS_USE_SENTINEL. When PUBSUB_REDIS_URL is empty it always falls through to:

netloc = f"{userinfo}{defaults.REDIS_HOST}:{defaults.REDIS_PORT}"
return urlunparse((scheme, netloc, f"/{db}", "", "", ""))

So ext_redis.init_app overrides the Sentinel-aware client with a
plain redis.Redis.from_url(...) built from REDIS_HOST:REDIS_PORT.
Two failure modes follow, both silent:

  • Helm default renders REDIS_HOST to the Sentinel service host
    (port 26379). The Event Bus client tries to speak Redis protocol
    against the Sentinel port, connection fails or misbehaves, SSE
    streams never flow.
  • Manually pinned REDIS_HOST=<master IP>: Event Bus connects
    successfully, but after a Sentinel failover it keeps talking to
    the old (now replica / offline) master because the client was
    never told about Sentinel. Streaming breaks on every failover
    until pods restart.

In both cases there is no startup error — the misrouting surfaces
only at runtime, and in the failover case only during the next Sentinel drill.
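To see why the first failure mode is silent, the fall-through can be reproduced with the sample env from the reproduction steps. This is a simplified re-creation, not the original function: the `redis` scheme, the `REDIS_DB` default, and the example password are assumptions.

```python
from urllib.parse import quote, urlunparse

# Sample Sentinel-mode environment; the password value is a placeholder.
env = {
    "REDIS_USE_SENTINEL": "true",
    "REDIS_HOST": "redis-sentinel",
    "REDIS_PORT": "26379",
    "REDIS_PASSWORD": "example",
}


def old_default_pubsub_url(env):
    """Mimic the pre-fix builder: REDIS_USE_SENTINEL is never consulted."""
    password = env.get("REDIS_PASSWORD", "")
    userinfo = f":{quote(password, safe='')}@" if password else ""
    netloc = f"{userinfo}{env['REDIS_HOST']}:{env['REDIS_PORT']}"
    db = env.get("REDIS_DB", "0")
    return urlunparse(("redis", netloc, f"/{db}", "", "", ""))


url = old_default_pubsub_url(env)
print(url)  # redis://:example@redis-sentinel:26379/0
```

The result is a syntactically valid URL aimed at the Sentinel port, so nothing fails at configuration time; the problem only appears once a client tries to use it as a data connection.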

Workaround

Set PUBSUB_REDIS_URL (or its alias EVENT_BUS_REDIS_URL)
explicitly. This bypasses the default builder, but the Event Bus still
gets no Sentinel failover. The URL only stays valid if it points at an
independent non-Sentinel Redis, which defeats the whole purpose of
running the main Redis in Sentinel mode.

Proposed Fix

Make _build_default_pubsub_url return None in pure Sentinel mode
so ext_redis.init_app keeps the _pubsub_redis_client = client default — reusing the Sentinel.master_for handle and inheriting
failover for free. Track this with the companion PR.
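The effect of returning None can be illustrated with a small selection helper. `select_pubsub_client` and the `connect` callable are hypothetical names for this sketch; in the real code the equivalent logic lives inside ext_redis.init_app.

```python
from typing import Callable, Optional


def select_pubsub_client(
    main_client: object,
    pubsub_url: Optional[str],
    connect: Callable[[str], object],
) -> object:
    """Pick the client the Event Bus should use.

    `connect` stands in for whatever builds a plain client from a URL.
    """
    if pubsub_url is None:
        # Default path: share the main client, so a Sentinel-aware handle
        # (and its failover behaviour) is inherited unchanged.
        return main_client
    # Explicit override: build a dedicated client from the URL.
    return connect(pubsub_url)
```

For example, `select_pubsub_client(main, None, connect)` returns `main` itself, while passing a URL returns whatever `connect` builds for it.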

extent analysis

TL;DR

The most likely fix is to modify the _build_default_pubsub_url function to return None when REDIS_USE_SENTINEL is true, allowing the Event Bus client to reuse the Sentinel-aware Redis client.

Guidance

  • Verify that the REDIS_USE_SENTINEL environment variable is set to true and that the PUBSUB_REDIS_URL is not explicitly set.
  • Check the api/configs/middleware/cache/redis_pubsub_config.py file to ensure that the _build_default_pubsub_url function is modified to return None when REDIS_USE_SENTINEL is true.
  • Test the Event Bus client's behavior after a Sentinel failover to ensure that it correctly reconnects to the new master node.
  • Consider setting PUBSUB_REDIS_URL explicitly as a temporary workaround, but note that this may not provide the desired Sentinel failover behavior.

Example

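def _build_default_pubsub_url(self):
    # `defaults` is the settings object the existing builder already reads
    # REDIS_HOST / REDIS_PORT from; it is assumed to expose REDIS_USE_SENTINEL.
    if defaults.REDIS_USE_SENTINEL:
        # None tells ext_redis.init_app to keep the Sentinel-aware main client.
        return None
    # ... existing code ...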

Notes

The proposed fix assumes that the Sentinel.master_for handle is correctly configured to reconnect to the new master node after a failover. Additional testing may be necessary to ensure that this behavior works as expected.

Recommendation

Apply the proposed fix by modifying the _build_default_pubsub_url function to return None when REDIS_USE_SENTINEL is true, allowing the Event Bus client to reuse the Sentinel-aware Redis client. This should provide the correct Sentinel failover behavior without requiring an explicit PUBSUB_REDIS_URL setting.
