dify: [Bug] 1.14.0 trigger_provider_refresh task storm causes 100% CPU on worker even with zero providers

GitHub stats
langgenius/dify#36108 · Fetched 2026-05-14 03:46:45
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 1
Timeline (top): labeled ×2, commented ×1


Self Checks

  • I have read the Contributing Guide and Language Policy.
  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • [Chinese & Non English User] Please submit in English, otherwise it will be closed :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.14.0 or 1.14.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

Describe the bug

After upgrading to Dify 1.14.0, the docker-worker-1 container immediately enters an infinite task storm of trigger_provider_refresh. The task generates thousands of log lines per minute (“Trigger refresh scan start: due=0 … succeeded”), Redis saves 10,000+ changes every 60 seconds, and worker CPU stays at 99-100%. The problem occurs even when no model providers are configured (providers table is empty).

To Reproduce

  1. Start a fresh Dify 1.14.0 deployment (or upgrade from 1.13.3) without configuring any model providers.
  2. Wait for the infrastructure to become healthy.
  3. Observe the logs of the worker container (docker logs dify-worker-1).
  4. See continuous, high-frequency execution of schedule.trigger_provider_refresh_task.trigger_provider_refresh.
  5. Monitor CPU usage (docker stats) – the worker container will use 100% of one CPU core indefinitely.
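To put a number on the log rate seen in steps 3 and 4, a small helper can count the scan lines per minute. This is a minimal sketch, assuming the worker logs were captured with `docker logs --timestamps`, so that every line starts with an RFC 3339 timestamp; the function name `scans_per_minute` is hypothetical and the marker text is the one quoted in this report.

```python
from collections import Counter

# Log text quoted in the report; adjust if your worker logs differ.
MARKER = "Trigger refresh scan start"


def scans_per_minute(lines, marker=MARKER):
    """Count marker lines per minute in `docker logs --timestamps` output."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        # Assumption: the line starts with a timestamp such as
        # 2026-05-14T03:46:45.123456789Z, so its first 16 characters
        # identify the minute (YYYY-MM-DDTHH:MM).
        timestamp = line.split(" ", 1)[0]
        counts[timestamp[:16]] += 1
    return dict(counts)
```

Passing an open log file or any iterable of lines works, for example `scans_per_minute(open("worker.log"))`, where `worker.log` is a placeholder for wherever the captured output was saved.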

✔️ Expected Behavior

When no providers are configured or no subscriptions are due, the task should either not be triggered at all, or it should run once and remain silent until the next scheduled beat interval. It should not cause a self-sustaining storm.

❌ Actual Behavior

Additional context / Root cause analysis

In 1.14.0 a new dedicated Celery queue trigger_refresh_publisher was introduced. The trigger_provider_refresh task explicitly sets queue="trigger_refresh_publisher". The worker’s default queue list (set in entrypoint.sh) includes both trigger_refresh_publisher and trigger_refresh_executor.

We found that even when the database has zero providers (i.e. due=0 and the function returns immediately), the task keeps getting re-published to trigger_refresh_publisher, creating an endless chain. This did not happen in 1.13.3 where only trigger_refresh_executor existed.

By removing only trigger_refresh_publisher from the queues that the worker listens to (while keeping trigger_refresh_executor), the storm stops completely and CPU returns to normal. This aligns with the 1.13.3 behavior which was stable.

Additionally, the ENABLE_TRIGGER_PROVIDER_REFRESH_TASK environment variable can be used to disable the beat schedule entirely, but the self-re-publishing issue remains a bug in the task itself.
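The queue workaround described above amounts to filtering one name out of the comma-separated list that the worker passes to Celery's `-Q` / `--queues` option. The sketch below shows only that filtering step; the helper name `without_publisher` is hypothetical, and the example list is made up for illustration because the real default list from entrypoint.sh is not reproduced in this report.

```python
PUBLISHER_QUEUE = "trigger_refresh_publisher"


def without_publisher(queues: str) -> str:
    """Drop the publisher queue from a comma-separated Celery queue list."""
    names = [name.strip() for name in queues.split(",")]
    kept = [name for name in names if name and name != PUBLISHER_QUEUE]
    return ",".join(kept)


# Hypothetical queue list, NOT the actual entrypoint.sh default.
example = "dataset,trigger_refresh_publisher,trigger_refresh_executor"
print(without_publisher(example))  # dataset,trigger_refresh_executor
```

The executor queue stays in the list, which matches the report's observation that only the publisher queue needs to be removed to stop the storm.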

Suggested fixes

  1. Prevent self-re-publishing – Investigate why trigger_provider_refresh is re-sent to the trigger_refresh_publisher queue even after a successful early return. There may be an erroneous apply_async/delay call or a Celery signal handler.
  2. Add a guard condition – Before entering the scan loop, check if any provider credentials exist at all. If not, skip the entire refresh and do not re-publish the task.
  3. Update default worker queues – Consider removing trigger_refresh_publisher from the default worker queue list in entrypoint.sh (similar to 1.13.3). The publisher/executor split can be kept, but only a dedicated worker should consume the publisher queue to avoid accidental storms.
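Fixes 1 and 2 can be illustrated with a toy model that does not use Celery at all. The report does not identify how the task gets re-published, so `buggy_refresh` below is a hypothetical stand-in for that behavior, and `run_worker`, `guarded_refresh`, and the in-memory queue are likewise assumptions made for the sketch rather than Dify code.

```python
from collections import deque


def run_worker(task, pending, max_steps=1000):
    """Drain a toy in-memory queue and return how many task runs happened.

    `max_steps` caps the loop so a self-sustaining chain still terminates.
    """
    queue = deque(["refresh"])  # one message, as if published by the beat schedule
    runs = 0
    while queue and runs < max_steps:
        queue.popleft()
        runs += 1
        if task(pending):
            queue.append("refresh")  # the task asked to be published again
    return runs


def buggy_refresh(pending):
    # Hypothetical bug: re-publish unconditionally, even when nothing is due.
    if pending:
        pending.pop()
    return True


def guarded_refresh(pending):
    # Guard: nothing due means return early and do not re-publish.
    if not pending:
        return False
    pending.pop()  # handle one due item
    return bool(pending)  # re-publish only while work remains


print(run_worker(buggy_refresh, []))    # 1000 (stopped only by the cap)
print(run_worker(guarded_refresh, []))  # 1
```

With an empty `pending` list, which corresponds to the due=0 case in the report, the unguarded task runs until the cap while the guarded one runs once and stops.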
