dify - ✅(Solved) Fix feat(api): add async response_mode for workflow execution [1 pull request, 1 participant]

GitHub stats
langgenius/dify#35327 (fetched 2026-04-17 08:56:03)
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 1
Timeline (top): cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #35301: feat(api): add async response_mode for workflow execution

Description (problem / solution / changelog)


Fixes #35327

Summary

Add response_mode: "async" to POST /v1/workflows/run — a true fire-and-forget mode that returns 202 Accepted with a workflow_run_id immediately, instead of holding an HTTP connection open. Clients poll results via the existing GET /v1/workflows/run/{id} endpoint.

Motivation: Currently Dify only supports blocking (wait for result) and streaming (hold SSE connection). Both require holding an HTTP connection open for the entire workflow duration. This causes HTTP timeouts on long workflows, and stuck "running" status when SSE connections drop (#26169). The async mode solves this with a standard REST 202+poll pattern.
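The 202+poll pattern described above can be sketched from the client side as follows. This is a minimal illustration, not the PR's code: the request body shape follows the existing workflow API (`inputs`, `response_mode`, `user`), while the set of terminal status strings and the helper names `build_async_request` and `poll_workflow_run` are assumptions made for the example. The HTTP call is injected as a callable so the loop stays independent of any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict

# Assumption: status strings that mean "the run is finished". Check the
# GET /v1/workflows/run/{workflow_run_id} response of your Dify version.
TERMINAL_STATUSES = frozenset({"succeeded", "failed", "stopped"})


def build_async_request(inputs: Dict[str, Any], user: str) -> Dict[str, Any]:
    """Body for POST /v1/workflows/run using the proposed async mode."""
    return {"inputs": inputs, "response_mode": "async", "user": user}


def poll_workflow_run(
    fetch_run: Callable[[str], Dict[str, Any]],
    workflow_run_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_run(workflow_run_id) until the run reaches a terminal status.

    fetch_run is a hypothetical callable wrapping
    GET /v1/workflows/run/{workflow_run_id} and returning the parsed JSON.
    """
    deadline = clock() + timeout
    while True:
        run = fetch_run(workflow_run_id)
        if run.get("status") in TERMINAL_STATUSES:
            return run
        if clock() >= deadline:
            raise TimeoutError(f"workflow run {workflow_run_id} did not finish in time")
        sleep(interval)
```

A caller would POST the body from `build_async_request`, read `workflow_run_id` from the 202 response, and pass it to `poll_workflow_run` together with a function that performs the GET request.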

Core changes

  • Accept "async" in WorkflowRunPayload.response_mode
  • Add generate_async() to AppGenerateService — pre-creates a WorkflowRun row with status=SCHEDULED, dispatches to the existing Celery task, returns the workflow_run_id without subscribing to Redis pub/sub events
  • Add WorkflowRunTriggeredFrom.API_ASYNC enum value for audit trail
  • Preserve triggered_from during repository session.merge() so async runs keep their api-async label
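The core flow listed above (pre-create a row, dispatch, return the ID without waiting) can be sketched as below. This is not the code from the PR: `WorkflowRunRow`, `InMemoryRunStore`, and the `dispatch` callable are hypothetical stand-ins for the database model and the Celery task, used only to show the order of operations.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class WorkflowRunRow:
    """Hypothetical stand-in for the WorkflowRun database row."""
    id: str
    tenant_id: str
    status: str
    triggered_from: str


@dataclass
class InMemoryRunStore:
    """Hypothetical stand-in for the database session."""
    rows: Dict[str, WorkflowRunRow] = field(default_factory=dict)

    def add(self, row: WorkflowRunRow) -> None:
        self.rows[row.id] = row


def generate_async(
    store: InMemoryRunStore,
    dispatch: Callable[[str, Dict[str, Any]], None],
    tenant_id: str,
    inputs: Dict[str, Any],
) -> Dict[str, str]:
    """Create the run row first, then hand off to the worker and return at once."""
    run_id = str(uuid.uuid4())
    # The row exists before the task is queued, so a client polling right
    # after the 202 response finds the run instead of getting a 404.
    store.add(
        WorkflowRunRow(
            id=run_id,
            tenant_id=tenant_id,
            status="scheduled",
            triggered_from="api-async",
        )
    )
    # Stand-in for dispatching the existing Celery task. No pub/sub
    # subscription is opened, so nothing here waits for the workflow.
    dispatch(run_id, inputs)
    return {"workflow_run_id": run_id}
```

Writing the row before dispatching is what lets the polling endpoint report a meaningful state even if the worker has not picked the task up yet.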

Production safeguards

  • Idempotency-Key header support (Redis-backed, 24h TTL) — prevents duplicate workflow execution on client retry
  • Per-tenant concurrency guard (SQL COUNT query, default limit 50) — returns 429 Too Many Requests when exceeded
  • Zombie reaper Celery Beat task — force-fails WorkflowRun rows stuck in SCHEDULED (>5min) or RUNNING (>30min). This also fixes #26169 for existing streaming/blocking modes.
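One common way to build the Redis-backed idempotency check is an atomic "set if not exists" with an expiry. The sketch below assumes that approach; the PR may implement it differently. `FakeRedis` is an in-memory stand-in that mimics only the `set(key, value, nx=..., ex=...)` and `get(key)` calls of a redis-py client, and the key format and the function name `claim_or_replay` are made up for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as stated in the PR notes


class FakeRedis:
    """In-memory stand-in for the two Redis calls this sketch needs."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}
        self._clock = clock

    def _live(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._data[key]  # expired entries behave as if absent
            return None
        return value

    def set(self, key: str, value: str, nx: bool = False, ex: Optional[int] = None):
        if nx and self._live(key) is not None:
            return None  # like redis-py: None when NX prevents the write
        expires_at = self._clock() + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True

    def get(self, key: str) -> Optional[str]:
        return self._live(key)


def claim_or_replay(
    redis_client, tenant_id: str, idempotency_key: str, new_run_id: str
) -> Tuple[Optional[str], bool]:
    """Return (workflow_run_id, is_new) for a request carrying an Idempotency-Key.

    The first request with a given key claims it and is_new is True; a retry
    within the TTL gets the original run ID back and must not start a new run.
    """
    key = f"idempotency:{tenant_id}:{idempotency_key}"  # hypothetical key format
    if redis_client.set(key, new_run_id, nx=True, ex=IDEMPOTENCY_TTL_SECONDS):
        return new_run_id, True
    # A real client returns bytes here unless decode_responses=True is set,
    # and the key can expire between the two calls; this sketch ignores both.
    return redis_client.get(key), False
```

Scoping the key by tenant keeps two tenants that happen to send the same header value from colliding.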

Files changed

| File | Change |
| --- | --- |
| api/models/enums.py | Add API_ASYNC enum value |
| api/core/repositories/sqlalchemy_workflow_execution_repository.py | Preserve triggered_from during merge (1 line) |
| api/configs/feature/__init__.py | Add config values for concurrency limit + reaper |
| api/services/app_generate_service.py | Add generate_async() + helper methods |
| api/controllers/service_api/app/workflow.py | Add async branch in controller |
| api/schedule/reap_zombie_workflow_runs_task.py | New: zombie reaper Celery Beat task |
| api/extensions/ext_celery.py | Register reaper in beat schedule |
| api/.env.example, docker/.env.example, docker/docker-compose.yaml | Add env var entries |

Closes #24263 Fixes #26169

Screenshots

N/A — backend-only change, no UI modifications.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran make lint and make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

Changed files

  • api/.env.example (modified, +5/-0)
  • api/configs/feature/__init__.py (modified, +16/-0)
  • api/controllers/service_api/app/workflow.py (modified, +55/-4)
  • api/core/repositories/sqlalchemy_workflow_execution_repository.py (modified, +1/-0)
  • api/extensions/ext_celery.py (modified, +7/-0)
  • api/models/enums.py (modified, +1/-0)
  • api/schedule/reap_zombie_workflow_runs_task.py (added, +46/-0)
  • api/services/app_generate_service.py (modified, +222/-1)
  • api/tasks/app_generate/workflow_execute_task.py (modified, +36/-2)
  • api/tests/unit_tests/core/repositories/test_sqlalchemy_workflow_execution_repository.py (modified, +27/-0)
  • api/tests/unit_tests/schedule/test_reap_zombie_workflow_runs_task.py (added, +102/-0)
  • api/tests/unit_tests/services/test_app_generate_service_async.py (added, +288/-0)
  • api/tests/unit_tests/tasks/test_workflow_execute_task_error_handling.py (added, +223/-0)
  • docker/.env.example (modified, +5/-0)
  • docker/docker-compose.yaml (modified, +4/-0)

Is this request related to a challenge you're experiencing? Tell me about your story.

When using Dify's workflow API for long-running tasks (10-30s), the current blocking and streaming response modes both hold HTTP connections open. This causes:

  1. SSE connection management overhead — our middleware must maintain long-lived connections, handle reconnections, and buffer streamed events for workflows where we only need the final result.
  2. Stuck "running" workflows — when SSE connections drop mid-execution (network timeouts, load balancer limits), the workflow run remains in "running" status indefinitely with no mechanism to recover. Related: #26169.

A fire-and-forget async mode would let us submit workflows and poll for results, avoiding both problems.

Additional context or comments

Proposed solution: Add response_mode: "async" to POST /v1/workflows/run.

  • Returns 202 Accepted immediately with a workflow_run_id
  • Poll status via existing GET /v1/workflows/run/{workflow_run_id}
  • Idempotency key support via Idempotency-Key header (Redis, 24h TTL)
  • Per-tenant concurrency limit (ASYNC_WORKFLOW_MAX_CONCURRENT, default 50)
  • Zombie reaper Celery Beat task to force-fail stuck SCHEDULED (>5min) and RUNNING (>30min) rows
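The reaper's selection rule (SCHEDULED for more than 5 minutes, RUNNING for more than 30 minutes) can be sketched as a pure function. This is an illustration under assumptions: the real task operates on database rows from a Celery Beat schedule, and which timestamp it measures age from is not stated in the issue, so `created_at` and the `Run` class here are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

SCHEDULED_TIMEOUT = timedelta(minutes=5)   # threshold from the issue text
RUNNING_TIMEOUT = timedelta(minutes=30)    # threshold from the issue text


@dataclass
class Run:
    """Hypothetical stand-in for a WorkflowRun row."""
    id: str
    status: str
    created_at: datetime  # assumption: age is measured from row creation


def reap_zombies(runs: Iterable[Run], now: datetime) -> List[str]:
    """Force-fail runs stuck past their threshold; return the reaped IDs."""
    reaped: List[str] = []
    for run in runs:
        age = now - run.created_at
        stuck_scheduled = run.status == "scheduled" and age > SCHEDULED_TIMEOUT
        stuck_running = run.status == "running" and age > RUNNING_TIMEOUT
        if stuck_scheduled or stuck_running:
            run.status = "failed"
            reaped.append(run.id)
    return reaped
```

Because the rule only looks at status and age, it applies equally to runs started in blocking or streaming mode, which is how the same task can address #26169.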

Implementation PR: #35301 Docs PR: langgenius/dify-docs#746

extent analysis

TL;DR

PR #35301 addresses held HTTP connections and stuck "running" workflows by adding response_mode: "async" to the POST /v1/workflows/run endpoint. The request returns 202 Accepted with a workflow_run_id, and clients poll GET /v1/workflows/run/{workflow_run_id} for the result.

Guidance

  • Implement the proposed async response mode to allow for fire-and-forget workflow submissions and polling for results.
  • Use the Idempotency-Key header to ensure idempotency and prevent duplicate workflow runs.
  • Configure the per-tenant concurrency limit (ASYNC_WORKFLOW_MAX_CONCURRENT) to prevent overwhelming the system.
  • Utilize the zombie reaper Celery Beat task to force-fail stuck workflow runs and prevent indefinite "running" statuses.
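The per-tenant concurrency limit from the guidance above can be sketched as an admission check that runs before a new async run is accepted. This is a simplified illustration: the PR describes a SQL COUNT query, which is replaced here by counting over an in-memory list, and the choice of which statuses count as active, the comparison at the boundary, and the function names are assumptions.

```python
from typing import Dict, Iterable, Tuple

ASYNC_WORKFLOW_MAX_CONCURRENT = 50  # default limit named in the issue

# Assumption: runs in these states count against the tenant's limit.
ACTIVE_STATUSES = frozenset({"scheduled", "running"})


def count_active_runs(runs: Iterable[Tuple[str, str]], tenant_id: str) -> int:
    """Count active runs for one tenant; runs are (tenant_id, status) pairs."""
    return sum(
        1 for tenant, status in runs
        if tenant == tenant_id and status in ACTIVE_STATUSES
    )


def admit_async_run(
    runs: Iterable[Tuple[str, str]],
    tenant_id: str,
    limit: int = ASYNC_WORKFLOW_MAX_CONCURRENT,
) -> Tuple[int, Dict[str, str]]:
    """Return (http_status, body): 429 at or above the limit, otherwise 202."""
    active = count_active_runs(runs, tenant_id)
    if active >= limit:
        return 429, {"message": f"tenant has {active} active async runs (limit {limit})"}
    return 202, {"message": "accepted"}
```

Counting per tenant means one tenant saturating its quota does not block submissions from another.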

Example

The issue itself contains no code examples. The proposed change is to accept response_mode: "async" on the POST /v1/workflows/run endpoint.

Notes

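The async mode relies on three safeguards described in the PR: a per-tenant concurrency limit (default 50, returning 429 Too Many Requests when exceeded), Idempotency-Key deduplication (Redis-backed, 24h TTL), and a zombie reaper that force-fails rows stuck in SCHEDULED (>5min) or RUNNING (>30min).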

Recommendation

Adopt the async response mode for workflows where only the final result is needed. It removes the need to hold an HTTP connection open for the whole run, and the zombie reaper gives stuck "running" rows a recovery path, including those from the existing streaming and blocking modes (#26169).
