Fix Action

Fixed

Fixed by PR: feat(api): add async response_mode for workflow execution (https://github.com/langgenius/dify/pull/35301)

PR fix notes

PR #35301: feat(api): add async response_mode for workflow execution

Repository: langgenius/dify
Author: Umair706
State: open | merged: False
Link: https://github.com/langgenius/dify/pull/35301

Description (problem / solution / changelog)

[!IMPORTANT]

Make sure you have read our contribution guidelines

Ensure there is an associated issue and you have been assigned to it

Use the correct syntax to link this PR: Fixes #<issue number>.

Fixes #35327

Summary

Add response_mode: "async" to POST /v1/workflows/run: a true fire-and-forget mode that immediately returns 202 Accepted with a workflow_run_id instead of holding the HTTP connection open for the duration of the run. Clients poll for results via the existing GET /v1/workflows/run/{id} endpoint.

Motivation: Dify currently supports only the blocking (wait for the result) and streaming (hold an SSE connection open) response modes. Both keep an HTTP connection open for the entire workflow duration, which causes HTTP timeouts on long workflows and leaves runs stuck in "running" status when SSE connections drop (#26169). The async mode replaces this with the standard REST pattern of 202 Accepted followed by polling.

Core changes

Accept "async" in WorkflowRunPayload.response_mode
Add generate_async() to AppGenerateService — pre-creates a WorkflowRun row with status=SCHEDULED, dispatches to the existing Celery task, returns the workflow_run_id without subscribing to Redis pub/sub events
Add WorkflowRunTriggeredFrom.API_ASYNC enum value for audit trail
Preserve triggered_from during repository session.merge() so async runs keep their api-async label
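The generate_async() flow listed above (persist the run, dispatch it, return the id) can be sketched as follows. This is a hypothetical in-memory illustration, not the code from PR #35301: `WorkflowRunRow`, `RunStatus`, the `runs` dict (standing in for the database table) and the `enqueue` callable (standing in for dispatching the existing Celery task) are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class RunStatus(str, Enum):
    # Hypothetical subset of run states; the PR names SCHEDULED and RUNNING.
    SCHEDULED = "scheduled"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class WorkflowRunRow:
    # Simplified stand-in for the WorkflowRun model (hypothetical fields).
    id: str
    tenant_id: str
    status: RunStatus
    triggered_from: str
    inputs: dict = field(default_factory=dict)


def generate_async(
    runs: dict[str, WorkflowRunRow],
    enqueue: Callable[[str], None],
    tenant_id: str,
    inputs: dict,
) -> str:
    """Sketch of the async path: persist, dispatch, return without waiting."""
    run_id = str(uuid.uuid4())
    # 1. Pre-create the row so GET /v1/workflows/run/{id} can find it at once.
    runs[run_id] = WorkflowRunRow(
        id=run_id,
        tenant_id=tenant_id,
        status=RunStatus.SCHEDULED,
        triggered_from="api-async",  # the audit label the PR preserves on merge
        inputs=inputs,
    )
    # 2. Hand the run to a background worker (a Celery task in the PR).
    enqueue(run_id)
    # 3. Return immediately: no pub/sub subscription, no waiting on events.
    return run_id
```

Because the row exists before the worker picks up the job, a client that polls immediately after the 202 sees a SCHEDULED run rather than a 404.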

Production safeguards

Idempotency-Key header support (Redis-backed, 24h TTL) — prevents duplicate workflow execution on client retry
Per-tenant concurrency guard (SQL COUNT query, default limit 50) — returns 429 Too Many Requests when exceeded
Zombie reaper Celery Beat task — force-fails WorkflowRun rows stuck in SCHEDULED (>5min) or RUNNING (>30min). This also fixes #26169 for existing streaming/blocking modes.
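The Idempotency-Key safeguard above maps naturally onto Redis `SET key value NX EX ttl`: only the first request carrying a given key stores its run id, and retries read the stored id back. The sketch below assumes a redis-py client (or a compatible fake); the key layout and the per-tenant scoping are hypothetical, not taken from the PR.

```python
import uuid

# 24h TTL, as stated in the PR description.
IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60


def claim_run_id(redis_client, tenant_id: str, idempotency_key: str) -> tuple[str, bool]:
    """Return (workflow_run_id, is_new) for a client-supplied Idempotency-Key.

    `redis_client` is assumed to be a redis.Redis instance or a compatible fake.
    The key layout below is hypothetical.
    """
    key = f"workflow_async_idempotency:{tenant_id}:{idempotency_key}"
    candidate = str(uuid.uuid4())
    # SET key value NX EX ttl: only the first request carrying this key wins.
    if redis_client.set(key, candidate, ex=IDEMPOTENCY_TTL_SECONDS, nx=True):
        return candidate, True
    existing = redis_client.get(key)
    if existing is None:
        # The key expired between SET and GET; treat this as a fresh request.
        return claim_run_id(redis_client, tenant_id, idempotency_key)
    # redis-py returns bytes unless the client was created with decode_responses=True.
    if isinstance(existing, bytes):
        existing = existing.decode()
    return existing, False
```

Generating the candidate run id before the `SET` lets the winning request reuse that same id when it pre-creates the WorkflowRun row, so a retry receives the id of the run that is actually executing.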

Files changed

File	Change
`api/models/enums.py`	Add `API_ASYNC` enum value
`api/core/repositories/sqlalchemy_workflow_execution_repository.py`	Preserve `triggered_from` during merge (1 line)
`api/configs/feature/__init__.py`	Add config values for concurrency limit + reaper
`api/services/app_generate_service.py`	Add `generate_async()` + helper methods
`api/controllers/service_api/app/workflow.py`	Add async branch in controller
`api/schedule/reap_zombie_workflow_runs_task.py`	New — zombie reaper Celery Beat task
`api/extensions/ext_celery.py`	Register reaper in beat schedule
`api/.env.example`, `docker/.env.example`, `docker/docker-compose.yaml`	Add env var entries

Closes #24263
Fixes #26169

Screenshots

N/A — backend-only change, no UI modifications.

Checklist

This change requires a documentation update, included: Dify Document
I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.
I ran make lint and make type-check (backend) and cd web && pnpm exec vp staged (frontend) to appease the lint gods

Changed files

api/.env.example (modified, +5/-0)
api/configs/feature/__init__.py (modified, +16/-0)
api/controllers/service_api/app/workflow.py (modified, +55/-4)
api/core/repositories/sqlalchemy_workflow_execution_repository.py (modified, +1/-0)
api/extensions/ext_celery.py (modified, +7/-0)
api/models/enums.py (modified, +1/-0)
api/schedule/reap_zombie_workflow_runs_task.py (added, +46/-0)
api/services/app_generate_service.py (modified, +222/-1)
api/tasks/app_generate/workflow_execute_task.py (modified, +36/-2)
api/tests/unit_tests/core/repositories/test_sqlalchemy_workflow_execution_repository.py (modified, +27/-0)
api/tests/unit_tests/schedule/test_reap_zombie_workflow_runs_task.py (added, +102/-0)
api/tests/unit_tests/services/test_app_generate_service_async.py (added, +288/-0)
api/tests/unit_tests/tasks/test_workflow_execute_task_error_handling.py (added, +223/-0)
docker/.env.example (modified, +5/-0)
docker/docker-compose.yaml (modified, +4/-0)

Is this request related to a challenge you're experiencing? Tell me about your story.

When using Dify's workflow API for long-running tasks (10-30s), the current blocking and streaming response modes both hold HTTP connections open. This causes:

SSE connection management overhead — our middleware must maintain long-lived connections, handle reconnections, and buffer streamed events for workflows where we only need the final result.
Stuck "running" workflows — when SSE connections drop mid-execution (network timeouts, load balancer limits), the workflow run remains in "running" status indefinitely with no mechanism to recover. Related: #26169.

A fire-and-forget async mode would let us submit workflows and poll for results, avoiding both problems.

Additional context or comments

Proposed solution: Add response_mode: "async" to POST /v1/workflows/run.

Returns 202 Accepted immediately with a workflow_run_id
Poll status via existing GET /v1/workflows/run/{workflow_run_id}
Idempotency key support via Idempotency-Key header (Redis, 24h TTL)
Per-tenant concurrency limit (ASYNC_WORKFLOW_MAX_CONCURRENT, default 50)
Zombie reaper Celery Beat task to force-fail stuck SCHEDULED (>5min) and RUNNING (>30min) rows
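The per-tenant concurrency limit listed above is described as a SQL COUNT query that yields 429 when the limit is exceeded. A minimal sketch, using stdlib `sqlite3` and a simplified `workflow_runs` schema as stand-ins for Dify's real database; which statuses count as "active" is an assumption.

```python
import sqlite3

# Default from the issue (ASYNC_WORKFLOW_MAX_CONCURRENT=50).
ASYNC_WORKFLOW_MAX_CONCURRENT = 50

# Assumption: runs in these states count toward the limit.
ACTIVE_STATUSES = ("scheduled", "running")


def admission_status(conn: sqlite3.Connection, tenant_id: str,
                     limit: int = ASYNC_WORKFLOW_MAX_CONCURRENT) -> int:
    """Return 202 if the tenant may start another async run, else 429."""
    placeholders = ", ".join("?" for _ in ACTIVE_STATUSES)
    (active,) = conn.execute(
        "SELECT COUNT(*) FROM workflow_runs "
        f"WHERE tenant_id = ? AND status IN ({placeholders})",
        (tenant_id, *ACTIVE_STATUSES),
    ).fetchone()
    return 429 if active >= limit else 202
```

A COUNT-then-insert check like this is not atomic, so two simultaneous submissions can both pass and briefly exceed the limit; the PR description does not say whether that window is closed.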

Implementation PR: #35301
Docs PR: langgenius/dify-docs#746

extent analysis

TL;DR

To address the issue of held HTTP connections and stuck "running" workflows, consider implementing an async mode for the workflow API by adding a response_mode: "async" parameter to the POST /v1/workflows/run endpoint.

Guidance

Implement the proposed async response mode so clients can submit workflows fire-and-forget and poll for results.
Send an Idempotency-Key header so that client retries do not start duplicate workflow runs.
Configure the per-tenant concurrency limit (ASYNC_WORKFLOW_MAX_CONCURRENT) so that one tenant cannot overload the workers; submissions over the limit receive 429 Too Many Requests.
Rely on the zombie reaper Celery Beat task to force-fail runs stuck in SCHEDULED or RUNNING, so no run stays "running" indefinitely.
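The zombie reaper mentioned above force-fails runs stuck in SCHEDULED for more than 5 minutes or in RUNNING for more than 30 minutes. The sketch below shows that rule as a single UPDATE; it uses stdlib `sqlite3`, a hypothetical simplified schema, and measures both timeouts from `created_at`, which is an assumption (the PR may measure RUNNING from a different timestamp). In the PR this logic runs as a periodic Celery Beat task; here it is a plain function.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

SCHEDULED_TIMEOUT = timedelta(minutes=5)
RUNNING_TIMEOUT = timedelta(minutes=30)


def reap_zombie_runs(conn: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """Mark stale SCHEDULED/RUNNING rows as failed; return how many were reaped.

    Assumes created_at is stored as an ISO-8601 UTC string, so lexicographic
    comparison matches chronological order.
    """
    now = now or datetime.now(timezone.utc)
    scheduled_cutoff = (now - SCHEDULED_TIMEOUT).isoformat()
    running_cutoff = (now - RUNNING_TIMEOUT).isoformat()
    cur = conn.execute(
        """
        UPDATE workflow_runs
        SET status = 'failed', error = 'reaped: exceeded time limit'
        WHERE (status = 'scheduled' AND created_at < ?)
           OR (status = 'running' AND created_at < ?)
        """,
        (scheduled_cutoff, running_cutoff),
    )
    conn.commit()
    return cur.rowcount
```

Because the reaper only looks at row status and age, it also recovers streaming and blocking runs whose connections dropped, which is how the PR claims to fix #26169 for those modes.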

Example

The issue itself contains no code. The proposed change adds a response_mode: "async" option to POST /v1/workflows/run, which returns 202 Accepted with a workflow_run_id; the client then polls GET /v1/workflows/run/{workflow_run_id} until the run finishes.
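As an illustration only, a client for the proposed mode might submit once and then poll. The `post`/`get` callables are injected so the sketch stays transport-agnostic; the request body fields (`inputs`, `response_mode`, `user`), the response field `workflow_run_id` and the set of terminal status values are assumptions based on the description above and Dify's existing workflow API, not a confirmed contract.

```python
import time
from typing import Callable, Optional

# Assumed terminal values of the run's "status" field.
TERMINAL_STATUSES = {"succeeded", "failed", "stopped", "partial-succeeded"}

PostFn = Callable[[str, dict, dict], tuple[int, dict]]  # (path, json, headers) -> (status, body)
GetFn = Callable[[str], dict]                           # path -> decoded JSON body


def submit_and_wait(post: PostFn, get: GetFn, inputs: dict, user: str,
                    idempotency_key: Optional[str] = None,
                    poll_interval: float = 2.0, timeout: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Submit a run with response_mode="async", then poll until it finishes."""
    headers = {"Idempotency-Key": idempotency_key} if idempotency_key else {}
    status_code, body = post(
        "/v1/workflows/run",
        {"inputs": inputs, "response_mode": "async", "user": user},
        headers,
    )
    if status_code == 429:
        raise RuntimeError("per-tenant concurrency limit reached; retry later")
    if status_code != 202:
        raise RuntimeError(f"unexpected status {status_code}: {body}")

    run_id = body["workflow_run_id"]
    waited = 0.0
    while waited < timeout:
        run = get(f"/v1/workflows/run/{run_id}")
        if run["status"] in TERMINAL_STATUSES:
            return run
        sleep(poll_interval)
        waited += poll_interval
    raise TimeoutError(f"run {run_id} not finished after {timeout}s")
```

In a real client, `post` and `get` would wrap an HTTP library call that adds the base URL and the service API's bearer API key. Reusing the same `idempotency_key` when retrying a failed submission is what lets the server return the existing run instead of starting a second one.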

Notes

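The async mode is only reliable if its safeguards hold: the idempotency key must stop client retries from creating duplicate runs, the concurrency limit must bound per-tenant load, and failures (including worker crashes) must end in a terminal status rather than leaving runs in SCHEDULED or RUNNING.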

Recommendation

Adopt the async response mode proposed in PR #35301. It removes the need to hold an HTTP connection open for the duration of a run, and the zombie reaper ensures that interrupted runs (including streaming and blocking runs whose connections drop) are eventually marked as failed instead of staying "running" indefinitely.

dify - ✅(Solved) Fix feat(api): add async response_mode for workflow execution [1 pull requests, 1 participants]
