dify: Support durable async execution backends for long-running workflow steps

Self Checks

I have read the Contributing Guide and Language Policy.
I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report, otherwise it will be closed.
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I am exploring heavier AI workflow and agentic use cases where a workflow step may take longer than a normal request/API timeout window.

Example scenarios:

- running a long batch-processing task
- executing an MCP tool that takes several minutes
- running an evaluation job
- running inference workloads that need logs and artifacts
- triggering a fine-tuning or compute-heavy job
- generating files/artifacts that need to be retrieved after completion

In these cases, the workflow should not have to keep the original HTTP/request path open until the job finishes.

The challenge is that long-running jobs need durable execution behavior:

- submit the job asynchronously
- return a `job_id` quickly
- track job status over time
- fetch or stream logs
- retrieve artifacts/results when the job completes
- handle failed/cancelled states cleanly
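The lifecycle above can be sketched as a small polling helper. This is a minimal illustration, not Dify's API: `JobState`, `wait_for_job`, and the `get_status` callable are hypothetical names, and the exponential backoff and timeout values are assumptions chosen for the example.

```python
import time
from enum import Enum
from typing import Callable


class JobState(str, Enum):
    """States named in this request; the enum itself is hypothetical."""
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Once a job reaches one of these states, polling can stop.
TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED, JobState.CANCELLED}


def wait_for_job(
    get_status: Callable[[str], JobState],
    job_id: str,
    poll_interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobState:
    """Poll `get_status` until the job is terminal or `timeout` seconds pass.

    The wait between polls doubles each round, capped at `max_interval`.
    `sleep` is injectable so the helper can be tested without real delays.
    """
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while True:
        state = get_status(job_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state.value} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

A helper like this would presumably run in a background worker or scheduled task rather than inside the original request handler, so the caller only needs the `job_id` to check back later.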

This would make Dify more useful for workflows where an AI app needs to trigger real compute work, not just call short-lived APIs.

2. Additional context or comments

One possible design would be to support a pluggable external execution backend for long-running workflow steps.

A backend interface could support operations like:

- `estimateJob(payload)` — optional, if the backend supports cost/runtime estimation
- `submitJob(payload)` — submit asynchronously and return a `job_id`
- `getJobStatus(job_id)` — check queued/running/succeeded/failed/cancelled state
- `getJobLogs(job_id)` — fetch or stream logs
- `cancelJob(job_id)` — cancel a running job
- `listArtifacts(job_id)` — list generated artifacts
- `getArtifactDownloadUrl(job_id, artifact_id)` — retrieve output files/results
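As a rough sketch of what such an interface might look like in Python (the language of Dify's backend), the operations above could be expressed as a `typing.Protocol`. Everything here is an assumption for illustration: the snake_case method names are renderings of the operations listed above, `ExecutionBackend` and `InMemoryBackend` are hypothetical, the `memory://` URL scheme is made up, and the optional `estimateJob` is omitted for brevity. It assumes Python 3.9 or later.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Protocol


class ExecutionBackend(Protocol):
    """Hypothetical contract mirroring the operations listed above."""

    def submit_job(self, payload: dict[str, Any]) -> str: ...
    def get_job_status(self, job_id: str) -> str: ...
    def get_job_logs(self, job_id: str) -> list[str]: ...
    def cancel_job(self, job_id: str) -> None: ...
    def list_artifacts(self, job_id: str) -> list[str]: ...
    def get_artifact_download_url(self, job_id: str, artifact_id: str) -> str: ...


@dataclass
class _Job:
    payload: dict[str, Any]
    status: str = "queued"
    logs: list[str] = field(default_factory=list)
    artifacts: dict[str, str] = field(default_factory=dict)


class InMemoryBackend:
    """Toy backend for local testing; a real provider would call a remote API."""

    def __init__(self) -> None:
        self._jobs: dict[str, _Job] = {}

    def submit_job(self, payload: dict[str, Any]) -> str:
        # Return immediately with an opaque id; no work happens inline.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = _Job(payload=payload, logs=["job accepted"])
        return job_id

    def get_job_status(self, job_id: str) -> str:
        return self._jobs[job_id].status

    def get_job_logs(self, job_id: str) -> list[str]:
        return list(self._jobs[job_id].logs)

    def cancel_job(self, job_id: str) -> None:
        # Cancelling a job that already finished is a no-op.
        job = self._jobs[job_id]
        if job.status in ("queued", "running"):
            job.status = "cancelled"
            job.logs.append("cancelled by caller")

    def list_artifacts(self, job_id: str) -> list[str]:
        return sorted(self._jobs[job_id].artifacts)

    def get_artifact_download_url(self, job_id: str, artifact_id: str) -> str:
        if artifact_id not in self._jobs[job_id].artifacts:
            raise KeyError(artifact_id)
        return f"memory://{job_id}/{artifact_id}"  # made-up scheme

    def complete(self, job_id: str, artifacts: dict[str, str]) -> None:
        # Test helper that simulates the external system finishing the job.
        job = self._jobs[job_id]
        if job.status in ("queued", "running"):
            job.status = "succeeded"
            job.artifacts.update(artifacts)
            job.logs.append("job succeeded")
```

Using a structural `Protocol` rather than a base class means a provider adapter would only need to implement these methods, without importing anything from the orchestration layer.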

This would let Dify remain the workflow/app orchestration layer, while heavier execution steps can be handled by an external execution system.

Example backend:

[Jungle Grid](https://junglegrid.dev) is an execution layer for AI agents and developer workloads. It lets agents/workflows submit compute jobs, monitor lifecycle state/logs, and retrieve artifacts asynchronously.

The request is not for Dify to depend on Jungle Grid specifically. The broader request is for Dify to support durable async execution backends for long-running workflow steps, where Jungle Grid could be one possible provider.

3. Can you help us with this feature?

I am interested in contributing to this feature.
