dify - Support durable async execution backends for long-running workflow steps


Self Checks

  • I have read the Contributing Guide and Language Policy.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report, otherwise it will be closed.
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I am exploring heavier AI workflow and agentic use cases where a workflow step may take longer than a normal request/API timeout window.

Example scenarios:

- running a long batch-processing task
- executing an MCP tool that takes several minutes
- running an evaluation job
- running inference workloads that need logs and artifacts
- triggering a fine-tuning or compute-heavy job
- generating files/artifacts that need to be retrieved after completion

In these cases, the workflow should not have to keep the original HTTP/request path open until the job finishes.
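One way to avoid holding the request open is to persist the job handle at submission time and let a separate worker follow up later. The sketch below is only an illustration of that idea using SQLite from the Python standard library; the table name `pending_jobs`, the `run_id` column, and all function names are hypothetical and not part of Dify.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small table that maps workflow runs to job handles."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_jobs ("
        "run_id TEXT PRIMARY KEY, job_id TEXT NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def record_submission(conn: sqlite3.Connection, run_id: str, job_id: str) -> None:
    """Store the handle right after submission so the request can return."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO pending_jobs (run_id, job_id, status) VALUES (?, ?, 'queued')",
            (run_id, job_id),
        )


def update_status(conn: sqlite3.Connection, job_id: str, status: str) -> None:
    """Record the latest state reported by the backend."""
    with conn:
        conn.execute(
            "UPDATE pending_jobs SET status = ? WHERE job_id = ?", (status, job_id)
        )


def unfinished_jobs(conn: sqlite3.Connection) -> list:
    """Return (run_id, job_id) pairs a worker still needs to follow up on."""
    rows = conn.execute(
        "SELECT run_id, job_id FROM pending_jobs "
        "WHERE status IN ('queued', 'running') ORDER BY run_id"
    )
    return rows.fetchall()


conn = open_store()
record_submission(conn, "run-1", "job-a")
record_submission(conn, "run-2", "job-b")
update_status(conn, "job-a", "succeeded")
remaining = unfinished_jobs(conn)
```

With a file path instead of `:memory:`, a worker started after a restart could call `unfinished_jobs` and resume tracking without the original request being alive.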

The challenge is that long-running jobs need durable execution behavior:

- submit the job asynchronously
- return a `job_id` quickly
- track job status over time
- fetch or stream logs
- retrieve artifacts/results when the job completes
- handle failed/cancelled states cleanly

This would make Dify more useful for workflows where an AI app needs to trigger real compute work, not just call short-lived APIs.
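As a rough sketch of the status-tracking part of the list above, a background worker (not the request path) could poll until the job reaches a terminal state. The state names follow the ones mentioned in this issue; `wait_for_job`, its parameters, and the backoff policy are assumptions made for illustration.

```python
import enum
import time
from typing import Callable


class JobState(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# States after which no further polling is needed.
TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED, JobState.CANCELLED}


def wait_for_job(
    get_status: Callable[[str], JobState],
    job_id: str,
    poll_interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobState:
    """Poll get_status(job_id) with exponential backoff until a terminal state.

    Raises TimeoutError if the job is still queued or running at the deadline.
    """
    deadline = time.monotonic() + timeout
    interval = poll_interval
    while True:
        state = get_status(job_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {state.value} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Usage with a scripted status sequence instead of a real backend.
_scripted = iter(
    [JobState.QUEUED, JobState.RUNNING, JobState.RUNNING, JobState.SUCCEEDED]
)
final = wait_for_job(lambda job_id: next(_scripted), "job-123", sleep=lambda s: None)
```

The caller can then branch on `final` to fetch artifacts on success or surface logs on failure or cancellation.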

2. Additional context or comments

One possible design would be to support a pluggable external execution backend for long-running workflow steps.

A backend interface could support operations like:

- `estimateJob(payload)` — optional, if the backend supports cost/runtime estimation
- `submitJob(payload)` — submit asynchronously and return a `job_id`
- `getJobStatus(job_id)` — check queued/running/succeeded/failed/cancelled state
- `getJobLogs(job_id)` — fetch or stream logs
- `cancelJob(job_id)` — cancel a running job
- `listArtifacts(job_id)` — list generated artifacts
- `getArtifactDownloadUrl(job_id, artifact_id)` — retrieve output files/results
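To make the shape of such an interface concrete, here is a minimal Python sketch. The method names are snake_case renderings of the operations listed above; the class names, return types, and the in-memory implementation are assumptions for illustration only, and log access is shown as a plain fetch rather than a stream.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from typing import Any, Optional


class ExecutionBackend(ABC):
    """Hypothetical backend interface; not an existing Dify API."""

    def estimate_job(self, payload: dict[str, Any]) -> Optional[dict[str, Any]]:
        """Optional: return a cost/runtime estimate, or None if unsupported."""
        return None

    @abstractmethod
    def submit_job(self, payload: dict[str, Any]) -> str:
        """Submit asynchronously and return a job_id."""

    @abstractmethod
    def get_job_status(self, job_id: str) -> str:
        """Return queued, running, succeeded, failed, or cancelled."""

    @abstractmethod
    def get_job_logs(self, job_id: str) -> list[str]:
        """Return the log lines collected so far."""

    @abstractmethod
    def cancel_job(self, job_id: str) -> None:
        """Cancel a queued or running job."""

    @abstractmethod
    def list_artifacts(self, job_id: str) -> list[str]:
        """Return the artifact ids produced by the job."""

    @abstractmethod
    def get_artifact_download_url(self, job_id: str, artifact_id: str) -> str:
        """Return a URL from which the artifact can be downloaded."""


class InMemoryBackend(ExecutionBackend):
    """Toy implementation that keeps job records in a dict."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict[str, Any]] = {}

    def submit_job(self, payload: dict[str, Any]) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {"status": "queued", "logs": ["submitted"], "artifacts": {}}
        return job_id

    def get_job_status(self, job_id: str) -> str:
        return self._jobs[job_id]["status"]

    def get_job_logs(self, job_id: str) -> list[str]:
        return list(self._jobs[job_id]["logs"])

    def cancel_job(self, job_id: str) -> None:
        job = self._jobs[job_id]
        if job["status"] in ("queued", "running"):  # finished jobs stay as they are
            job["status"] = "cancelled"
            job["logs"].append("cancelled")

    def list_artifacts(self, job_id: str) -> list[str]:
        return sorted(self._jobs[job_id]["artifacts"])

    def get_artifact_download_url(self, job_id: str, artifact_id: str) -> str:
        return self._jobs[job_id]["artifacts"][artifact_id]

    def mark_succeeded(self, job_id: str, artifacts: dict[str, str]) -> None:
        """Example-only helper that simulates the job finishing."""
        job = self._jobs[job_id]
        job["status"] = "succeeded"
        job["artifacts"] = dict(artifacts)
        job["logs"].append("succeeded")
```

Keeping `estimate_job` non-abstract with a `None` default reflects that estimation is optional, while the remaining operations must be provided by every backend.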

This would let Dify remain the workflow/app orchestration layer, while heavier execution steps can be handled by an external execution system.

Example backend:

[Jungle Grid](https://junglegrid.dev) is an execution layer for AI agents and developer workloads. It lets agents/workflows submit compute jobs, monitor lifecycle state/logs, and retrieve artifacts asynchronously.

The request is not for Dify to depend on Jungle Grid specifically. The broader request is for Dify to support durable async execution backends for long-running workflow steps, where Jungle Grid could be one possible provider.
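To illustrate the provider-agnostic part, a small registry could let a workflow step pick a backend by name. Everything here is hypothetical: `register_backend`, `get_backend`, and the `EchoBackend` stand-in are invented for this sketch and do not describe any real provider's API.

```python
from typing import Callable, Dict, Protocol


class ExecutionBackend(Protocol):
    """Only the two operations this example needs."""

    def submit_job(self, payload: dict) -> str: ...

    def get_job_status(self, job_id: str) -> str: ...


# Maps a provider name to a factory that builds a backend instance.
_BACKENDS: Dict[str, Callable[[], ExecutionBackend]] = {}


def register_backend(name: str, factory: Callable[[], ExecutionBackend]) -> None:
    """Register a provider under a unique name."""
    if name in _BACKENDS:
        raise ValueError(f"backend {name!r} is already registered")
    _BACKENDS[name] = factory


def get_backend(name: str) -> ExecutionBackend:
    """Build the backend registered under name, or raise LookupError."""
    factory = _BACKENDS.get(name)
    if factory is None:
        raise LookupError(f"no execution backend registered as {name!r}")
    return factory()


class EchoBackend:
    """Stand-in provider used only for this example."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}

    def submit_job(self, payload: dict) -> str:
        job_id = f"job-{len(self._status) + 1}"
        self._status[job_id] = "queued"
        return job_id

    def get_job_status(self, job_id: str) -> str:
        return self._status[job_id]


register_backend("echo", EchoBackend)
backend = get_backend("echo")
job_id = backend.submit_job({"task": "evaluate"})
```

A workflow step would then only need the provider name in its configuration, and any external system implementing the interface could be swapped in.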

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
