n8n - ✅(Solved) Fix [Critical Bug] Parent Workflow Stuck in "Waiting" State After Sub-Workflows Complete with multiple items (Regression in v2.16.0) [4 pull requests, 3 comments, 2 participants]

n8n • 2026-04-08 17:03:00

GitHub stats

n8n-io/n8n#28208 • Fetched 2026-04-09 08:16:09

Author: ARHAEEM
Participants: ARHAEEM, n8n-assistant[bot]
Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1

Fix Action

Fix / Workaround

Workaround: downgrade to v2.14.2, where the issue does not occur.

The two PRs listed here are the changes the reporter suspects of introducing the regression. The issue does not present them as fixes:

1. `perf(core): Make Wait node fully durable by removing in-memory execution path` (#27066), released in v2.16.0
2. `fix(core): Fix execution history when flow includes wait node` (#27357), released in v2.15.0

Additional context from the reporter:

- This is a single-process n8n instance (Task Broker on port 5679, no BullMQ workers).
- The bug is in workflow-execute coordination, not queue infrastructure.
- Server logs show a 37-minute gap with zero log output while the n8n process was alive but the execution engine was silently stuck.
- Sub-workflow API calls completed successfully (visible in logs), but the parent execution never received the completion signal.

PR fix notes

PR #27066: perf(core): Make Wait node fully durable by removing in-memory execution path

Repository: n8n-io/n8n
Author: shortstacked
State: closed | merged: True
Link: https://github.com/n8n-io/n8n/pull/27066

Description (problem / solution / changelog)

Summary

Removes the dual-execution-path behaviour from the Wait node. Previously, waits shorter than 65 seconds ran entirely in-memory via setTimeout and were never persisted to the database. This made them invisible to crash recovery, multi-main failover, and the WaitTracker entirely.

What changed:

- Wait node (`Wait.node.ts`): removed the < 65s in-memory branch. All time-based waits now call `putToWait` immediately, regardless of duration.
- ExecutionRepository (`execution.repository.ts`): `getWaitingExecutions()` now uses a DB-server-clock-anchored 15-second lookahead window (`NOW() + INTERVAL '15 seconds'` / `datetime('now', '+15 seconds')`) via `createQueryBuilder`. Added `getServerTime()` to fetch the DB server's current timestamp (PostgreSQL: `CURRENT_TIMESTAMP(3)`, SQLite: `STRFTIME`).
- WaitTracker (`wait-tracker.ts`): poll interval reduced from 60s to 5s. `triggerTime` is now computed relative to the DB server clock (via a 60s-TTL cache with elapsed-time interpolation) rather than `Date.now()`, eliminating inter-instance clock skew from timer precision. Logs a warning when skew exceeds 2s.
- PrometheusMetricsService (`prometheus-metrics.service.ts`): added `n8n_db_clock_skew_ms` gauge, scraped live on each Prometheus pull.
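
The cached DB clock with elapsed-time interpolation mentioned for WaitTracker can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code from the PR: `DbClock`, `fetchServerTime`, and `localNow` are hypothetical names, and the database query is replaced by an injected synchronous function so the block runs on its own. Only the 60s TTL and the 2s warning threshold come from the PR description.

```typescript
// Hypothetical sketch: estimate the DB server's "now" from a cached reading.
// A real implementation would query the database asynchronously; here the
// reading is injected so the example stays self-contained.
type TimeSource = () => number; // milliseconds since epoch

const CACHE_TTL_MS = 60_000; // 60s TTL, per the PR description
const SKEW_WARN_MS = 2_000; // warn above 2s of skew, per the PR description

class DbClock {
  private cachedDbTime: number | null = null;
  private cachedAtLocal = 0;
  private readonly fetchServerTime: TimeSource; // stand-in for a DB query
  private readonly localNow: TimeSource; // stand-in for Date.now()

  constructor(fetchServerTime: TimeSource, localNow: TimeSource) {
    this.fetchServerTime = fetchServerTime;
    this.localNow = localNow;
  }

  /** Estimated DB time; refreshes the cached reading once the TTL has elapsed. */
  now(): number {
    const local = this.localNow();
    let dbTime = this.cachedDbTime;
    if (dbTime === null || local - this.cachedAtLocal >= CACHE_TTL_MS) {
      dbTime = this.fetchServerTime();
      this.cachedDbTime = dbTime;
      this.cachedAtLocal = local;
    }
    // Interpolation: cached DB reading plus local time elapsed since it was taken.
    return dbTime + (local - this.cachedAtLocal);
  }

  /** Local clock minus estimated DB clock, in milliseconds. */
  skewMs(): number {
    return this.localNow() - this.now();
  }

  /** True when the absolute skew is large enough to warrant a warning. */
  skewExceedsThreshold(): boolean {
    return Math.abs(this.skewMs()) > SKEW_WARN_MS;
  }
}
```

Injecting both time sources keeps the sketch deterministic under test; in practice the local source would be `Date.now` and the server source a database round trip.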

Why: The 65s threshold existed because the old 60s poll interval made DB-persisted short waits resume late. Reducing the poll to 5s and adding a 15s lookahead window eliminates the need for the in-memory path entirely. The trade-off is up to ~5s of jitter on short waits in exchange for full crash/restart durability.
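
The interaction between the 5s poll and the 15s lookahead can be sketched as below. This is an illustration under assumptions rather than the PR's implementation: `WaitingExecution`, `dueWithinLookahead`, `timerDelayMs`, and `lookaheadCoversPollGap` are hypothetical names, and the rows are plain objects instead of database results.

```typescript
// Hypothetical sketch of one poll cycle with a lookahead window.
interface WaitingExecution {
  id: string;
  waitTill: number; // epoch ms at which the execution should resume
}

const POLL_INTERVAL_MS = 5_000; // per the PR description
const LOOKAHEAD_MS = 15_000; // per the PR description

/** Executions whose waitTill falls at or before dbNow + lookahead. */
function dueWithinLookahead(
  rows: WaitingExecution[],
  dbNow: number,
  lookaheadMs: number = LOOKAHEAD_MS,
): WaitingExecution[] {
  return rows.filter((row) => row.waitTill <= dbNow + lookaheadMs);
}

/** Delay for the resume timer; overdue executions get 0 and fire at once. */
function timerDelayMs(row: WaitingExecution, dbNow: number): number {
  return Math.max(0, row.waitTill - dbNow);
}

/**
 * If the lookahead is at least as long as the poll interval, every wait is
 * seen by some poll before it is due, so no short wait slips between polls.
 */
function lookaheadCoversPollGap(
  lookaheadMs: number = LOOKAHEAD_MS,
  pollMs: number = POLL_INTERVAL_MS,
): boolean {
  return lookaheadMs >= pollMs;
}
```

With the old 60s poll, a 15s wait persisted to the database could be discovered long after it was due, which is the late resume the 65s in-memory branch existed to avoid.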

Blast radius: Narrow — only affects time-based Wait node resume and WaitTracker scheduling. No schema changes, no API changes. Safe to revert with a single commit revert; in-flight waiting executions survive revert (they resume ~60s late via the old poll cycle).

How to test:

1. Create a workflow: Manual Trigger → Wait (15s) → Set node
2. Execute it and verify that the execution enters waiting status in the DB immediately (not after 15s)
3. Verify it resumes at ~15s (±5s acceptable)
4. Kill n8n mid-wait and restart it; verify that the execution resumes after the restart
5. Scrape /metrics and confirm the `n8n_db_clock_skew_ms` gauge is present
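
The last check can be scripted once the `/metrics` response body is in hand. The helper below is a sketch, not part of n8n: `readGauge` is a hypothetical name, and the parser handles only simple sample lines of the Prometheus text format (it does not cope with `}` inside quoted label values). How the body is fetched is left out on purpose.

```typescript
/**
 * Returns the value of the first sample named `name` in a Prometheus
 * text-format payload, or null when no such sample exists.
 */
function readGauge(metricsText: string, name: string): number | null {
  for (const rawLine of metricsText.split("\n")) {
    const line = rawLine.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    // A sample line has the shape: <name>[{labels}] <value> [timestamp]
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(line);
    if (match !== null && match[1] === name) {
      return Number(match[3]);
    }
  }
  return null;
}
```

A `null` result for `n8n_db_clock_skew_ms` would mean the gauge is not exposed by the instance being scraped.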

Changed files

packages/@n8n/db/src/repositories/__tests__/clock.repository.test.ts (added, +54/-0)
packages/@n8n/db/src/repositories/__tests__/execution.repository.test.ts (modified, +58/-1)
packages/@n8n/db/src/repositories/clock.repository.ts (added, +32/-0)
packages/@n8n/db/src/repositories/execution.repository.ts (modified, +13/-20)
packages/@n8n/db/src/repositories/index.ts (modified, +1/-0)
packages/cli/src/__tests__/db-clock.service.test.ts (added, +77/-0)
packages/cli/src/__tests__/wait-tracker.test.ts (modified, +233/-4)
packages/cli/src/databases/repositories/__tests__/execution.repository.test.ts (modified, +23/-13)
packages/cli/src/services/db-clock.service.ts (added, +42/-0)
packages/cli/src/wait-tracker.ts (modified, +116/-82)
packages/cli/test/integration/database/repositories/execution.repository.test.ts (modified, +126/-0)
packages/nodes-base/nodes/Wait/Wait.node.ts (modified, +0/-15)
packages/nodes-base/nodes/Wait/test/Wait.node.test.ts (modified, +31/-52)
packages/nodes-base/nodes/Wait/test/Wait.workflow.json (removed, +0/-162)

PR #27357: fix(core): Fix execution history when flow includes wait node

Repository: n8n-io/n8n
Author: DawidMyslak
State: closed | merged: True
Link: https://github.com/n8n-io/n8n/pull/27357

Description (problem / solution / changelog)

Summary

Re-applies the fix from #23146 (which was reverted in #25610 due to PostgreSQL timezone issues) with a corrected approach.

Original bug: When a workflow with a Wait node resumes, setRunning() overwrites the original startedAt timestamp with the current time, causing wrong execution duration and sort order in execution history.

Why the original fix broke PostgreSQL: PR #23146 used raw SQL COALESCE(startedAt, :startedAt) with DateUtils.mixedDateToUtcDatetimeString(), which produces timezone-ambiguous date strings. On PostgreSQL with timestamptz columns, these strings were interpreted using the session timezone instead of UTC, causing 1-hour offsets (or even negative durations) for non-UTC users.
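
The 1-hour offset follows directly from reading one zone-less string under two different time zones. The sketch below shows the arithmetic only; `interpretInZone` is a hypothetical helper and does not reproduce PostgreSQL's or TypeORM's parsing.

```typescript
/**
 * Interprets "YYYY-MM-DD HH:mm:ss" as wall-clock time in a zone with the
 * given UTC offset (minutes east of UTC) and returns epoch milliseconds.
 */
function interpretInZone(datetime: string, utcOffsetMinutes: number): number {
  const match = /^(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})$/.exec(datetime);
  if (match === null) {
    throw new Error(`Unsupported datetime format: ${datetime}`);
  }
  // Read the fields as if they were UTC, then shift by the zone offset.
  const asIfUtc = Date.UTC(
    Number(match[1]),
    Number(match[2]) - 1, // Date.UTC months are zero-based
    Number(match[3]),
    Number(match[4]),
    Number(match[5]),
    Number(match[6]),
  );
  return asIfUtc - utcOffsetMinutes * 60_000;
}

// The writer meant UTC, but a session set to UTC+1 reads the same text
// as local wall-clock time, which is one hour earlier in absolute terms.
const ambiguous = "2026-04-08 16:17:25";
const intendedUtc = interpretInZone(ambiguous, 0);
const readInUtcPlusOne = interpretInZone(ambiguous, 60);
const offsetHours = (intendedUtc - readInUtcPlusOne) / 3_600_000;
```

Here `offsetHours` is 1: the stored `startedAt` lands an hour away from the intended instant, which is enough to produce a wrong or even negative duration when compared with a correctly stored end time.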

This fix: Uses a transaction with standard TypeORM findOneBy + update methods instead of raw SQL. TypeORM handles date serialization correctly per database driver (SQLite and PostgreSQL), eliminating the timezone mismatch entirely.
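
The read-then-update logic can be modelled without a database. The sketch below is an in-memory stand-in, not the PR's code: `ExecutionRow` and this `setRunning` signature are hypothetical, and a `Map` replaces the TypeORM repository and transaction.

```typescript
interface ExecutionRow {
  id: string;
  status: "new" | "running" | "waiting" | "success";
  startedAt: Date | null;
}

/**
 * Marks an execution as running while keeping the first startedAt it ever
 * had. Returns the startedAt value that ends up stored.
 */
function setRunning(table: Map<string, ExecutionRow>, id: string, now: Date): Date {
  const row = table.get(id); // stands in for a findOneBy inside a transaction
  if (row === undefined) {
    throw new Error(`Execution ${id} not found`);
  }
  // Only the first start sets the timestamp; a resume after a wait keeps it.
  const startedAt = row.startedAt ?? now;
  table.set(id, { ...row, status: "running", startedAt }); // stands in for update
  return startedAt;
}
```

Because the timestamp is passed around as a `Date` object rather than a formatted string, no session time zone is involved in this version of the logic.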

How to test

1. Create a workflow: Webhook → Wait (resume by webhook) → Edit Fields
2. Trigger the webhook to start execution
3. The execution enters the "waiting" state; note the `startedAt` time
4. Resume the execution via the wait webhook
5. Verify `startedAt` remains unchanged (not overwritten with resume time)

Sample workflow is also provided in the Linear ticket.

Changed files

packages/@n8n/db/src/repositories/__tests__/execution.repository.test.ts (modified, +44/-0)
packages/@n8n/db/src/repositories/execution.repository.ts (modified, +13/-2)
packages/cli/src/__tests__/wait-tracker.test.ts (modified, +23/-0)
packages/cli/src/__tests__/workflow-runner.test.ts (modified, +8/-1)
packages/cli/test/integration/execution.repository.test.ts (modified, +32/-0)
packages/testing/playwright/tests/e2e/workflows/executions/list.spec.ts (modified, +51/-0)
packages/testing/playwright/workflows/cat-1854-wait-execution-history.json (added, +110/-0)

Bug Description

When the Execute Workflow node receives multiple input items, runs in "Run once for each item" mode, and has "Wait for Sub-Workflow Completion" enabled, the parent workflow gets permanently stuck in the "Waiting" state with the description:

"The workflow is waiting indefinitely for an incoming webhook call."

All sub-workflow executions complete successfully (status: Success), but the parent workflow never receives the completion signal and remains stuck indefinitely. The next node in the parent workflow never executes, so the whole flow stays stuck at that point.

This is a regression: the same workflow runs correctly on v2.14.2 but breaks on v2.16.0.

To Reproduce

1. Create a sub-workflow with an Execute Sub-workflow Trigger → any processing nodes (e.g., HTTP requests, Set nodes).
2. Create a parent workflow with:
   - A node that outputs multiple items (e.g., 5 items)
   - An Execute Workflow node configured as:
     - Mode: Run once for each item
     - Wait for Sub-Workflow Completion: Enabled (toggled ON)
   - A node after the Execute Workflow node (e.g., Merge node)
3. Execute the parent workflow.
4. Observe: All 5 sub-workflow executions complete with Success status, but the parent workflow remains in Waiting state indefinitely.

Expected behavior

The parent workflow should detect that all sub-workflow executions have completed and proceed to the next node ("Merge marketplaces parallel" in my case).
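
The expected semantics amount to a barrier: the parent proceeds only when every per-item sub-workflow has reported completion. The class below is a conceptual model only, not n8n's engine code; `SubWorkflowBarrier` and its methods are hypothetical names.

```typescript
// Conceptual model: one pending entry per sub-workflow execution.
class SubWorkflowBarrier {
  private readonly pending: Set<string>;

  constructor(subExecutionIds: string[]) {
    this.pending = new Set(subExecutionIds);
  }

  /** Records one finished sub-workflow; returns true once none remain pending. */
  complete(subExecutionId: string): boolean {
    this.pending.delete(subExecutionId);
    return this.pending.size === 0;
  }

  /** Number of sub-workflows that have not reported completion yet. */
  get remaining(): number {
    return this.pending.size;
  }
}
```

In these terms, the reported bug looks like all five completions having happened while the parent never observes the barrier opening.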

Actual Behavior

- Sub-workflows: All complete with Success status (visible in execution list)
- Parent workflow: Stuck in Waiting state with message "The workflow is waiting indefinitely for an incoming webhook call"
- The "Stop" button is shown next to the waiting parent executions
- There is a 37+ minute gap in server logs (no log output) while the execution engine is silently stuck
- No crashes, no OOM, no BullMQ errors: a pure status propagation failure

Screenshots

Screenshot 1: Execute Workflow node with 5 items being passed to sub-workflow (https://github.com/user-attachments/assets/8e075703-e24b-44f5-aeaf-3a4855d8fc62)

Screenshot 2: Execution list showing parent workflows stuck in "Waiting" while sub-workflows show "Success" (image not captured)

Screenshot 3: Execute Workflow node settings: Mode = "Run once for each item", Wait for Sub-Workflow Completion = ON (image not captured)

Suspected Regression Cause

This regression was introduced between v2.14.2 (working) and v2.16.0 (broken). Two PRs in this window are likely candidates:

1. `perf(core): Make Wait node fully durable by removing in-memory execution path` (#27066) — Released in v2.16.0

This PR fundamentally changed how the WaitTracker and getWaitingExecutions() work:

- `getWaitingExecutions()` WHERE clause changed: `status != 'crashed'` → `status = 'waiting'` (now only picks up executions with exactly `status = 'waiting'`)
- Time window narrowed: 70-second lookahead → 15-second DB-clock lookahead
- Poll interval changed: 60s → 5s
- In-memory execution path removed: All waits now go through DB persistence

Hypothesis: When the Execute Workflow node runs in "Run once for each item" mode with "Wait for Sub-Workflow Completion" enabled, the parent execution enters a waiting state internally. The new strict status = 'waiting' filter in getWaitingExecutions() may interact incorrectly with how the parent execution's status is set during sub-workflow coordination in workflow-execute-additional-data.ts. The old status != 'crashed' filter was more permissive and would pick up these executions regardless of their exact status.
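
The difference between the two filters can be made concrete on sample rows. Everything in this sketch is illustrative: `ExecutionRecord`, `selectOld`, `selectNew`, and the sample data are hypothetical, the non-null `waitTill` condition is an assumption, and whether a parent execution can really hold a non-`waiting` status at this point is exactly what the hypothesis leaves unverified.

```typescript
type ExecutionStatus = "new" | "running" | "waiting" | "success" | "error" | "crashed";

interface ExecutionRecord {
  id: string;
  status: ExecutionStatus;
  waitTill: number | null; // epoch ms, null when the execution is not waiting
}

/** Old behaviour as described in the issue: anything not crashed, 70s lookahead. */
function selectOld(rows: ExecutionRecord[], now: number): string[] {
  return rows
    .filter((r) => r.waitTill !== null && r.status !== "crashed" && r.waitTill <= now + 70_000)
    .map((r) => r.id);
}

/** New behaviour as described in the issue: exactly 'waiting', 15s lookahead. */
function selectNew(rows: ExecutionRecord[], now: number): string[] {
  return rows
    .filter((r) => r.waitTill !== null && r.status === "waiting" && r.waitTill <= now + 15_000)
    .map((r) => r.id);
}

const sampleNow = 1_700_000_000_000; // arbitrary reference instant
const sampleExecutions: ExecutionRecord[] = [
  { id: "timer-wait", status: "waiting", waitTill: sampleNow + 10_000 },
  { id: "parent-not-waiting", status: "running", waitTill: sampleNow + 10_000 },
  { id: "crashed", status: "crashed", waitTill: sampleNow + 10_000 },
  { id: "far-future", status: "waiting", waitTill: sampleNow + 60_000 },
];

const pickedByOld = selectOld(sampleExecutions, sampleNow);
const pickedByNew = selectNew(sampleExecutions, sampleNow);
```

On this data the old filter picks `timer-wait`, `parent-not-waiting`, and `far-future`, while the new one picks only `timer-wait`. The `far-future` row would still be picked up by a later poll; the `parent-not-waiting` row would never be, which is the kind of case the hypothesis points at.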

2. `fix(core): Fix execution history when flow includes wait node` (#27357) — Released in v2.15.0

This re-applied a previously reverted fix (#23146, reverted in #25610) for execution timestamp handling when Wait nodes resume. While this PR focuses on startedAt preservation, changes to the execution status/resume flow could have side effects on sub-workflow completion signaling.

Related Issues

#10444 — "Workflows executed by workflows display as Queued" (closed, fixed in v1.59.0 via #10764) — same symptom family but different root cause
#25531 — "Sub-workflow in queue mode showing as failed execution" (closed) — status propagation failure
#13135 — "Calling a sub-workflow with a wait by webhook node" (closed) — related sub-workflow completion issues
#14203 — "Wait condition in Execute workflow causes wrong data output" (open) — related execution engine handling

Debug Info

## core
- n8nVersion: 2.16.0
- platform: docker (self-hosted)
- nodeJsVersion: 24.14.1
- nodeEnv: production
- database: postgres
- executionMode: regular
- concurrency: 16

## storage
- success: all
- error: all
- progress: false
- manual: true
- binaryMode: filesystem

## pruning
- enabled: true
- maxAge: 720 hours
- maxCount: 15000 executions

## client
- userAgent: mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/146.0.0.0 safari/537.36 edg/146.0.0.0
- isTouchDevice: false

## security
- secureCookie: true

Generated at: 2026-04-08T16:17:25.547Z

Additional Context

This is a single-process n8n instance (Task Broker on port 5679, no BullMQ workers)
The bug is in workflow-execute coordination, not queue infrastructure
Server logs show a 37-minute gap with zero log output while n8n process was alive but the execution engine was silently stuck
Sub-workflow API calls completed successfully (visible in logs), but the parent execution never received the completion signal
Workaround: Downgrade to v2.14.2 where the issue does not occur

Operating System: docker (self-hosted)
n8n Version: 2.16.0
Node.js Version: 24.14.1
Database: PostgreSQL
Execution mode: main (default)
Hosting: self hosted

extent analysis

TL;DR

The available workaround is to downgrade to n8n version 2.14.2, where the parent workflow correctly resumes after its sub-workflows complete.

Guidance

- Verify the issue: Confirm that the problem occurs when using the "Execute Workflow" node with "Run once for each item" mode and "Wait for Sub-Workflow Completion" enabled in n8n version 2.16.0.
- Check execution logs: Investigate the server logs for any errors or gaps in log output, similar to the 37-minute gap mentioned in the issue.
- Test with previous version: Downgrade to n8n version 2.14.2 and verify that the workflow execution works as expected.
- Monitor for similar issues: Keep an eye on related issues, such as #14203, which may be connected to the same root cause.
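
The log check can be automated once timestamps have been extracted from the log lines. The helper below is a sketch under assumptions: `findLogGaps` and `LogGap` are hypothetical names, and the input is assumed to be ISO 8601 timestamps with an explicit zone; extracting them from n8n's actual log format is not covered.

```typescript
interface LogGap {
  from: Date;
  to: Date;
  minutes: number;
}

/** Returns every pair of consecutive timestamps at least thresholdMinutes apart. */
function findLogGaps(timestamps: string[], thresholdMinutes: number): LogGap[] {
  const times = timestamps
    .map((t) => new Date(t))
    .filter((d) => !Number.isNaN(d.getTime())) // drop unparseable entries
    .sort((a, b) => a.getTime() - b.getTime());

  const gaps: LogGap[] = [];
  for (let i = 1; i < times.length; i++) {
    const previous = times[i - 1];
    const current = times[i];
    if (previous === undefined || current === undefined) continue;
    const minutes = (current.getTime() - previous.getTime()) / 60_000;
    if (minutes >= thresholdMinutes) {
      gaps.push({ from: previous, to: current, minutes });
    }
  }
  return gaps;
}
```

A threshold of around 30 minutes would flag a silence like the 37-minute one described in the issue while ignoring ordinary quiet periods; the right value depends on how chatty the instance normally is.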

Example

The issue itself includes no code snippet; the problem lies in the n8n workflow engine rather than in user code.

Notes

The issue appears to be a regression introduced between versions 2.14.2 and 2.16.0. The exact cause is unconfirmed; the reporter suspects the Wait node durability change (#27066) or the execution history fix (#27357).

Recommendation

Apply the workaround by downgrading to version 2.14.2, as this version is known to work correctly. This will allow you to continue using the "Execute Workflow" node with "Run once for each item" mode and "Wait for Sub-Workflow Completion" enabled without encountering the issue.
