n8n - ✅(Solved) Fix [Critical Bug] Parent Workflow Stuck in "Waiting" State After Sub-Workflows Complete with multiple items (Regression in v2.16.0) [4 pull requests, 3 comments, 2 participants]

GitHub stats
n8n-io/n8n#28208 · Fetched 2026-04-09 08:16:09
Comments: 3 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1

Error Message

core

  • n8nVersion: 2.16.0
  • platform: docker (self-hosted)
  • nodeJsVersion: 24.14.1
  • nodeEnv: production
  • database: postgres
  • executionMode: regular
  • concurrency: 16

storage

  • success: all
  • error: all
  • progress: false
  • manual: true
  • binaryMode: filesystem

pruning

  • enabled: true
  • maxAge: 720 hours
  • maxCount: 15000 executions

client

  • userAgent: mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/146.0.0.0 safari/537.36 edg/146.0.0.0
  • isTouchDevice: false

security

  • secureCookie: true

Generated at: 2026-04-08T16:17:25.547Z

Root Cause

  • #10444 — "Workflows executed by workflows display as Queued" (closed, fixed in v1.59.0 via #10764) — same symptom family but different root cause
  • #25531 — "Sub-workflow in queue mode showing as failed execution" (closed) — status propagation failure
  • #13135 — "Calling a sub-workflow with a wait by webhook node" (closed) — related sub-workflow completion issues
  • #14203 — "Wait condition in Execute workflow causes wrong data output" (open) — related execution engine handling

Fix Action

Fix / Workaround

The issue names two PRs as suspected regression candidates (they are not fixes):

1. perf(core): Make Wait node fully durable by removing in-memory execution path (#27066), released in v2.16.0

2. fix(core): Fix execution history when flow includes wait node (#27357), released in v2.15.0

Context and workaround reported in the issue:

  • This is a single-process n8n instance (Task Broker on port 5679, no BullMQ workers)
  • The bug is in workflow-execute coordination, not queue infrastructure
  • Server logs show a 37-minute gap with zero log output while the n8n process was alive but the execution engine was silently stuck
  • Sub-workflow API calls completed successfully (visible in logs), but the parent execution never received the completion signal
  • Workaround: Downgrade to v2.14.2, where the issue does not occur

PR fix notes

PR #27066: perf(core): Make Wait node fully durable by removing in-memory execution path

Description (problem / solution / changelog)

Summary

Removes the dual-execution-path behaviour from the Wait node. Previously, waits shorter than 65 seconds ran entirely in-memory via setTimeout and were never persisted to the database. This made them invisible to crash recovery, multi-main failover, and the WaitTracker entirely.

What changed:

  • Wait node (Wait.node.ts): removed the < 65s in-memory branch. All time-based waits now call putToWait immediately, regardless of duration.
  • ExecutionRepository (execution.repository.ts): getWaitingExecutions() now uses a DB-server-clock-anchored 15-second lookahead window (NOW() + INTERVAL '15 seconds' / datetime('now', '+15 seconds')) via createQueryBuilder. Added getServerTime() to fetch the DB server's current timestamp (PostgreSQL: CURRENT_TIMESTAMP(3), SQLite: STRFTIME).
  • WaitTracker (wait-tracker.ts): poll interval reduced from 60s → 5s. triggerTime is now computed relative to the DB server clock (via a 60s-TTL cache with elapsed-time interpolation) rather than Date.now(), eliminating inter-instance clock skew from timer precision. Logs a warning when skew exceeds 2s.
  • PrometheusMetricsService (prometheus-metrics.service.ts): added n8n_db_clock_skew_ms gauge, scraped live on each Prometheus pull.
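The DB-clock handling described for WaitTracker (a 60s-TTL cache with elapsed-time interpolation and a warning above 2s of skew) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `DbClockSketch`, its methods, and the synchronous `fetchDbTime` callback (standing in for a real, asynchronous DB round trip) are all assumptions.

```typescript
// Hypothetical sketch: names and structure are illustrative, not n8n's actual code.
type TimeSource = () => number; // returns epoch milliseconds

class DbClockSketch {
  private cachedDbTime = 0; // DB server time at the last fetch
  private cachedLocalTime = 0; // local clock reading at the last fetch
  private hasCache = false;
  private readonly fetchDbTime: TimeSource;
  private readonly localNow: TimeSource;
  private readonly ttlMs: number;
  private readonly warnSkewMs: number;

  constructor(
    fetchDbTime: TimeSource, // stands in for a DB round trip (assumed synchronous here)
    localNow: TimeSource,
    ttlMs = 60_000, // 60s TTL, as described in the PR summary
    warnSkewMs = 2_000, // warn above 2s of skew, as described in the PR summary
  ) {
    this.fetchDbTime = fetchDbTime;
    this.localNow = localNow;
    this.ttlMs = ttlMs;
    this.warnSkewMs = warnSkewMs;
  }

  /** Estimated DB server time: cached value plus locally elapsed time. */
  now(): number {
    const local = this.localNow();
    if (!this.hasCache || local - this.cachedLocalTime >= this.ttlMs) {
      this.cachedDbTime = this.fetchDbTime();
      this.cachedLocalTime = local;
      this.hasCache = true;
    }
    return this.cachedDbTime + (local - this.cachedLocalTime);
  }

  /** Positive when the DB clock is ahead of the local clock. */
  skewMs(): number {
    return this.now() - this.localNow();
  }

  /** True when the skew is large enough that a warning would be logged. */
  skewExceedsThreshold(): boolean {
    return Math.abs(this.skewMs()) > this.warnSkewMs;
  }
}
```

Injecting both time sources keeps the sketch testable without a database; between refreshes the estimate advances with the local clock, so only the offset captured at fetch time is cached.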

Why: The 65s threshold existed because the old 60s poll interval made DB-persisted short waits resume late. Reducing the poll to 5s and adding a 15s lookahead window eliminates the need for the in-memory path entirely. The trade-off is up to ~5s of jitter on short waits in exchange for full crash/restart durability.
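The interplay of the 5s poll and the 15s lookahead can be illustrated with a small sketch of the scheduling arithmetic. It only models the idea described above; `WaitingExecution`, `selectDue`, and `timerDelayMs` are hypothetical names, and the real query runs in the database rather than over an in-memory array.

```typescript
// Hypothetical sketch: illustrates the scheduling arithmetic only.
const POLL_INTERVAL_MS = 5_000; // new poll interval from the PR description
const LOOKAHEAD_MS = 15_000; // new lookahead window from the PR description

interface WaitingExecution {
  id: string;
  waitTill: number; // epoch ms at which the execution should resume
}

/** Executions whose resume time falls inside the lookahead window. */
function selectDue(rows: WaitingExecution[], dbNow: number): WaitingExecution[] {
  return rows.filter((row) => row.waitTill <= dbNow + LOOKAHEAD_MS);
}

/** Delay for the in-process timer, measured against the DB clock. */
function timerDelayMs(row: WaitingExecution, dbNow: number): number {
  return Math.max(0, row.waitTill - dbNow);
}
```

Because the lookahead is longer than the poll interval, a wait that is at least one poll interval away is picked up before it is due and scheduled with an exact delay. Only waits shorter than `POLL_INTERVAL_MS` can be seen late, which is consistent with the jitter of up to ~5s that the PR mentions.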

Blast radius: Narrow — only affects time-based Wait node resume and WaitTracker scheduling. No schema changes, no API changes. Safe to revert with a single commit revert; in-flight waiting executions survive revert (they resume ~60s late via the old poll cycle).

How to test:

  1. Create a workflow: Manual Trigger → Wait (15s) → Set node
  2. Execute — verify execution enters waiting status in DB immediately (not after 15s)
  3. Verify it resumes at ~15s (±5s acceptable)
  4. Kill n8n mid-wait, restart — verify it resumes after restart
  5. Scrape /metrics and confirm n8n_db_clock_skew_ms gauge is present
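Step 5 can be automated with a small parser over the scraped text. The sketch below assumes the gauge is exposed without labels in the standard Prometheus text exposition format; `readGauge` is a hypothetical helper, and fetching `/metrics` is left to the caller.

```typescript
/**
 * Returns the value of an unlabelled gauge from Prometheus text exposition
 * output, or undefined when the metric is absent.
 */
function readGauge(metricsText: string, metricName: string): number | undefined {
  for (const rawLine of metricsText.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip HELP/TYPE comments
    const [metric, value] = line.split(/\s+/);
    if (metric === metricName && value !== undefined) return Number(value);
  }
  return undefined;
}
```

A check such as `readGauge(body, 'n8n_db_clock_skew_ms') !== undefined` then confirms that the gauge is present in the scrape.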

Related Linear tickets, Github issues, and Community forum posts

<!-- Link to Linear ticket: https://linear.app/n8n/issue/[TICKET-ID] -->

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

Changed files

  • packages/@n8n/db/src/repositories/__tests__/clock.repository.test.ts (added, +54/-0)
  • packages/@n8n/db/src/repositories/__tests__/execution.repository.test.ts (modified, +58/-1)
  • packages/@n8n/db/src/repositories/clock.repository.ts (added, +32/-0)
  • packages/@n8n/db/src/repositories/execution.repository.ts (modified, +13/-20)
  • packages/@n8n/db/src/repositories/index.ts (modified, +1/-0)
  • packages/cli/src/__tests__/db-clock.service.test.ts (added, +77/-0)
  • packages/cli/src/__tests__/wait-tracker.test.ts (modified, +233/-4)
  • packages/cli/src/databases/repositories/__tests__/execution.repository.test.ts (modified, +23/-13)
  • packages/cli/src/services/db-clock.service.ts (added, +42/-0)
  • packages/cli/src/wait-tracker.ts (modified, +116/-82)
  • packages/cli/test/integration/database/repositories/execution.repository.test.ts (modified, +126/-0)
  • packages/nodes-base/nodes/Wait/Wait.node.ts (modified, +0/-15)
  • packages/nodes-base/nodes/Wait/test/Wait.node.test.ts (modified, +31/-52)
  • packages/nodes-base/nodes/Wait/test/Wait.workflow.json (removed, +0/-162)

PR #27357: fix(core): Fix execution history when flow includes wait node

Description (problem / solution / changelog)

Summary

Re-applies the fix from #23146 (which was reverted in #25610 due to PostgreSQL timezone issues) with a corrected approach.

Original bug: When a workflow with a Wait node resumes, setRunning() overwrites the original startedAt timestamp with the current time, causing wrong execution duration and sort order in execution history.

Why the original fix broke PostgreSQL: PR #23146 used raw SQL COALESCE(startedAt, :startedAt) with DateUtils.mixedDateToUtcDatetimeString(), which produces timezone-ambiguous date strings. On PostgreSQL with timestamptz columns, these strings were interpreted using the session timezone instead of UTC, causing 1-hour offsets (or even negative durations) for non-UTC users.
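The size of the error follows directly from reading one wall-clock string in two zones. The snippet below is a worked illustration only: the timestamp, the session offsets, and the 5-minute run time are made-up values, and it does not reproduce the SQL involved.

```typescript
// The same wall-clock reading interpreted in different zones (illustrative values).
const wallClock = '2026-04-08T16:17:25';

const asUtc = Date.parse(`${wallClock}Z`); // intended meaning: UTC
const asPlusOne = Date.parse(`${wallClock}+01:00`); // read by a UTC+1 session
const asMinusOne = Date.parse(`${wallClock}-01:00`); // read by a UTC-1 session

// A one-hour session offset shifts the stored instant by one hour.
const offsetMs = asUtc - asPlusOne;

// If the misread start lands after the real stop time, the duration goes negative.
const stoppedAt = asUtc + 5 * 60_000; // execution really ran for 5 minutes
const misreadDurationMs = stoppedAt - asMinusOne;
```

Here `offsetMs` is one hour in milliseconds, and `misreadDurationMs` is negative because the misread start is 55 minutes after the real stop.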

This fix: Uses a transaction with standard TypeORM findOneBy + update methods instead of raw SQL. TypeORM handles date serialization correctly per database driver (SQLite and PostgreSQL), eliminating the timezone mismatch entirely.
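The intent of the fix (read the existing row, then write without overwriting a recorded start time) can be shown with an in-memory stand-in. This is a sketch of the behaviour only: `ExecutionRow` and `markRunning` are hypothetical, and the actual change uses TypeORM inside a transaction as described above.

```typescript
// Hypothetical sketch: an in-memory stand-in for the execution row.
interface ExecutionRow {
  id: string;
  status: 'new' | 'running' | 'waiting' | 'success';
  startedAt: Date | null;
}

/** Marks an execution as running while keeping an already-recorded start time. */
function markRunning(row: ExecutionRow, now: Date): ExecutionRow {
  return { ...row, status: 'running', startedAt: row.startedAt ?? now };
}
```

On the first run `startedAt` is null and takes the current time; on resume after a wait the earlier value survives, which is what step 5 of the test plan below checks.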

How to test

  1. Create a workflow: Webhook → Wait (resume by webhook) → Edit Fields
  2. Trigger the webhook to start execution
  3. Execution enters "waiting" state — note the startedAt time
  4. Resume the execution via the wait webhook
  5. Verify startedAt remains unchanged (not overwritten with resume time)

Sample workflow is also provided in the Linear ticket.

Review / Merge checklist

  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with release/backport (if the PR is an urgent fix that needs to be backported)

Changed files

  • packages/@n8n/db/src/repositories/__tests__/execution.repository.test.ts (modified, +44/-0)
  • packages/@n8n/db/src/repositories/execution.repository.ts (modified, +13/-2)
  • packages/cli/src/__tests__/wait-tracker.test.ts (modified, +23/-0)
  • packages/cli/src/__tests__/workflow-runner.test.ts (modified, +8/-1)
  • packages/cli/test/integration/execution.repository.test.ts (modified, +32/-0)
  • packages/testing/playwright/tests/e2e/workflows/executions/list.spec.ts (modified, +51/-0)
  • packages/testing/playwright/workflows/cat-1854-wait-execution-history.json (added, +110/-0)


Bug Description

When the Execute Workflow node receives multiple input items and is set to "Run once for each item" mode with "Wait for Sub-Workflow Completion" enabled, the parent workflow gets permanently stuck in the "Waiting" state with the description:

"The workflow is waiting indefinitely for an incoming webhook call."

All sub-workflow executions complete successfully (status: Success), but the parent workflow never receives the completion signal and remains stuck indefinitely. The next node in the parent workflow never executes, so the whole flow stalls at that point.

This is a regression - the same workflow runs correctly on v2.14.2 but breaks on v2.16.0.

To Reproduce

  1. Create a sub-workflow with an Execute Sub-workflow Trigger → any processing nodes (e.g., HTTP requests, Set nodes).
  2. Create a parent workflow with:
    • A node that outputs multiple items (e.g., 5 items)
    • An Execute Workflow node configured as:
      • Mode: Run once for each item
      • Wait for Sub-Workflow Completion: Enabled (toggled ON)
    • A node after the Execute Workflow node (e.g., Merge node)
  3. Execute the parent workflow.
  4. Observe: All 5 sub-workflow executions complete with Success status, but the parent workflow remains in Waiting state indefinitely.

Expected behavior

The parent workflow should detect that all sub-workflow executions have completed and proceed to the next node ("Merge marketplaces parallel" in my case).
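The expected coordination can be modelled in a few lines. This is a conceptual model of the behaviour the reporter expects, not a description of n8n's internals; `SubWorkflowTracker` and `parentStatus` are hypothetical names.

```typescript
// Hypothetical model of the expected behaviour, not n8n's internals.
class SubWorkflowTracker {
  private readonly pending: Set<string>;

  constructor(subExecutionIds: string[]) {
    this.pending = new Set(subExecutionIds);
  }

  /** Records that one sub-workflow execution has finished. */
  markFinished(subExecutionId: string): void {
    this.pending.delete(subExecutionId);
  }

  get allFinished(): boolean {
    return this.pending.size === 0;
  }
}

/** The parent should leave "waiting" as soon as nothing is pending. */
function parentStatus(tracker: SubWorkflowTracker): 'waiting' | 'running' {
  return tracker.allFinished ? 'running' : 'waiting';
}
```

With five items, the parent stays in `waiting` until the fifth completion is recorded and then continues to the next node. The reported bug corresponds to that last transition never happening.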

Actual Behavior

  • Sub-workflows: All complete with Success status (visible in execution list)
  • Parent workflow: Stuck in Waiting state with message "The workflow is waiting indefinitely for an incoming webhook call"
  • The "Stop" button is shown next to the waiting parent executions
  • There is a 37+ minute gap in server logs (no log output) while the execution engine is silently stuck
  • No crashes, no OOM, no BullMQ errors — pure status propagation failure

Screenshots

Screenshot 1: Execute Workflow node with 5 items being passed to sub-workflow
https://github.com/user-attachments/assets/8e075703-e24b-44f5-aeaf-3a4855d8fc62

Screenshot 2: Execution list showing parent workflows stuck in "Waiting" while sub-workflows show "Success"
https://github.com/user-attachments/assets/f06029a5-3fe4-4c29-b88d-cc42c51d4e8f

Screenshot 3: Execute Workflow node settings: Mode = "Run once for each item", Wait for Sub-Workflow Completion = ON
https://github.com/user-attachments/assets/88d1b469-8c2b-40a1-8f59-623ef3021c7a

Suspected Regression Cause

This regression was introduced between v2.14.2 (working) and v2.16.0 (broken). Two PRs in this window are likely candidates:

1. perf(core): Make Wait node fully durable by removing in-memory execution path (#27066) — Released in v2.16.0

This PR fundamentally changed how the WaitTracker and getWaitingExecutions() work:

  • getWaitingExecutions() WHERE clause changed: status != 'crashed' → status = 'waiting' (now only picks up executions with exactly status = 'waiting')
  • Time window narrowed: 70-second lookahead → 15-second DB-clock lookahead
  • Poll interval changed: 60s → 5s
  • In-memory execution path removed: All waits now go through DB persistence

Hypothesis: When the Execute Workflow node runs in "Run once for each item" mode with "Wait for Sub-Workflow Completion" enabled, the parent execution enters a waiting state internally. The new strict status = 'waiting' filter in getWaitingExecutions() may interact incorrectly with how the parent execution's status is set during sub-workflow coordination in workflow-execute-additional-data.ts. The old status != 'crashed' filter was more permissive and would pick up these executions regardless of their exact status.
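The difference between the two filters can be made concrete with sample rows. The sketch is illustrative only: the row shape, the helper `pickResumable`, and in particular the `parent` row (a non-`waiting` status combined with a resume time) are assumptions. Whether such a state actually occurs during sub-workflow coordination is exactly what the hypothesis leaves unverified.

```typescript
// Hypothetical rows and helpers: they illustrate the filter change, not a confirmed bug path.
type ExecStatus = 'new' | 'running' | 'waiting' | 'success' | 'error' | 'crashed';

interface ExecutionSample {
  id: string;
  status: ExecStatus;
  waitTill: number | null; // epoch ms, null when no resume time is set
}

const legacyFilter = (row: ExecutionSample): boolean => row.status !== 'crashed';
const strictFilter = (row: ExecutionSample): boolean => row.status === 'waiting';

/** Ids of rows that have a resume time inside the horizon and pass the status filter. */
function pickResumable(
  rows: ExecutionSample[],
  statusFilter: (row: ExecutionSample) => boolean,
  horizon: number,
): string[] {
  return rows
    .filter((row) => row.waitTill !== null && row.waitTill <= horizon && statusFilter(row))
    .map((row) => row.id);
}

const sampleExecutions: ExecutionSample[] = [
  { id: 'parent', status: 'running', waitTill: 100 }, // assumed intermediate state
  { id: 'timer', status: 'waiting', waitTill: 100 },
  { id: 'dead', status: 'crashed', waitTill: 100 },
];
```

Under these assumptions the legacy filter returns `parent` and `timer`, while the strict filter returns only `timer`, so an execution in the assumed intermediate state would no longer be picked up.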

2. fix(core): Fix execution history when flow includes wait node (#27357) — Released in v2.15.0

This re-applied a previously reverted fix (#23146, reverted in #25610) for execution timestamp handling when Wait nodes resume. While this PR focuses on startedAt preservation, changes to the execution status/resume flow could have side effects on sub-workflow completion signaling.

Related Issues

  • #10444 — "Workflows executed by workflows display as Queued" (closed, fixed in v1.59.0 via #10764) — same symptom family but different root cause
  • #25531 — "Sub-workflow in queue mode showing as failed execution" (closed) — status propagation failure
  • #13135 — "Calling a sub-workflow with a wait by webhook node" (closed) — related sub-workflow completion issues
  • #14203 — "Wait condition in Execute workflow causes wrong data output" (open) — related execution engine handling

Debug Info

## core
- n8nVersion: 2.16.0
- platform: docker (self-hosted)
- nodeJsVersion: 24.14.1
- nodeEnv: production
- database: postgres
- executionMode: regular
- concurrency: 16

## storage
- success: all
- error: all
- progress: false
- manual: true
- binaryMode: filesystem

## pruning
- enabled: true
- maxAge: 720 hours
- maxCount: 15000 executions

## client
- userAgent: mozilla/5.0 (windows nt 10.0; win64; x64) applewebkit/537.36 (khtml, like gecko) chrome/146.0.0.0 safari/537.36 edg/146.0.0.0
- isTouchDevice: false

## security
- secureCookie: true

Generated at: 2026-04-08T16:17:25.547Z

Additional Context

  • This is a single-process n8n instance (Task Broker on port 5679, no BullMQ workers)
  • The bug is in workflow-execute coordination, not queue infrastructure
  • Server logs show a 37-minute gap with zero log output while n8n process was alive but the execution engine was silently stuck
  • Sub-workflow API calls completed successfully (visible in logs), but the parent execution never received the completion signal
  • Workaround: Downgrade to v2.14.2 where the issue does not occur

Operating System

docker (self-hosted)

n8n Version

2.16.0

Node.js Version

24.14.1

Database

PostgreSQL

Execution mode

main (default)

Hosting

self hosted

extent analysis

TL;DR

The workaround reported in the issue is to downgrade to n8n version 2.14.2, where the same workflow runs correctly.

Guidance

  1. Verify the issue: Confirm that the problem occurs when using the "Execute Workflow" node with "Run once for each item" mode and "Wait for Sub-Workflow Completion" enabled in n8n version 2.16.0.
  2. Check execution logs: Investigate the server logs for any errors or gaps in log output, similar to the 37-minute gap mentioned in the issue.
  3. Test with previous version: Downgrade to n8n version 2.14.2 and verify that the workflow execution works as expected.
  4. Monitor for similar issues: Keep an eye on related issues, such as #14203, which may be connected to the same root cause.

Example

No code snippet is provided, as the issue seems to be related to the n8n workflow engine and its configuration.
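Step 2 of the guidance (looking for gaps in log output) can still be approximated with a small helper. The sketch below assumes the log timestamps have already been extracted as ISO 8601 strings; `findLogGaps` and `LogGap` are hypothetical names, and the log line format itself is not specified in the issue.

```typescript
interface LogGap {
  from: Date;
  to: Date;
  minutes: number;
}

/** Finds stretches of at least `thresholdMinutes` with no log output. */
function findLogGaps(timestamps: string[], thresholdMinutes: number): LogGap[] {
  const times = timestamps
    .map((value) => new Date(value))
    .sort((a, b) => a.getTime() - b.getTime());

  const gaps: LogGap[] = [];
  let previous: Date | undefined;
  for (const current of times) {
    if (previous !== undefined) {
      const minutes = (current.getTime() - previous.getTime()) / 60_000;
      if (minutes >= thresholdMinutes) gaps.push({ from: previous, to: current, minutes });
    }
    previous = current;
  }
  return gaps;
}
```

Running it with a threshold of, say, 30 minutes over the timestamps from an affected instance would surface a silent window like the 37-minute one described in the issue.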

Notes

The issue appears to be a regression introduced between versions 2.14.2 and 2.16.0. The exact cause is uncertain, but it may be related to changes in the Wait node or execution history handling.

Recommendation

Apply the workaround by downgrading to version 2.14.2, which the reporter confirms does not show the issue. This lets you keep using the "Execute Workflow" node with "Run once for each item" mode and "Wait for Sub-Workflow Completion" enabled until the regression is resolved.
