n8n - Bug in DB retry logic, connection not re-established indefinitely

Bug Description

Last week we had 1.5 hours of downtime after an AWS RDS upgrade. The upgrade triggers a failover, which usually causes no downtime (multi-AZ). Judging from the logs, the n8n instance kept failing to reconnect to the DB for 90 minutes.

To Reproduce

It seems hard to reproduce, since the same procedure worked fine on another instance. Looking through the code (with the help of Claude), there appear to be a few problems that, in some edge cases, can completely break the retry circuit. The most obvious improvement would be enabling TCP keepalive for the DB connection. It's a no-brainer IMO.
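As a rough illustration of the keepalive suggestion, the sketch below builds connection options in the shape that node-postgres (`pg`) accepts for a `Pool` or `Client`. The `keepAlive`, `keepAliveInitialDelayMillis`, and `connectionTimeoutMillis` fields are existing `pg` options. The helper name `buildPgOptions`, the interface, the host name, and the chosen values are assumptions for illustration only. This is not n8n's actual connection code, and how n8n would pass such options through is not covered here.

```typescript
// Hypothetical helper (not part of n8n): returns connection options in the
// shape node-postgres (`pg`) accepts for `new Pool(...)` / `new Client(...)`.
interface PgKeepAliveOptions {
  host: string;
  port: number;
  // Enables SO_KEEPALIVE on the underlying TCP socket.
  keepAlive: boolean;
  // Idle time in milliseconds before the first keepalive probe is sent.
  keepAliveInitialDelayMillis: number;
  // Upper bound on how long a new connection attempt may take.
  connectionTimeoutMillis: number;
}

function buildPgOptions(
  host: string,
  port = 5432,
  idleSecondsBeforeProbe = 30, // assumed value, tune for your environment
): PgKeepAliveOptions {
  return {
    host,
    port,
    keepAlive: true,
    keepAliveInitialDelayMillis: idleSecondsBeforeProbe * 1000,
    connectionTimeoutMillis: 10_000, // assumed value
  };
}

// Example host name is a placeholder, not taken from the issue.
const options = buildPgOptions("db.example.internal");
console.log(options);
```

With `pg` installed, the returned object could be passed to `new Pool(...)`. The idea is that an idle socket whose peer disappeared during a failover gets detected by keepalive probes rather than lingering until a write eventually times out.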

Expected behavior

n8n should re-establish the connection to the database immediately after the failover succeeds; no downtime is needed or expected. In the worst case, it should come back online after the 15-minute TCP connection timeout. Even that would be undesirable, but the reality was much worse: 90 minutes of downtime.
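The expected behavior amounts to a reconnect loop that keeps trying until the database is reachable again, without giving up permanently. A minimal sketch of such a loop with capped exponential backoff follows. The function `reconnectWithBackoff`, its option names, and the default delays are all hypothetical, and `connect` stands in for whatever actually opens the DB connection. It does not reflect n8n's real retry logic.

```typescript
// Hypothetical sketch of a reconnect loop with capped exponential backoff.
interface BackoffOptions {
  baseDelayMs?: number; // delay after the first failure
  maxDelayMs?: number; // cap so retries never drift too far apart
  maxAttempts?: number; // Infinity means "never give up"
  sleep?: (ms: number) => Promise<void>; // injectable for testing
}

async function reconnectWithBackoff<T>(
  connect: () => Promise<T>,
  opts: BackoffOptions = {},
): Promise<T> {
  const { baseDelayMs = 500, maxDelayMs = 30_000, maxAttempts = Infinity } = opts;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await connect();
    } catch (error) {
      // Give up only when an explicit attempt limit was reached.
      if (attempt >= maxAttempts) throw error;
      // Delay doubles per failure: base, 2*base, 4*base, ... up to the cap.
      await sleep(Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}
```

Capping the delay matters here: with an uncapped backoff, a long outage could push the next retry far beyond the moment the failover completes.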

Debug Info

# Debug info

## core

- n8nVersion: 2.21.1
- platform: docker (self-hosted)
- nodeJsVersion: 24.14.1
- nodeEnv: live
- database: postgres
- executionMode: scaling (single-main)
- concurrency: -1
- license: enterprise (production)
- consumerId: XXXXX

## storage

- success: all
- error: all
- progress: false
- manual: true
- binaryMode: database

## pruning

- enabled: true
- maxAge: 720 hours
- maxCount: 50000 executions

## client

- userAgent: XXX
- isTouchDevice: false

## cluster

- instanceCount: 8
- versions: 2.21.1
- instances:
  - instanceKey: AAA, hostId: worker-n8n-live-worker-7b4d5559cd-4rnfm, instanceType: worker, instanceRole: unset, version: 2.21.1
  - instanceKey: BBB, hostId: worker-n8n-live-worker-7b4d5559cd-js45b, instanceType: worker, instanceRole: unset, version: 2.21.1
  - instanceKey: CCC, hostId: worker-n8n-live-worker-7b4d5559cd-bxd7w, instanceType: worker, instanceRole: unset, version: 2.21.1
  - instanceKey: DDD, hostId: worker-n8n-live-worker-7b4d5559cd-bhtgp, instanceType: worker, instanceRole: unset, version: 2.21.1
  - instanceKey: EEE, hostId: worker-n8n-live-worker-7b4d5559cd-cbvjc, instanceType: worker, instanceRole: unset, version: 2.21.1
  - instanceKey: FFF, hostId: webhook-n8n-live-webhook-6bb58db4f5-4tlk2, instanceType: webhook, instanceRole: unset, version: 2.21.1
  - instanceKey: GGG, hostId: main-n8n-live-659bb659b7-zwrsj, instanceType: main, instanceRole: leader, version: 2.21.1
  - instanceKey: HHH, hostId: worker-n8n-live-worker-7b4d5559cd-br99z, instanceType: worker, instanceRole: unset, version: 2.21.1
- checks:
  - check: hostid-clash, status: succeeded, warnings: -
  - check: lifecycle, status: succeeded, warnings: -
  - check: split-brain, status: succeeded, warnings: -
  - check: version-mismatch, status: succeeded, warnings: -

Generated at: 2026-06-05T08:25:22.143Z

Operating System

AWS EKS

n8n Version

2.21.1

Node.js Version

24.14.1

Database

PostgreSQL

Execution mode

queue

Hosting

self hosted
