Error Message

Debug info

core

n8nVersion: 2.21.1
platform: docker (self-hosted)
nodeJsVersion: 24.14.1
nodeEnv: live
database: postgres
executionMode: scaling (single-main)
concurrency: -1
license: enterprise (production)
consumerId: XXXXX

storage

success: all
error: all
progress: false
manual: true
binaryMode: database

pruning

enabled: true
maxAge: 720 hours
maxCount: 50000 executions

client

userAgent: XXX
isTouchDevice: false

cluster

instanceCount: 8
versions: 2.21.1
instances:
- instanceKey: AAA, hostId: worker-n8n-live-worker-7b4d5559cd-4rnfm, instanceType: worker, instanceRole: unset, version: 2.21.1
- instanceKey: BBB, hostId: worker-n8n-live-worker-7b4d5559cd-js45b, instanceType: worker, instanceRole: unset, version: 2.21.1
- instanceKey: CCC, hostId: worker-n8n-live-worker-7b4d5559cd-bxd7w, instanceType: worker, instanceRole: unset, version: 2.21.1
- instanceKey: DDD, hostId: worker-n8n-live-worker-7b4d5559cd-bhtgp, instanceType: worker, instanceRole: unset, version: 2.21.1
- instanceKey: EEE, hostId: worker-n8n-live-worker-7b4d5559cd-cbvjc, instanceType: worker, instanceRole: unset, version: 2.21.1
- instanceKey: FFF, hostId: webhook-n8n-live-webhook-6bb58db4f5-4tlk2, instanceType: webhook, instanceRole: unset, version: 2.21.1
- instanceKey: GGG, hostId: main-n8n-live-659bb659b7-zwrsj, instanceType: main, instanceRole: leader, version: 2.21.1
- instanceKey: HHH, hostId: worker-n8n-live-worker-7b4d5559cd-br99z, instanceType: worker, instanceRole: unset, version: 2.21.1
checks:
- check: hostid-clash, status: succeeded, warnings: -
- check: lifecycle, status: succeeded, warnings: -
- check: split-brain, status: succeeded, warnings: -
- check: version-mismatch, status: succeeded, warnings: -

Generated at: 2026-06-05T08:25:22.143Z

Bug Description

Last week we had 1.5 hours of downtime after an AWS RDS upgrade. The upgrade triggers a failover, which normally causes no downtime because the database runs Multi-AZ. Judging from the logs, the n8n instance kept failing to reconnect to the database for 90 minutes.

To Reproduce

This seems hard to reproduce: the same procedure worked fine on another instance. Looking through the code (with the help of Claude), there appear to be a few problems that, in some edge cases, can completely break the retry logic. The most obvious improvement would be enabling TCP keepalive on the DB connection. It's a no-brainer IMO.
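The following is a minimal sketch of what enabling TCP keepalive looks like at the socket level in Node.js. It uses the built-in `node:net` connect options `keepAlive` and `keepAliveInitialDelay`. The helper names `keepAliveConnectOptions` and `openKeepAliveSocket` and the 30-second default delay are illustrative assumptions, not n8n code. For the actual database connection, node-postgres accepts `keepAlive` and `keepAliveInitialDelayMillis` client options, and TypeORM forwards driver-specific settings through its `extra` option. Whether n8n's current configuration exposes these settings is an assumption to check against its source.

```typescript
import * as net from "node:net";

// Hypothetical helper: build TCP connect options with keepalive enabled.
// Once the socket has been idle for `initialDelayMs`, the OS starts sending
// keepalive probes, so a peer that disappeared (e.g. during a failover) is
// eventually detected instead of the socket hanging silently.
// The 30 s default is an illustrative assumption.
function keepAliveConnectOptions(
  host: string,
  port: number,
  initialDelayMs = 30_000,
): net.TcpNetConnectOpts {
  return { host, port, keepAlive: true, keepAliveInitialDelay: initialDelayMs };
}

// Hypothetical helper: open a socket with keepalive enabled.
// Resolves once the TCP handshake completes and rejects on a connect error.
function openKeepAliveSocket(
  host: string,
  port: number,
  initialDelayMs?: number,
): Promise<net.Socket> {
  return new Promise((resolve, reject) => {
    const socket = net.connect(keepAliveConnectOptions(host, port, initialDelayMs));
    const onError = (err: Error) => reject(err);
    socket.once("error", onError);
    socket.once("connect", () => {
      // Detach the handshake-time error handler; the caller owns the socket now.
      socket.off("error", onError);
      resolve(socket);
    });
  });
}
```

Keepalive only helps while a connection sits idle. It does not replace a reconnect loop, but it gives that loop an error to react to instead of a socket that never reports failure.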

Expected behavior

n8n should re-establish its connection to the database as soon as the failover completes; no downtime is needed or expected. In the worst case, it should come back online once the TCP connection times out after about 15 minutes. Even that would be undesirable, but the reality was much worse: 90 minutes of downtime.
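The following is a minimal sketch of the reconnect behavior described above. It makes these assumptions: `connectForever`, `withTimeout` and `backoffDelayMs` are hypothetical helpers (not n8n's code), and the 5-second attempt timeout and the 500 ms to 10 s backoff range are placeholder values. Two properties matter here.

- The loop has no maximum attempt count, so it cannot give up permanently.
- Each attempt is bounded by its own timeout. An attempt that hangs (for example on a socket whose peer vanished during the failover) therefore counts as a failure after a few seconds, instead of waiting for whatever timeout the kernel eventually applies.

If the 15 minutes mentioned above is Linux's default retransmission limit for established connections (`net.ipv4.tcp_retries2 = 15`, which the kernel documentation puts at a hypothetical timeout of 924.6 seconds), that is the kind of wait this bound avoids.

```typescript
// Hypothetical helpers illustrating the expected reconnect behavior; not n8n's code.

// Capped exponential backoff with full jitter: a random delay in
// [0, min(capMs, baseMs * 2^attempt)). `random` is injectable for testing.
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 10_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Reject if `promise` has not settled within `ms`, so one hung attempt
// cannot stall the retry loop indefinitely.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`attempt timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retry `connect` until it succeeds. There is deliberately no attempt limit:
// after a failover the loop keeps trying and reconnects as soon as the new
// primary accepts connections, at most one backoff delay later.
async function connectForever<T>(
  connect: () => Promise<T>,
  attemptTimeoutMs = 5_000,
  onError: (err: unknown, attempt: number) => void = () => {},
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      // connect() is called inside the try, so synchronous throws are retried too.
      return await withTimeout(connect(), attemptTimeoutMs);
    } catch (err) {
      onError(err, attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

One caveat: when an attempt times out, the underlying `connect()` promise keeps running. A production version should also close any client that finishes connecting after its attempt was abandoned.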

Operating System

AWS EKS

n8n Version

2.21.1

Node.js Version

24.14.1

Database

PostgreSQL

Execution mode

queue

Hosting

self-hosted

n8n - Fix Bug in DB retry logic, connection not re-established indefinitely
