system shutdown

lejeczek via Users Thu, 09 Nov 2023 12:44:49 -0800


On 07/11/2023 17:57, lejeczek via Users wrote:

hi guys
Having a 3-node pgSQL cluster with PAF: when all three systems are shut down at virtually the same time, PAF fails to start once the HA cluster is operational again.
from status:
...
Migration Summary:
  * Node: ubusrv2 (2):
    * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
  * Node: ubusrv3 (3):
    * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
  * Node: ubusrv1 (1):
    * PGSQL-PAF-5433: migration-threshold=1000000 fail-count=1000000 last-failure='Tue Nov 7 17:52:38 2023'
Failed Resource Actions:
  * PGSQL-PAF-5433_stop_0 on ubusrv2 'error' (1): call=90, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms, exec=84ms
  * PGSQL-PAF-5433_stop_0 on ubusrv3 'error' (1): call=82, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms, exec=82ms
  * PGSQL-PAF-5433_stop_0 on ubusrv1 'error' (1): call=86, status='complete', exitreason='Unexpected state for instance "PGSQL-PAF-5433" (returned 1)', last-rc-change='Tue Nov 7 17:52:38 2023', queued=0ms, exec=108ms
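For reference, 1000000 is the value Pacemaker uses for INFINITY. Because the fail count has reached migration-threshold, the resource is not allowed to run on any of the three nodes until those failures are cleared (for example with `pcs resource cleanup`) or expire. The following is a minimal sketch that picks such nodes out of saved status text. The helper name `infinity_nodes` is hypothetical, and it assumes the indented `* Node:` / `* <resource>:` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print every node whose fail-count for RESOURCE is
# 1000000 (Pacemaker's INFINITY), reading crm_mon/pcs status text on stdin.
# Usage: infinity_nodes RESOURCE < status.txt
infinity_nodes() {
    awk -v res="$1" '
        # Remember the current node from lines like "* Node: ubusrv2 (2):"
        /\* Node:/ { node = $3; next }
        # On the resource line, look for an exact fail-count=1000000 field
        index($0, res ":") {
            for (i = 1; i <= NF; i++)
                if ($i == "fail-count=1000000") print node
        }
    '
}
```

For the output above, `crm_mon -1 > status.txt; infinity_nodes PGSQL-PAF-5433 < status.txt` should list ubusrv2, ubusrv3 and ubusrv1.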
and all three pgSQLs show virtually identical logs:
...
2023-11-07 16:54:45.532 UTC [24936] LOG: starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-07 16:54:45.532 UTC [24936] LOG: listening on IPv4 address "0.0.0.0", port 5433
2023-11-07 16:54:45.532 UTC [24936] LOG: listening on IPv6 address "::", port 5433
2023-11-07 16:54:45.535 UTC [24936] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2023-11-07 16:54:45.547 UTC [24938] LOG: database system was interrupted while in recovery at log time 2023-11-07 15:30:56 UTC
2023-11-07 16:54:45.547 UTC [24938] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-11-07 16:54:45.819 UTC [24938] LOG: entering standby mode
2023-11-07 16:54:45.824 UTC [24938] FATAL: could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
2023-11-07 16:54:45.825 UTC [24936] LOG: startup process (PID 24938) exited with exit code 1
2023-11-07 16:54:45.825 UTC [24936] LOG: aborting startup due to startup process failure
2023-11-07 16:54:45.826 UTC [24936] LOG: database system is shut down
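The FATAL line looks like the actual cause: the startup process cannot open /var/run/postgresql/14-paf.pg_stat_tmp. On Ubuntu, /var/run is a symlink to /run, which is a tmpfs, so anything created there is gone after a reboot. That would explain why the failure reproduces every time.

My assumption is that this path comes from a stats_temp_directory setting in the instance's postgresql.conf. Given pgdata=/etc/postgresql/14/paf in the resource config, that file would presumably be /etc/postgresql/14/paf/postgresql.conf. My unverified guess is that Ubuntu's pg_ctlcluster creates this directory on start, while PAF starts the instance with pg_ctl directly.

The following is a minimal sketch to check this. The helper name `check_stats_tmp` is hypothetical, and it only handles a single-quoted path.

```shell
#!/bin/sh
# Hypothetical helper: report whether the stats_temp_directory set in a
# postgresql.conf exists. Assumes an uncommented, single-quoted value, e.g.
#   stats_temp_directory = '/var/run/postgresql/14-paf.pg_stat_tmp'
# A relative path would be checked against the current directory, not PGDATA.
# Usage: check_stats_tmp /etc/postgresql/14/paf/postgresql.conf
check_stats_tmp() {
    conf=$1
    # Extract the quoted value ("=" is optional); the last occurrence wins
    dir=$(sed -n "s/^[[:space:]]*stats_temp_directory[[:space:]]*=\{0,1\}[[:space:]]*'\([^']*\)'.*/\1/p" "$conf" | tail -n 1)
    if [ -z "$dir" ]; then
        echo "not set in $conf"
        return 0
    fi
    if [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```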
Is the result of this "test" case, as shown above, expected? It reproduces every time.
If not - what might it be I'm missing?

many thanks, L.

Actually, the resource fails to start on a single node (as opposed to the entire-cluster shutdown I originally described) when that node is powered down in an orderly fashion and powered back on. At the time of the power-cycle the node was the PAF resource master, and it fails:

...

2023-11-09 20:35:04.439 UTC [17727] LOG: starting PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-11-09 20:35:04.439 UTC [17727] LOG: listening on IPv4 address "0.0.0.0", port 5433
2023-11-09 20:35:04.439 UTC [17727] LOG: listening on IPv6 address "::", port 5433
2023-11-09 20:35:04.442 UTC [17727] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2023-11-09 20:35:04.452 UTC [17731] LOG: database system was interrupted while in recovery at log time 2023-11-09 20:25:21 UTC
2023-11-09 20:35:04.452 UTC [17731] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-11-09 20:35:04.809 UTC [17731] LOG: entering standby mode
2023-11-09 20:35:04.813 UTC [17731] FATAL: could not open directory "/var/run/postgresql/14-paf.pg_stat_tmp": No such file or directory
2023-11-09 20:35:04.814 UTC [17727] LOG: startup process (PID 17731) exited with exit code 1
2023-11-09 20:35:04.814 UTC [17727] LOG: aborting startup due to startup process failure
2023-11-09 20:35:04.815 UTC [17727] LOG: database system is shut down
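If the missing directory is indeed the culprit, there are two possible fixes. One is to have systemd-tmpfiles recreate it at every boot. The other is to point stats_temp_directory back at a path inside the data directory (PostgreSQL's default is pg_stat_tmp).

The following is a minimal sketch of the first option, and it only prints a tmpfiles.d(5) "d" line. Several details are my own assumptions, not taken from the Ubuntu packages: the helper name `tmpfiles_line`, the file name /etc/tmpfiles.d/postgresql-paf.conf, and the default 0700 postgres:postgres mode and ownership.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd-tmpfiles "d" entry that (re)creates
# DIR at boot. Owner, group and mode defaults are assumptions.
# Usage: tmpfiles_line DIR [OWNER] [GROUP] [MODE]
tmpfiles_line() {
    dir=$1
    owner=${2:-postgres}
    group=${3:-postgres}
    mode=${4:-0700}
    # Columns: Type Path Mode User Group Age ("-" = no age-based cleanup)
    printf 'd %s %s %s %s -\n' "$dir" "$mode" "$owner" "$group"
}
```

As root, `tmpfiles_line /run/postgresql/14-paf.pg_stat_tmp > /etc/tmpfiles.d/postgresql-paf.conf` followed by `systemd-tmpfiles --create /etc/tmpfiles.d/postgresql-paf.conf` should create the directory now and on later boots. I used /run rather than /var/run because systemd-tmpfiles warns about paths under the legacy /var/run.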

The master role, which that node held when it was shut down, did get moved over to a standby/slave node properly.


I'm on Ubuntu with:

ii  corosync                   3.1.6-1ubuntu1      amd64  cluster engine daemon and utilities
ii  pacemaker                  2.1.2-1ubuntu3.1    amd64  cluster resource manager
ii  pacemaker-cli-utils        2.1.2-1ubuntu3.1    amd64  cluster resource manager command line utilities
ii  pacemaker-common           2.1.2-1ubuntu3.1    all    cluster resource manager common files
ii  pacemaker-resource-agents  2.1.2-1ubuntu3.1    all    cluster resource manager general resource agents
ii  pcs                        0.10.11-2ubuntu3    all    Pacemaker Configuration System


And here is the resource:
-> $ pcs resource config PGSQL-PAF-5433-clone
 Clone: PGSQL-PAF-5433-clone
  Meta Attrs: failure-timeout=20s master-max=1 notify=true promotable=true
  Resource: PGSQL-PAF-5433 (class=ocf provider=heartbeat type=pgsqlms)
   Attributes: bindir=/usr/lib/postgresql/14/bin datadir=/var/lib/postgresql/14/paf pgdata=/etc/postgresql/14/paf pgport=5433
   Operations: demote interval=0s timeout=120s (PGSQL-PAF-5433-demote-interval-0s)
               methods interval=0s timeout=5 (PGSQL-PAF-5433-methods-interval-0s)
               monitor interval=15s role=Master timeout=10s (PGSQL-PAF-5433-monitor-interval-15s)
               monitor interval=16s role=Slave timeout=10s (PGSQL-PAF-5433-monitor-interval-16s)
               notify interval=0s timeout=60s (PGSQL-PAF-5433-notify-interval-0s)
               promote interval=0s timeout=30s (PGSQL-PAF-5433-promote-interval-0s)
               reload interval=0s timeout=20 (PGSQL-PAF-5433-reload-interval-0s)
               start interval=0s timeout=60s (PGSQL-PAF-5433-start-interval-0s)
               stop interval=0s timeout=60s (PGSQL-PAF-5433-stop-interval-0s)

Is this my setup/config, or might there actually be an issue with PAF and/or HA not handling a node OS shutdown?

all & any thoughts are much appreciated.
Thanks, L.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

Re: [ClusterLabs] PAF / pgSQL fails after OS/system shutdown
