Re: [ClusterLabs] Very long timeout shutting down a server with systemd resource

Roberto Ferrari Tue, 24 Jan 2023 01:41:00 -0800

On 1/23/23 19:05, Reid Wahl wrote:

On Mon, Jan 23, 2023 at 9:59 AM Roberto Ferrari <rferr...@mbigroup.it> wrote:


On 23/01/23 18:25, Reid Wahl wrote:

On Mon, Jan 23, 2023 at 7:51 AM Roberto Ferrari <rferr...@mbigroup.it> wrote:


Hello everybody,
I'd like to understand some strange behavior in one of my clusters, which
basically has a couple of IPaddr2 resources and a systemd resource that
manages netfilter-persistent.
Here is the configuration:

primitive FW-VIP-Outside IPaddr2 \
           params ip=192.168.26.74 cidr_netmask=24 nic=outside arp_bg=true \
           op monitor interval=20s timeout=20s
primitive FW-VIP-Private IPaddr2 \
           params ip=192.168.104.100 cidr_netmask=24 nic=private arp_bg=true \
           op monitor interval=20s timeout=20s
primitive Netfilter systemd:netfilter-persistent \
           op start interval=0 timeout=60 \
           op stop interval=0 timeout=60
group FW-VIPs FW-VIP-Private FW-VIP-Outside Netfilter

When I reboot the active node, it hangs during shutdown for many
minutes, printing:

A stop job is running for Pacemaker High Availability Cluster Manager (
11 s / 30 min). (where 11 is the number of seconds already passed)

Switching to another master is immediate, and running
systemctl stop netfilter-persistent by hand is immediate too.

Do you have any hints about what is going wrong here? I cannot find
anything strange in the logs.

Thanks a lot,

Roberto.


Is the netfilter systemd unit enabled outside pacemaker? Run
`systemctl is-enabled netfilter-persistent` to find out, and run
`systemctl disable netfilter-persistent` to disable it if it's
enabled. Only Pacemaker should start or stop netfilter.


--
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/



--
Regards,

Reid Wahl (He/Him)
Senior Software Engineer, Red Hat
RHEL High Availability - Pacemaker



Thanks a lot, Reid.
Unfortunately that wasn't the case here: netfilter-persistent appears to
be disabled at boot already.
Cheers,

R.


Can you share the pacemaker logs from the shutdown period? That will
probably give some idea of what it's waiting on.



Here you are:

Jan 23 09:21:33 usab-fe2 pacemaker-controld[1327]: notice: Result of start operation for Netfilter on usab-fe2: 0 (ok)
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Shutting down Pacemaker
Jan 23 09:27:45 usab-fe2 systemd[1]: Stopping Pacemaker High Availability Cluster Manager...
Jan 23 09:27:45 usab-fe2 pacemakerd[1296]: notice: Stopping pacemaker-controld
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]: notice: Caught 'Terminated' signal
Jan 23 09:27:45 usab-fe2 pacemaker-controld[1327]: notice: Shutting down cluster resource manager
Jan 23 09:27:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting shutdown[usab-fe2]: (unset) -> 1674466065
Jan 23 09:28:45 usab-fe2 pacemaker-execd[1324]: notice: Giving up on Netfilter stop (rc=0): timeout (elapsed=59991ms, remaining=9ms)
Jan 23 09:28:45 usab-fe2 pacemaker-controld[1327]: error: Result of stop operation for Netfilter on usab-fe2: Timed Out
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting fail-count-Netfilter#stop_0[usab-fe2]: (unset) -> INFINITY
Jan 23 09:28:45 usab-fe2 pacemaker-attrd[1325]: notice: Setting last-failure-Netfilter#stop_0[usab-fe2]: (unset) -> 1674466125
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: error: Shutdown Escalation just popped in state S_NOT_DC!
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: State transition S_NOT_DC -> S_STOPPING
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Stopped 0 recurring operations at shutdown... waiting (2 remaining)
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Recurring action FW-VIP-Private:64 (FW-VIP-Private_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Recurring action FW-VIP-Outside:66 (FW-VIP-Outside_monitor_20000) incomplete at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: error: 3 resources were active at shutdown
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from the executor
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from Corosync
Jan 23 09:47:45 usab-fe2 pacemaker-controld[1327]: notice: Disconnected from the CIB manager
Jan 23 09:47:45 usab-fe2 systemd[1]: pacemaker.service: Succeeded.
Jan 23 09:47:45 usab-fe2 systemd[1]: Stopped Pacemaker High Availability Cluster Manager.
-- Reboot --
Jan 23 09:49:34 usab-fe2 systemd[1]: Started Pacemaker High Availability Cluster Manager.
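
To make the timing in this log easier to read, here is a minimal Python
sketch that computes the gaps between three of the entries: the shutdown
request, the Netfilter stop timeout, and the "Shutdown Escalation" message.
The timestamps are copied from the log; the year and the variable names are
assumptions made only for this illustration.

```python
from datetime import datetime

# Assumption: syslog-style timestamps omit the year, so one is supplied here.
YEAR = 2023


def parse(ts: str) -> datetime:
    """Parse a syslog-style 'Mon DD HH:MM:SS' timestamp."""
    return datetime.strptime(f"{YEAR} {ts}", "%Y %b %d %H:%M:%S")


# Timestamps copied from the log entries above.
shutdown_requested = parse("Jan 23 09:27:45")  # pacemakerd: Shutting down Pacemaker
stop_timed_out = parse("Jan 23 09:28:45")      # execd: Giving up on Netfilter stop
escalation_fired = parse("Jan 23 09:47:45")    # controld: Shutdown Escalation popped

stop_wait = stop_timed_out - shutdown_requested
idle_wait = escalation_fired - stop_timed_out
total_wait = escalation_fired - shutdown_requested

print(f"Netfilter stop attempt:       {stop_wait}")
print(f"waiting after failed stop:    {idle_wait}")
print(f"total until escalation fired: {total_wait}")
```

The first gap is 60 seconds, which matches the timeout=60 configured on the
Netfilter stop operation. The remaining 19 minutes pass with nothing logged,
and the 20-minute total looks consistent with Pacemaker's shutdown-escalation
cluster option, though that option was not checked on this cluster.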


Thanks,

R.
