>>> Harvey Shepherd <harvey.sheph...@aviatnet.com> wrote on 22.07.2020 at 23:43 in message
<cy4pr2201mb1142a9006826ee69a7fbef458b...@cy4pr2201mb1142.namprd22.prod.outlook.com>:
> Thanks for your response Reid. What you say makes sense, and under normal
> circumstances if a resource failed, I'd want all of its dependents to be
> stopped cleanly before restarting the failed resource. However if pacemaker
> is shutting down on a node (e.g. due to a restart request), then I just want
> to failover as fast as possible, so an unclean kill is fine. At the moment
> the shutdown process is taking 2 mins. I was just wondering if there was a
> way to do this.

Hi!

I think you are mixing two concepts: a shutdown request is always an attempt
to stop things cleanly, while a node failure (which will be followed by a
fencing operation) definitely cannot do a clean shutdown, as the node is
considered to be dead already.
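To illustrate the two concepts (commands from memory and untested; "node2" is
just a placeholder for your peer node's name):

    # Clean shutdown: resources are stopped/demoted in dependency order first
    crm cluster stop          # or: systemctl stop pacemaker

    # Fencing: the node is simply rebooted; nothing is stopped cleanly
    stonith_admin --reboot node2

The first path is the one you are timing at 2 minutes; the second is what
happens after a real node failure.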
Also remember that even STONITH (fencing) will take some time, and maybe it's
generally better to try a stop with a timeout (which will THEN fence, once
the timeout has expired).
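For example, a minimal and untested sketch in crm shell syntax ("my_rsc" and
the Dummy agent are placeholders; on-fail=fence needs working STONITH):

    crm configure primitive my_rsc ocf:heartbeat:Dummy \
        op stop timeout=30s on-fail=fence

With that, a stop that hangs past 30 seconds gets the node fenced instead of
delaying the failover indefinitely. (When stonith-enabled=true, fencing on a
failed stop is the default behaviour anyway.)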
And of course: HA software is not there to make any stop operation faster ;-)

Regards,
Ulrich

>
> Regards,
> Harvey
>
> ________________________________
> From: Users <users-boun...@clusterlabs.org> on behalf of Reid Wahl
> <nw...@redhat.com>
> Sent: 23 July 2020 08:05
> To: Cluster Labs - All topics related to open-source clustering welcomed
> <users@clusterlabs.org>
> Subject: EXTERNAL: Re: [ClusterLabs] Pacemaker Shutdown
>
> On Tue, Jul 21, 2020 at 11:42 PM Harvey Shepherd
> <harvey.sheph...@aviatnet.com<mailto:harvey.sheph...@aviatnet.com>> wrote:
> Hi All,
>
> I'm running Pacemaker 2.0.3 on a two-node cluster, controlling 40+ resources
> which are a mixture of clones and other resources that are colocated with
> the master instance of certain clones. I've noticed that if I terminate
> pacemaker on the node that is hosting the master instances of the clones,
> Pacemaker focuses on stopping resources on that node BEFORE failing over to
> the other node, leading to a longer outage than necessary. Is there a way to
> change this behaviour?
>
> Hi, Harvey.
>
> As you likely know, a given active/passive resource will have to stop on one
> node before it can start on another node, and the same goes for a promoted
> clone instance having to demote on one node before it can promote on
> another. There are exceptions for clone instances and for promotable clones
> with promoted-max > 1 ("allow more than one master instance"). A resource
> that's configured to run on one node at a time should not try to run on two
> nodes during failover.
>
> With that in mind, what exactly are you wanting to happen? Is the problem
> that all resources are stopping on node 1 before any of them start on node
> 2? Or that you want Pacemaker shutdown to kill the processes on node 1
> instead of cleanly shutting them down? Or something different?
>
> These are the actions and logs I saw during the test:
>
> Ack. This seems like it's just telling us that Pacemaker is going through a
> graceful shutdown. The info more relevant to the resource stop/start order
> would be in /var/log/pacemaker/pacemaker.log (or less detailed in
> /var/log/messages) on the DC.
>
> # /etc/init.d/pacemaker stop
> Signaling Pacemaker Cluster Manager to terminate
> Waiting for cluster services to
> unload..............................................................sending
> signal 9 to procs
>
> 2020 Jul 22 06:16:50.581 Chassis2 daemon.notice CTR8740 pacemaker. Signaling
> Pacemaker Cluster Manager to terminate
> 2020 Jul 22 06:16:50.599 Chassis2 daemon.notice CTR8740 pacemaker. Waiting
> for cluster services to unload
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140
> warning: new_event_notification (6140-6141-9): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140
> warning: Notification of client stonithd/665bde82-cb28-40f7-9132-8321dc2f1992
> failed
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140
> warning: new_event_notification (6140-6143-8): Broken pipe (32)
> 2020 Jul 22 06:18:01.794 Chassis2 daemon.warning CTR8740 pacemaker-based.6140
> warning: Notification of client attrd/a26ca273-3422-4ebe-8cb7-95849b8ff130
> failed
> 2020 Jul 22 06:18:03.320 Chassis1 daemon.warning CTR8740
> pacemaker-schedulerd.6240 warning: Blind faith: not fencing unseen nodes
> 2020 Jul 22 06:18:58.941 Chassis2 user.crit CTR8740 supervisor. pacemaker is
> inactive (3).
>
> Regards,
> Harvey
>
> --
> Regards,
>
> Reid Wahl, RHCA
> Software Maintenance Engineer, Red Hat
> CEE - Platform Support Delivery - ClusterHA

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/