On 4.12.2017 at 23:17, Ken Gaillot wrote:
On Mon, 2017-12-04 at 22:08 +0300, Andrei Borzenkov wrote:
On 04.12.2017 at 18:47, Tomas Jelinek wrote:
On 4.12.2017 at 16:02, Kristoffer Grönlund wrote:
Tomas Jelinek <tojel...@redhat.com> writes:


* how is it shutting down the cluster when issuing "pcs cluster stop --all"?

First, it sends a request to each node to stop pacemaker. The requests are sent in parallel, which prevents resources from being moved from node to node. Once pacemaker stops on all nodes, corosync is stopped on all nodes in the same manner.

* any race condition possible where the cib will record only one node up before the last one shut down?
* will the cluster start safely?

That definitely sounds racy to me. The best idea I can think of would be to set all nodes except one in standby, and then shut down pacemaker everywhere...


What issues does it solve? Which node should be the one?

How do you get the nodes out of standby mode on startup?

Is --lifetime=reboot valid for cluster properties? It is accepted by crm_attribute and actually stores the value as a transient attribute.

standby is a node attribute, so lifetime does apply normally.


Right, I forgot about this.

I was dealing with 'pcs cluster stop --all' back in January 2015, so I don't remember all the details anymore. However, I was able to dig out the private email thread where stopping a cluster was discussed with pacemaker developers including Andrew Beekhof and David Vossel.

Originally, pcs stopped nodes in parallel in such a way that each node stopped pacemaker and then corosync independently of the other nodes. This caused a loss of quorum while the cluster was stopping, because nodes whose resources stopped quickly disconnected from corosync sooner than nodes whose resources stopped slowly. With quorum lost, some resources could not be stopped and the cluster stop failed. This is covered here:
https://bugzilla.redhat.com/show_bug.cgi?id=1180506
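
To make the quorum part concrete: assuming plain majority quorum on a hypothetical three-node cluster (no two_node, qdevice, last_man_standing or similar options), the cluster keeps quorum only while a majority of nodes remain in the corosync membership. A rough sketch of that arithmetic:

# Illustration only: hypothetical 3-node cluster, simple majority quorum,
# no qdevice / two_node / last_man_standing adjustments.
def has_quorum(active_nodes, total_nodes):
    """True when more than half of the expected votes are present."""
    return active_nodes > total_nodes // 2

total = 3
for remaining in (3, 2, 1):   # nodes still connected to corosync
    print(remaining, has_quorum(remaining, total))
# -> 3 True, 2 True, 1 False: once the two "fast" nodes have left the
#    membership, the last node is without quorum and, as described above,
#    some of its resources could not be stopped.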

The first attempt to fix the issue was to put nodes into standby mode with --lifetime=reboot:
https://github.com/ClusterLabs/pcs/commit/ea6f37983191776fd46d90f22dc1432e0bfc0b91
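(For illustration only: the intended effect was roughly a reboot-lifetime standby attribute on every node, something along these lines; the node names are placeholders and pcs used its own CIB handling rather than exactly this call.)

# Hedged sketch: set standby as a transient (reboot-lifetime) node
# attribute via crm_attribute; node names are placeholders.
import subprocess

for node in ["node1", "node2", "node3"]:
    subprocess.run(
        ["crm_attribute", "--node", node, "--name", "standby",
         "--update", "on", "--lifetime", "reboot"],
        check=True,
    )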

This didn't work for several reasons. One of them was that, back then, there was no reliable way to set standby mode with --lifetime=reboot for more than one node in a single step. (This may have been fixed in the meantime.) There were, however, other serious reasons for not putting the nodes into standby, as Andrew explained:

- it [putting the nodes into standby first] means shutdown takes longer (no node stops until all the resources stop)
- it makes shutdown more complex (== more fragile), eg...
- it results in pcs waiting forever for resources to stop
- if a stop fails and the cluster is configured to start at boot, then the node will get fenced and happily run resources when it returns (because all the nodes are up, so we still have quorum)
- it only potentially benefits resources that have no (or very few) dependants and can stop quicker than it takes pcs to get through its "initiate parallel shutdown" loop (which should be rather fast since there is no ssh connection setup overhead)

So we ended up with just stopping pacemaker in parallel:
https://github.com/ClusterLabs/pcs/commit/1ab2dd1b13839df7e5e9809cde25ac1dbae42c3d
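
As a rough sketch of that final approach (node names are placeholders and the ssh transport is only for illustration; pcs itself sends requests to each node rather than using ssh):

# Hedged sketch of the two-phase parallel stop: pacemaker is stopped on
# all nodes first, and corosync only after pacemaker is down everywhere,
# so no node drops out of the membership while resources are still
# being stopped elsewhere.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = ["node1", "node2", "node3"]   # placeholder node names

def run_on_all(command):
    """Run the same command on every node in parallel (ssh for illustration)."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        futures = [pool.submit(subprocess.run, ["ssh", node, command], check=True)
                   for node in NODES]
        for future in futures:
            future.result()   # re-raise any failure

run_on_all("systemctl stop pacemaker")   # phase 1: pacemaker everywhere
run_on_all("systemctl stop corosync")    # phase 2: then corosync everywhere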

I hope this sheds some light on why pcs stops clusters the way it does, and shows that standby was considered but rejected for good reasons.

Regards,
Tomas

