On 25. 01. 21 at 17:01, Ken Gaillot wrote:
On Mon, 2021-01-25 at 09:51 +0100, Jehan-Guillaume de Rorthais wrote:
Hi Digimer,
On Sun, 24 Jan 2021 15:31:22 -0500
Digimer <li...@alteeve.ca> wrote:
[...]
I had a test server (srv01-test) running on node 1 (el8-a01n01), and on
node 2 (el8-a01n02) I ran 'pcs cluster stop --all'.

It appears that Pacemaker asked the VM to migrate to node 2 instead of
stopping it. Once the server was on node 2, I couldn't use 'pcs resource
disable <vm>' because it reported that the resource was unmanaged, and
the cluster shutdown was hung. When I directly stopped the VM and then
did a 'pcs resource cleanup', the cluster shutdown completed.
As actions during a cluster shutdown cannot be handled in the same
transition for each node, I usually add a step to disable all resources
using the "stop-all-resources" property before shutting down the cluster:
pcs property set stop-all-resources=true
pcs cluster stop --all
But it seems there's a fairly new cluster property to handle that (IIRC,
added one or two releases ago). Look at the "shutdown-lock" documentation:
[...]
some users prefer to make resources highly available only for failures,
with no recovery for clean shutdowns. If this option is true, resources
active on a node when it is cleanly shut down are kept "locked" to that
node (not allowed to run elsewhere) until they start again on that node
after it rejoins (or for at most shutdown-lock-limit, if set).
[...]
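If you want to try that route, enabling it should just be a matter of
setting the properties with pcs, something like (the limit value below is
only an example):

pcs property set shutdown-lock=true
pcs property set shutdown-lock-limit=10min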
[...]
So as best as I can tell, Pacemaker really did ask for a migration. Is
this the case?
AFAIK, yes, because each cluster shutdown request is handled independently
at the node level. That leaves the door wide open to all kinds of race
conditions if the requests are handled with some random lag on each node.
I'm going to guess that's what happened.
The basic issue is that there is no "cluster shutdown" in Pacemaker,
only "node shutdown". I'm guessing "pcs cluster stop --all" sends
shutdown requests for each node in sequence (probably via systemd), and
if the nodes are quick enough, one could start migrating off resources
before all the others get their shutdown request.
Pcs is doing its best to stop nodes in parallel. The first
implementation of this was done back in 2015:
https://bugzilla.redhat.com/show_bug.cgi?id=1180506
Since then, we moved to using curl for network communication, which also
handles parallel cluster stop. Obviously, this doesn't ensure the stop
command arrives at and is processed on all nodes at exactly the same time.
Basically, pcs sends a 'stop pacemaker' request to all nodes in parallel
and waits for it to finish on all nodes. Then it sends a 'stop corosync'
request to all nodes in parallel. The actual stopping on each node is
done by 'systemctl stop'.
Yes, the nodes which get the request sooner may start migrating resources.
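For illustration only, a rough manual equivalent of that sequence could
look something like the following (example node names from this thread;
pcs actually talks to its own daemon on each node rather than using ssh):

nodes="el8-a01n01 el8-a01n02"
# stop pacemaker on all nodes in parallel, then wait for all of them
for n in $nodes; do ssh "$n" systemctl stop pacemaker & done; wait
# only then stop corosync on all nodes in parallel
for n in $nodes; do ssh "$n" systemctl stop corosync & done; wait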
Regards,
Tomas
There would be a way around it. Normally Pacemaker is shut down via
SIGTERM to pacemakerd (which is what systemctl stop does), but inside
Pacemaker it's implemented as a special "shutdown" transient node
attribute, set to the epoch timestamp of the request. It would be
possible to set that attribute for all nodes in a copy of the CIB, then
load that into the live cluster.
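A very rough sketch of that approach (untested, example node names;
CIB_file makes the tools edit the file copy instead of the live CIB):

now=$(date +%s)
# take a copy of the live CIB
cibadmin --query > /tmp/cib-shutdown.xml
# set the transient "shutdown" attribute for every node in the copy
for node in el8-a01n01 el8-a01n02; do
    CIB_file=/tmp/cib-shutdown.xml crm_attribute --node "$node" \
        --name shutdown --update "$now" --lifetime reboot
done
# load the modified copy back into the live cluster in one operation
cibadmin --replace --xml-file /tmp/cib-shutdown.xml

Corosync would still have to be stopped separately on each node afterwards.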
stop-all-resources as suggested would be another way around it (and
would have to be cleared after start-up, which could be a plus or a
minus depending on how much control vs convenience you want).
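For example, once all nodes have rejoined after start-up, clearing it could
be something like:

pcs property set stop-all-resources=false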