On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:

After a LOT of hassle, I finally got it updated, but OMG it was painful.

I degraded the cluster (unsure if needed), set maintenance mode, deleted the stonith levels, deleted the stonith devices, recreated them with the updated values, recreated the stonith levels, and finally disabled maintenance mode.

It should not have been this hard, right? Why the heck would it be that pacemaker kept "rolling back" to old configs? I'd delete the stonith

That is bizarre. It sounds like the CIB changes were taking effect locally, then being rejected by the rest of the cluster, which would send the "correct" CIB back to the originator.

The logs of interest would be pacemaker.log from both nodes at the time you made the first configuration change that failed. I'm guessing the logs you posted were from after that point?

Below are the logs. The change appears to have first been attempted
at 'Jul 23 16:22:27', made on an-a02n01. I included logs from a few
minutes before in case they are relevant.
* an-a02n01:
https://www.alteeve.com/an-repo/files/an-a02n01.pacemaker.log
* an-a02n02:
https://www.alteeve.com/an-repo/files/an-a02n02.pacemaker.log
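
For anyone reading those two files, here is a rough sketch of how the window around the first failed change could be pulled out of each log. It assumes every relevant line begins with a 'Mon DD HH:MM:SS' stamp, as in the time quoted above, and that month names parse in an English locale. The function name, the five-minute window and the year argument are illustrative choices, not anything provided by pacemaker.

```python
from datetime import datetime, timedelta

# The log stamps carry no year, so one is prepended before parsing.
STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def lines_in_window(lines, center, before_minutes=5, after_minutes=5, year=2021):
    """Return the lines whose leading timestamp falls inside the window.

    Assumption: each line starts with 'Mon DD HH:MM:SS'. Lines without a
    parsable stamp (continuation lines, blank lines) are skipped.
    """
    middle = datetime.strptime(f"{year} {center}", STAMP_FORMAT)
    start = middle - timedelta(minutes=before_minutes)
    end = middle + timedelta(minutes=after_minutes)

    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            stamp = datetime.strptime(f"{year} {' '.join(parts[:3])}", STAMP_FORMAT)
        except ValueError:
            continue  # not a timestamped line
        if start <= stamp <= end:
            kept.append(line)
    return kept
```

Running it over both nodes' logs with the same center time, for example `lines_in_window(open("an-a02n01.pacemaker.log"), "Jul 23 16:22:27")`, should give two excerpts that can be read side by side.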
Note that the PDUs as originally configured (10.201.2.1/2) were
not available, so I had to disable and clean up the stonith
resources. They seemed to keep getting re-enabled, so I got into the
habit of running a disable -> cleanup -> disable -> cleanup cycle
before I could reliably get the resources to show as
'stopped (disabled)' in 'pcs stonith status'.
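
As a sketch of that cycle, the snippet below only builds the command lines and does not run them. It assumes the usual `pcs stonith disable`, `pcs stonith cleanup` and `pcs stonith status` subcommands. The helper name and the resource ID in the usage note are hypothetical.

```python
def disable_cleanup_commands(stonith_id, cycles=2):
    """Build the disable -> cleanup sequence for one stonith resource.

    Returns a list of argv lists; nothing is executed here. The final
    entry is a status check to confirm 'stopped (disabled)'.
    """
    commands = []
    for _ in range(cycles):
        commands.append(["pcs", "stonith", "disable", stonith_id])
        commands.append(["pcs", "stonith", "cleanup", stonith_id])
    commands.append(["pcs", "stonith", "status"])
    return commands
```

Each entry could be passed to `subprocess.run(cmd, check=True)` on a cluster node, for example with a made-up ID such as `disable_cleanup_commands("example_pdu1")`.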
digimer
--
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould