On 2021-07-26 9:54 a.m., kgail...@redhat.com wrote:
> On Fri, 2021-07-23 at 21:46 -0400, Digimer wrote:
>> After a LOT of hassle, I finally got it updated, but OMG it was
>> painful.
>>
>> I degraded the cluster (unsure if needed), set maintenance mode,
>> deleted the stonith levels, deleted the stonith devices, recreated
>> them with the updated values, recreated the stonith levels, and
>> finally disabled maintenance mode.
>>
>> It should not have been this hard, right? Why the heck would
>> pacemaker keep "rolling back" to old configs? I'd delete the stonith
>> [...]
>
> That is bizarre. It sounds like the CIB changes were taking effect
> locally, then being rejected by the rest of the cluster, which would
> send the "correct" CIB back to the originator.
>
> The logs of interest would be pacemaker.log from both nodes at the time
> you made the first configuration change that failed. I'm guessing the
> logs you posted were from after that point?

Below are the logs. The change appears to have first been attempted at 'Jul 23 16:22:27', from an-a02n01. I included the logs from a few minutes before that in case they are relevant.

* an-a02n01: https://www.alteeve.com/an-repo/files/an-a02n01.pacemaker.log
* an-a02n02: https://www.alteeve.com/an-repo/files/an-a02n02.pacemaker.log
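To narrow each file down to the window around the first failed change (Jul 23 16:22:27 on an-a02n01), a small filter like the one below may help. It is only a sketch: it assumes each pacemaker.log entry starts with a syslog-style 'Mon DD HH:MM:SS' timestamp with no year, so the year is taken from the center time. The function name lines_near and the five-minute window are illustrative choices, not anything pacemaker provides.

```python
from datetime import datetime, timedelta


def lines_near(lines, center, before_min=5, after_min=5):
    """Yield log lines whose leading timestamp falls in the window around center.

    Assumption: each entry starts with a syslog-style 'Mon DD HH:MM:SS' stamp
    (15 characters, no year); center.year is used to supply the year.
    """
    start = center - timedelta(minutes=before_min)
    end = center + timedelta(minutes=after_min)
    for line in lines:
        try:
            # Prefix the year so strptime never has to guess it.
            ts = datetime.strptime(f"{center.year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # wrapped/continuation lines or other formats are skipped
        if start <= ts <= end:
            yield line
```

Opening each downloaded log and passing it in with datetime(2021, 7, 23, 16, 22, 27) as the center gives the lines from both nodes side by side for the same window.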

Note that the PDUs as originally configured (10.201.2.1/2) were not available, so I had to disable and clean up the stonith resources. They seemed to keep getting re-enabled, so I got into the habit of cycling disable -> cleanup -> disable -> cleanup before I could reliably get the resources to show as 'stopped (disabled)' in 'pcs stonith status'.
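For reference, the rebuild sequence quoted earlier (maintenance mode on, clear the levels, delete and recreate the devices with the updated values, re-add the levels, maintenance mode off) can be written out as a dry-run list of pcs commands to review before running any of them. This is a sketch under assumptions: the helper names are hypothetical, any device IDs, fence agent, or option values you pass in are placeholders, the optional "degrade the cluster" step is left out, and pcs syntax can differ between versions, so check 'pcs stonith --help' on the installed version.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field


@dataclass
class FenceDevice:
    """One stonith device to recreate with its updated options (hypothetical values)."""
    dev_id: str
    agent: str
    options: dict[str, str] = field(default_factory=dict)


def rebuild_stonith_commands(devices: list[FenceDevice],
                             levels: list[tuple[int, str, list[str]]]) -> list[list[str]]:
    """Return, in order, the pcs commands for the maintenance-mode rebuild.

    Nothing is executed here; this is a dry run meant for review.
    levels holds (level number, node name, [stonith ids]) tuples.
    """
    cmds = [["pcs", "property", "set", "maintenance-mode=true"],
            # With no arguments, 'level clear' removes every fencing level.
            ["pcs", "stonith", "level", "clear"]]
    # Delete the old devices, then recreate them with the new options.
    cmds += [["pcs", "stonith", "delete", d.dev_id] for d in devices]
    for d in devices:
        opts = [f"{key}={value}" for key, value in d.options.items()]
        cmds.append(["pcs", "stonith", "create", d.dev_id, d.agent, *opts])
    # Re-add the fencing levels, each naming its comma-separated devices.
    for level, node, dev_ids in levels:
        cmds.append(["pcs", "stonith", "level", "add", str(level), node, ",".join(dev_ids)])
    cmds.append(["pcs", "property", "set", "maintenance-mode=false"])
    return cmds


def as_shell(cmds: list[list[str]]) -> list[str]:
    """Render each command as a shell-quoted line for copy/paste."""
    return [shlex.join(cmd) for cmd in cmds]
```

Keeping it a dry run makes it easy to confirm the steps run in the order described above before anything touches the CIB.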

digimer

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/