Hi all,

As we get close to finishing our Anvil! switch to pacemaker, I'm trying to tie up loose ends. One that I want feedback on is the pacemaker version of cman's old 'post_join_delay' feature.
Use case example:

A common use for the Anvil! is remote deployments where there are no (IT) humans available. Think cargo ships, field data collection, etc. So it's entirely possible that a node could fail and not be repaired for weeks or even months.

With this in mind, it's also feasible that the surviving solo node later loses power and then reboots. In such a case, 'pcs cluster start' would never go quorate, as the peer is dead.

In cman, during startup, if there was no reply from the peer after post_join_delay seconds, the peer would get fenced and then the cluster would finish coming up. Being two_node, it would also become quorate and start hosting services. Of course, this opens the risk of a fence loop, but we have other protections in place to prevent that, so a fence loop is not a concern.

My question then is two-fold:

1. Is there a pacemaker equivalent to 'post_join_delay'? (Fence the peer and, if successful, become quorate.)

2. If not, was this a conscious decision not to add it for some reason, or was it simply never added? If it was consciously decided not to have it, what was the reasoning behind it?

I can replicate this behaviour in our code, but I don't want to do that if there is a compelling reason against it that I am not aware of.

So:

A) Is there a pacemaker version of post_join_delay?

B) Is there a compelling argument NOT to use post_join_delay behaviour in pacemaker that I am not seeing?

Thanks!

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
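For illustration, the post_join_delay behaviour described above (wait for the peer, fence it on timeout, and proceed alone only if the fence succeeds) could be sketched roughly as follows. This is only a sketch of the decision logic, not pacemaker, corosync or cman code. The function name, the returned strings, and the `peer_has_joined` and `fence_peer` callables are all hypothetical placeholders for whatever membership check and fence call the surrounding tooling provides.

```python
import time
from typing import Callable


def start_with_post_join_delay(
    peer_has_joined: Callable[[], bool],   # hypothetical: True once the peer is a cluster member
    fence_peer: Callable[[], bool],        # hypothetical: True only if the fence was confirmed
    post_join_delay: float,                # seconds to wait for the peer before fencing it
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Decide how a node starting alone in a two-node cluster should proceed."""
    deadline = clock() + post_join_delay

    # Give the peer up to post_join_delay seconds to show up.
    while clock() < deadline:
        if peer_has_joined():
            return "peer_joined"
        sleep(poll_interval)

    # Last check before taking the destructive step.
    if peer_has_joined():
        return "peer_joined"

    # The peer never answered: fence it, and only continue alone
    # if the fence action was confirmed successful.
    if fence_peer():
        return "fenced_start_alone"

    # An unconfirmed fence means the peer's state is unknown, so do not
    # start services.
    return "fence_failed_stay_down"
```

The caller would start services only on "peer_joined" or "fenced_start_alone". Any protection against fence loops, as mentioned above, would have to live outside this function.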