On 06/06/18 15:51 -0400, Styopa Semenukha wrote:
> We wrote a role to configure Pacemaker clusters, and I'd like to share
> it with the community. Any questions or comments welcome.

Hello and thanks for the announcement.

And now, something I've meant to write down for quite some time regarding configuration management SW vs. clusters; it just happens to be provoked by this topic, so nothing here is to be taken personally (a good piece of work has been done on the project, as far as I can judge).

* * *

While I see why Ansible is compelling, I feel it's important to challenge this trend of trying to bend/rebrand a _machine-local configuration management tool_ as a _distributed system management tool_ (pacemaker is a distributed application/framework of sorts), which Ansible alone is _not_, as far as I know; hence the effort doesn't seem to be 100% sound (which really matters if reliability is the goal).

Once more, this has nothing to do with the announced project. It's just that the trending fuss around this topic indicates to me that people, as they independently and keenly reinvent their own wheel (here: Ansible roles), become blind to the fallacy that everything must work as nicely in multi-machine shared-state scenarios as they are used to with single-host bootstrapping, without any shortcomings. But there are shortcomings, precisely because a less than optimal tool gets selected for the task!

Just imagine what would happen if a single machine got configured independently by multiple Ansible actors (there may be mechanisms -- relatively easy within the same host -- that would prevent such interference, but assume for now they are not strong enough). What will happen? Likely some mess-ups will occur, as the glorified idempotence is hard to achieve atomically. Voila, the inflicted race conditions get exercised one by one until there's enough bad luck that the rule of idempotence gets broken, just because these processes emulate a schizophrenic (multitasking at the same time) admin. Ouch!

Now, apply this to the situation with possibly concurrent cluster configuration. One cannot really expect the cluster stack to be bullet-proof against this sort of mishandling.

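To make the single-host thought experiment above a bit more tangible, here is a minimal toy sketch in Python. Nothing in it is Ansible or pacemaker API: `ensure_present`, `run_interleaved`, `run_serialized`, `demo` and the resource name "vip" are all made up for illustration, and the "configuration" is just a list. It only assumes that an idempotent-looking task consists of a check followed by an act, with a window in between where another actor can get scheduled.

```python
def ensure_present(config, name):
    """Hypothetical 'idempotent' task: add `name` only if it is missing.

    Written as a generator so that the gap between the check and the
    act is an explicit point where another actor may run.
    """
    missing = name not in config    # check
    yield                           # window for another actor
    if missing:
        config.append(name)         # act, based on a possibly stale check


def run_interleaved(actors):
    """Advance every actor one step at a time (uncoordinated actors)."""
    pending = list(actors)
    while pending:
        for actor in list(pending):
            try:
                next(actor)
            except StopIteration:
                pending.remove(actor)


def run_serialized(actors):
    """Run each actor to completion before the next one starts.

    This stands in for any working mutual exclusion around the task.
    """
    for actor in actors:
        for _ in actor:
            pass


def demo(runner):
    """Let two actors apply the very same task to one shared config."""
    config = []
    runner([ensure_present(config, "vip"), ensure_present(config, "vip")])
    return config


if __name__ == "__main__":
    print("interleaved:", demo(run_interleaved))
    print("serialized: ", demo(run_serialized))
```

In the interleaved run both actors pass the check before either one acts, so the entry ends up in the configuration twice, even though each task on its own is idempotent. In the serialized run the second actor sees the first one's change and does nothing. On a single host that serialization is relatively cheap to enforce; the point is that the same guarantee has to come from somewhere when the actors sit on different nodes.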
A single cluster administrator operating at a time? Ideal! A few administrators, presumably with separate areas of configuration interest? Pacemaker is quite ready. Cluster configuration randomly touched from a random node at a random time (the equivalent of said schizophrenic multitasking administrator with a single host)? Over a sufficiently long period, chances are this will go wrong.

The solution here is to break that randomness, so that the configuration is modified in one of these ways:

1. from a single node at a time in the cluster (plus preferably batching all required changes into a single request)
2. mutual time-critical exclusion of triggering the changes across the nodes
3. mutual locality-critical exclusion in the subject of the changes initiated from particular nodes

Putting 1. and 3. aside as not very interesting (1. means a degenerate case with a single point of failure, and 3. kills the universality), what we get is really a dependency on some kind of distributed lock and/or transactional system.

Well, we have just discovered that what we need in order to automate our predestined configuration in the cluster reliably, and without hurting universality (like "breaking the node symmetry"), is said distributed system management ("orchestration") tool. Does Ansible have these capabilities?

Now, one idea might be to make tools like pcs compensate for these shortcomings of the machine-local configuration management ones. Sounds good, right? Absolutely not, more like a bad joke! Because what else can it be: developing orchestration-like features (with all the complexities already solved once in corosync/DLM; relaxing the non-dependency on the very subject of management may not be wise) on top of a regular high-level cluster management tool, only[*] to bridge the gap in something that is simply a subpar fit for distributed environments to begin with?

As a Czech proverb puts it: think twice, act once.

[*] non-automated/human-triggered usage is generally fine, as it's highly unlikely that none of 1.-3. would be satisfied, so there would be next to no gain for these workflows

--
Poki

_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org