On Wed, Feb 10, 2016 at 12:06:34PM +0100, Ferenc Wágner wrote:
> Dejan Muhamedagic <deja...@fastmail.fm> writes:
>
> > If the environment is no good (bad installation, missing configuration
> > and similar), then the stop operation probably won't do much good.
>
> Agreed. It may not even know how to probe it.
>
> > In ocf-rarun, validate_all is run, but then the operation is not
> > carried out if the environment is invalid. In particular, the resource
> > is considered to be stopped, and the stop operation exits with
> > success.
>
> This sounds dangerous. What if the local configuration of a node gets
> damaged while a resource is running on it?

I understand your worry, but I cannot imagine how that could happen
except as part of a more serious failure, such as a disk crash, and that
kind of failure should really cause fencing at another level.

The most common case, by far, is some mistake or omission during cluster
setup. Humans tend to make mistakes. As Vladislav wrote elsewhere in
this thread, this can cause a fencing loop, which is no fun, in
particular if pacemaker is set to start on boot. It happened to me a few
times and I guess I don't need to describe the intensity of my feelings
toward computers in general and the cluster stack in particular (not to
mention the RA author).

> Eventually the cluster may
> try to stop it, think that it succeeded and start the resource on
> another node. Now you have two instances running. Or is the resource
> probed on each node before the start?

No, I don't think so. The probes are run only on crmd start.

> Can a probe failure save your day
> here? Or do you only mean resource parameters by "environment" (which
> should be identical on each host, so validation would fail everywhere)?

The validation typically checks the configuration and then whether
various files (programs) and directories exist, and sometimes whether
directories are writable. There could be more, but at least I would
prefer to stop here.

Anyway, we could introduce something like an optional emergency_stop()
which would be invoked in ocf-rarun if the validation failed. And/or,
say, a RUN_STOP_ANYWAY variable which would allow stop to be run
regardless. But note that it is extremely difficult to prove or make
sure that executing the RA _after_ the validate step failed is going to
produce meaningful results.

In addition, there could also be FENCE_ON_INVALID_ENVIRONMENT (to be set
by the user) for the very paranoid ;-)

Cheers,

Dejan

> --
> Thanks,
> Feri.
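
For illustration, the stop path with the knobs proposed above might be
sketched roughly as follows. This is not the ocf-rarun code and none of
it exists today. The function names (ra_validate_all, ra_stop,
ra_emergency_stop, stop_sketch, have_cmd) and the RA_CONFIG_DIR check
are hypothetical placeholders, treating a variable as "on" when it is
non-empty is an assumption, and so is the order in which the three knobs
are consulted.

```shell
#!/bin/sh
# Sketch only: NOT the actual ocf-rarun implementation.
# emergency_stop, RUN_STOP_ANYWAY and FENCE_ON_INVALID_ENVIRONMENT are
# proposals from this thread; everything else here is a placeholder.

# Standard OCF exit codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1

# Hypothetical RA hooks; a real RA would supply its own.
ra_validate_all() {
    # Placeholder check: the (hypothetical) config directory must exist.
    [ -d "${RA_CONFIG_DIR:-/nonexistent}" ]
}
ra_stop() {
    echo "regular stop"
}
# ra_emergency_stop is optional. An RA would define it only if it can do
# something useful without a valid environment.

# True if the named function (or command) is available.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

stop_sketch() {
    # Valid environment: just run the normal stop.
    if ra_validate_all; then
        ra_stop
        return $?
    fi

    # Invalid environment from here on.
    if [ -n "${FENCE_ON_INVALID_ENVIRONMENT:-}" ]; then
        # Report a failed stop so that the cluster escalates
        # (normally to fencing, if stonith is enabled).
        return $OCF_ERR_GENERIC
    fi
    if [ -n "${RUN_STOP_ANYWAY:-}" ]; then
        # User asked for stop to run regardless of validation.
        ra_stop
        return $?
    fi
    if have_cmd ra_emergency_stop; then
        ra_emergency_stop
        return $?
    fi

    # Behaviour described in this thread: consider the resource stopped.
    return $OCF_SUCCESS
}
```

With nothing set and no ra_emergency_stop defined, stop_sketch reports
success on an invalid environment, which is the behaviour discussed in
this thread. Each knob only changes what happens after validation has
already failed, so the caveat about meaningful results still applies.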

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org