Hi,

On Tue, Feb 09, 2016 at 05:15:15PM +0300, Vladislav Bogdanov wrote:
> 09.02.2016 16:31, Kristoffer Grönlund wrote:
> >Vladislav Bogdanov <bub...@hoster-ok.com> writes:
> >
> >>Hi,
> >>
> >>when performing a delete operation, crmsh (2.2.0) having -F tries
> >>to stop passed op arguments and then waits for DC to become idle.
> >>
> >
> >Hi again,
> >
> >I have pushed a fix that only waits for DC if any resources were
> >actually stopped: https://github.com/ClusterLabs/crmsh/commit/164aa48
>
> Great!
>
> >
> >>
> >>More, it may be worth checking stop-orphan-resources property and pass stop
> >>work to pacemaker if it is set to true.
> >
> >I am a bit concerned that this might not be 100% reliable. I found an
> >older discussion regarding this and the recommendation from David Vossel
> >then was to always make sure resources were stopped before removing
> >them, and not relying on stop-orphan-resources to clean things up
> >correctly. His example of when this might not work well is when removing
> >a group, as the group members might get stopped out-of-order.
>
> OK, I agree. That was just an idea.
>
> >
> >At the same time, I have thought before that the current functionality
> >is not great. Having to stop resources before removing them is if
> >nothing else annoying! I have a tentative change proposal to this where
> >crmsh would stop the resources even if --force is not set, and there
> >would be a flag to pass to stop to get it to ignore whether resources
> >are running, since that may be useful if the resource is misconfigured
> >and the stop action doesn't work.
>
> That should result in fencing, no? I think that is RA issue if that
> happens.

Right. Unfortunately, this case often gets too little attention;
people typically test with good and working configurations only.
The first time we hear about it is from some annoyed user whose
node got fenced for no good reason. Even worse, with some bad
configurations, the nodes can get fenced in a round-robin
fashion, which certainly won't make your time very productive.

> Particularly, imho RAs should not run validate_all on stop
> action.

I'd disagree here. If the environment is no good (bad
installation, missing configuration and similar), then the stop
operation probably won't do much good. Ultimately, it may depend
on how the resource is managed. In ocf-rarun, validate_all is
run, but then the operation is not carried out if the
environment is invalid. In particular, the resource is
considered to be stopped, and the stop operation exits with
success. One of the most common cases of an invalid environment
is when the software resides on shared non-parallel storage, so
it is simply not there on a node where that storage is not
mounted.

BTW, handling the stop and monitor/probe operations was the
primary motivation to develop ocf-rarun. It's often quite
difficult to get these things right.

Cheers,

Dejan

> Best,
> Vladislav
>
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
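
PS: To make the stop handling described above a bit more concrete, here is a rough sketch of the decision logic in Python. It is not the actual ocf-rarun code (which is shell); the function `dispatch`, its parameters and the `handlers` table are made up for illustration. The return code values are the standard OCF ones. The probe branch is an assumption on my side about what "getting monitor/probe right" would look like, the mail above only spells out the stop case.

```python
# Standard OCF return codes (only the ones needed here).
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7


def dispatch(action, validate_all, handlers, is_probe=False):
    """Hypothetical dispatcher illustrating the idea, not ocf-rarun itself.

    action       -- name of the requested operation, e.g. "stop"
    validate_all -- callable returning an OCF return code
    handlers     -- dict mapping action names to callables
    is_probe     -- True if a monitor is a one-off probe
    """
    rc = validate_all()
    if rc != OCF_SUCCESS:
        if action == "stop":
            # Invalid environment: the resource cannot be running
            # here, so skip the stop handler and report success.
            # A failed stop would otherwise get the node fenced.
            return OCF_SUCCESS
        if action == "monitor" and is_probe:
            # Assumption: a probe on such a node reports
            # "not running" instead of an error.
            return OCF_NOT_RUNNING
        # Every other action fails with the validation error.
        return rc
    # Environment is fine: run the requested operation.
    return handlers[action]()
```

With a `validate_all` that returns 5 (OCF_ERR_INSTALLED), `dispatch("stop", ...)` returns success without ever calling the stop handler, whereas `dispatch("start", ...)` fails with 5.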