Vladislav Bogdanov <bub...@hoster-ok.com> writes:

> If pacemaker has got an error on start, it will run stop with the same
> set of parameters anyways. And will get error again if that one was
> from validation and RA does not differentiate validation for start and
> stop. And then circular fencing over the whole cluster is triggered
> for no reason.
>
> Of course, for safety, RA could save its state if start was successful
> and skip validation on stop only if that state is not found. Otherwise
> removed binary or config file would result in resource running on
> several nodes.
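The safeguard described in the quote can be sketched in Python as follows. This is a minimal sketch under assumptions: a hypothetical marker file is written only after a successful start, and the helper names (`agent_start`, `agent_stop`, `config_ok`, `launch`, `terminate`) are illustrative, not part of any real resource agent. Only the exit-code values come from the OCF resource agent API. A real RA would usually be a shell script and would keep the marker somewhere that is cleared on reboot, which is also an assumption here.

```python
import os
import tempfile

# OCF exit codes (values as defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_CONFIGURED = 6


def agent_start(state_file, config_ok, launch):
    """Start the resource; leave a (hypothetical) marker only on success."""
    if not config_ok():
        return OCF_ERR_CONFIGURED
    if not launch():
        return OCF_ERR_GENERIC
    with open(state_file, "w") as f:
        f.write("started\n")
    return OCF_SUCCESS


def agent_stop(state_file, config_ok, terminate):
    """Stop the resource, skipping validation if start never succeeded."""
    if not os.path.exists(state_file):
        # No marker: start never completed on this node, so there is
        # nothing to stop and a broken config must not fail the stop.
        return OCF_SUCCESS
    # Marker present: the resource may be running, so a removed binary or
    # config file still fails the stop (and the cluster escalates).
    if not config_ok():
        return OCF_ERR_CONFIGURED
    if not terminate():
        return OCF_ERR_GENERIC
    os.remove(state_file)
    return OCF_SUCCESS


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        marker = os.path.join(d, "myapp.started")  # hypothetical path
        # A start that fails validation writes no marker, so the
        # follow-up stop succeeds instead of failing again.
        print(agent_start(marker, lambda: False, lambda: True))
        print(agent_stop(marker, lambda: False, lambda: True))
```

The design point is that validation on stop is skipped only when this node has no record of a successful start; once the marker exists, a missing binary or config still makes stop fail rather than silently report the resource as stopped.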
What would happen if we made the start operation return OCF_NOT_RUNNING
when validation fails? Or, more broadly, whenever the start operation
knows that the resource is not running, so that a stop operation would
do no good?

From Pacemaker Explained B.4: "The cluster will not attempt to stop a
resource that returns this for any action."

The probes could still return OCF_ERR_CONFIGURED, putting the real
information into the logs, and a stop failure could still lead to
fencing, protecting data integrity, but circular fencing would not
happen. I hope.

By the way, what are the reasons for running stop after a failed start?
To clean up halfway-started resources? Besides OCF_ERR_GENERIC, the
other error codes pretty much guarantee that the resource cannot be
active.

--
Regards,
Feri.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
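The agent side of Feri's proposal (start reports OCF_NOT_RUNNING when validation fails, while probes keep reporting OCF_ERR_CONFIGURED) can be sketched in Python as below. The names `dispatch`, `config_ok`, `is_running` and `launch` are hypothetical; the exit-code values are the standard OCF ones. That the cluster then skips the stop is taken only from the Pacemaker Explained sentence quoted in the message, not verified here.

```python
# OCF exit codes (values as defined by the OCF resource agent API).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_ERR_CONFIGURED = 6
OCF_NOT_RUNNING = 7


def dispatch(action, config_ok, is_running, launch):
    """Return the OCF exit code for `action` under the proposed scheme."""
    if action == "monitor":
        # Probes still report the real problem, so it reaches the logs.
        if not config_ok():
            return OCF_ERR_CONFIGURED
        return OCF_SUCCESS if is_running() else OCF_NOT_RUNNING
    if action == "start":
        if not config_ok():
            # Proposed change: validation failed before anything was
            # launched, so report "not running" instead of an error.
            return OCF_NOT_RUNNING
        return OCF_SUCCESS if launch() else OCF_ERR_GENERIC
    # Other actions are outside the scope of this sketch.
    return OCF_ERR_UNIMPLEMENTED
```

A failed launch still returns OCF_ERR_GENERIC, since that is the one case where the resource might be half-started and a stop is genuinely useful.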