On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
> Ken Gaillot <kgail...@redhat.com> wrote:
>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>> > Adam Spiers <aspi...@suse.com> wrote:
>> >> Andrew Beekhof <abeek...@redhat.com> wrote:
>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>> >>>> Ken Gaillot <kgail...@redhat.com> wrote:
>> >>>>> My main question is how useful would it actually be in the proposed use
>> >>>>> cases. Considering the possibility that the expected start might never
>> >>>>> happen (or fail), can an RA really do anything different if
>> >>>>> start_expected=true?
>> >>>>
>> >>>> That's the wrong question :-)
>> >>>>
>> >>>>> If the use case is there, I have no problem with
>> >>>>> adding it, but I want to make sure it's worthwhile.
>> >>>>
>> >>>> The use case which started this whole thread is for
>> >>>> start_expected=false, not start_expected=true.
>> >>>
>> >>> Isn't this just two sides of the same coin?
>> >>> If you're not doing the same thing for both cases, then you're just
>> >>> reversing the order of the clauses.
>> >>
>> >> No, because the stated concern about unreliable expectations
>> >> ("Considering the possibility that the expected start might never
>> >> happen (or fail)") was regarding start_expected=true, and that's the
>> >> side of the coin we don't care about, so it doesn't matter if it's
>> >> unreliable.
>> >
>> > BTW, if the expected start happens but fails, then Pacemaker will just
>> > keep repeating until migration-threshold is hit, at which point it
>> > will call the RA 'stop' action finally with start_expected=false.
>> > So that's of no concern.
>>
>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>
> Sure.
>
>> > Maybe your point was that if the expected start never happens (so
>> > never even gets a chance to fail), we still want to do a nova
>> > service-disable?
>>
>> That is a good question, which might mean it should be done on every
>> stop -- or could that cause problems (besides delays)?
>
> No, the whole point of adding this feature is to avoid a
> service-disable on every stop, and instead only do it on the final
> stop. If there are corner cases where we never reach the final stop,
> that's not a disaster because nova will eventually figure it out and
> do the right thing when the server-agent connection times out.
>
>> Another aspect of this is that the proposed feature could only look at a
>> single transition. What if stop is called with start_expected=false, but
>> then Pacemaker is able to start the service on the same node in the next
>> transition immediately afterward? Would having called service-disable
>> cause problems for that start?
>
> We would also need to ensure that service-enable is called on start
> when necessary. Perhaps we could track the enable/disable state in a
> local temporary file, and if the file indicates that we've previously
> done service-disable, we know to run service-enable on start. This
> would avoid calling service-enable on every single start.
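
For concreteness, here is a rough Python sketch of the state-file idea quoted above. It is only an illustration under assumptions, not how any existing RA works: start_expected is the proposed parameter and does not exist in Pacemaker today, the function names and the state-file path are hypothetical, and the enable/disable callables stand in for whatever would actually run nova service-enable and service-disable.

```python
import tempfile
from pathlib import Path
from typing import Callable, List

ENABLED = "enabled"
DISABLED = "disabled"


def read_state(state_file: Path) -> str:
    """Return the recorded state, or "unknown" if the file is missing or unreadable."""
    try:
        return state_file.read_text().strip()
    except OSError:
        return "unknown"


def on_stop(start_expected: bool, state_file: Path,
            disable: Callable[[], None]) -> None:
    """Hypothetical stop handler: disable only on the final stop."""
    if start_expected:
        return  # a start is expected to follow, so skip service-disable
    disable()
    # Record the state only after disable() returned without raising.
    state_file.write_text(DISABLED + "\n")


def on_start(state_file: Path, enable: Callable[[], None]) -> None:
    """Hypothetical start handler: skip enable only when known to be enabled."""
    if read_state(state_file) == ENABLED:
        return
    # "disabled", missing, or unrecognised state all end up here.
    enable()
    state_file.write_text(ENABLED + "\n")


def demo() -> List[str]:
    """Walk through a few transitions and return the calls that were made."""
    calls: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        state = Path(tmp) / "nova-compute.state"  # hypothetical file name
        on_start(state, lambda: calls.append("enable"))         # no file yet
        on_start(state, lambda: calls.append("enable"))         # already enabled
        on_stop(True, state, lambda: calls.append("disable"))   # restart expected
        on_stop(False, state, lambda: calls.append("disable"))  # final stop
        on_start(state, lambda: calls.append("enable"))         # was disabled
        state.unlink()                                          # state lost, e.g. reboot
        on_start(state, lambda: calls.append("enable"))         # unknown state
    return calls


if __name__ == "__main__":
    print(demo())
```

The one deliberate choice in this sketch is that a missing or unreadable state file is treated as "unknown" and leads to an enable call. That costs an extra service-enable now and then, but it avoids a start that silently skips the enable because the state was lost.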

Feels like an over-optimization. In fact, the whole thing feels like
that, if I'm honest.

Why are we trying to optimise the projected performance impact when
the system is in terrible shape already?

>
>> > Yes that would be nice, but this proposal was never intended to
>> > address that. I guess we'd need an entirely different mechanism in
>> > Pacemaker for that. But let's not allow perfection to become the
>> > enemy of the good ;-)
>>
>> The ultimate concern is that this will encourage people to write RAs
>> that leave services in a dangerous state after stop is called.
>
> I don't see why it would.

Previous experience suggests it definitely will.

People will do exactly what you're thinking, but with something important.
They'll see it behaves as they expect in best-case testing and never
think about the corner cases.
Then they'll start thinking about optimising their start operations,
write some "optimistic" state recording code and break those too.

Imagine a bug in your state recording code (maybe you forget to handle
a missing state file after reboot) that means the 'enable' doesn't get
run. The service is up, but nova will never use it.

> The new feature will be obscure enough that
> no one would be able to use it without reading the corresponding
> documentation first anyway.

I like your optimism.

>
>> I think with naming and documenting it properly, I'm fine to provide the
>> option, but I'm on the fence. Beekhof needs a little more convincing :-)
>
> Can you provide an example of a potential real-world situation where
> an RA author would end up accidentally abusing the feature?

You want a real-world example of how someone could accidentally
misuse a feature that doesn't exist yet?

Um... if we knew all the weird and wonderful ways people break our
code, we'd be able to build a better mousetrap.

>
> Thanks a lot for your continued attention on this!
>
> Adam
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org