On Wed, Jun 8, 2016 at 10:29 AM, Andrew Beekhof <abeek...@redhat.com> wrote:
> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
>> Ken Gaillot <kgail...@redhat.com> wrote:
>>> On 06/06/2016 05:45 PM, Adam Spiers wrote:
>>> > Adam Spiers <aspi...@suse.com> wrote:
>>> >> Andrew Beekhof <abeek...@redhat.com> wrote:
>>> >>> On Tue, Jun 7, 2016 at 8:29 AM, Adam Spiers <aspi...@suse.com> wrote:
>>> >>>> Ken Gaillot <kgail...@redhat.com> wrote:
>>> >>>>> My main question is how useful would it actually be in the proposed
>>> >>>>> use cases. Considering the possibility that the expected start might never
>>> >>>>> happen (or fail), can an RA really do anything different if
>>> >>>>> start_expected=true?
>>> >>>>
>>> >>>> That's the wrong question :-)
>>> >>>>
>>> >>>>> If the use case is there, I have no problem with
>>> >>>>> adding it, but I want to make sure it's worthwhile.
>>> >>>>
>>> >>>> The use case which started this whole thread is for
>>> >>>> start_expected=false, not start_expected=true.
>>> >>>
>>> >>> Isn't this just two sides of the same coin?
>>> >>> If you're not doing the same thing for both cases, then you're just
>>> >>> reversing the order of the clauses.
>>> >>
>>> >> No, because the stated concern about unreliable expectations
>>> >> ("Considering the possibility that the expected start might never
>>> >> happen (or fail)") was regarding start_expected=true, and that's the
>>> >> side of the coin we don't care about, so it doesn't matter if it's
>>> >> unreliable.
>>> >
>>> > BTW, if the expected start happens but fails, then Pacemaker will just
>>> > keep repeating until migration-threshold is hit, at which point it
>>> > will call the RA 'stop' action finally with start_expected=false.
>>> > So that's of no concern.
>>>
>>> To clarify, that's configurable, via start-failure-is-fatal and on-fail
>>
>> Sure.
>>
>>> > Maybe your point was that if the expected start never happens (so
>>> > never even gets a chance to fail), we still want to do a nova
>>> > service-disable?
>>>
>>> That is a good question, which might mean it should be done on every
>>> stop -- or could that cause problems (besides delays)?
>>
>> No, the whole point of adding this feature is to avoid a
>> service-disable on every stop, and instead only do it on the final
>> stop. If there are corner cases where we never reach the final stop,
>> that's not a disaster because nova will eventually figure it out and
>> do the right thing when the server-agent connection times out.
>>
>>> Another aspect of this is that the proposed feature could only look at a
>>> single transition. What if stop is called with start_expected=false, but
>>> then Pacemaker is able to start the service on the same node in the next
>>> transition immediately afterward? Would having called service-disable
>>> cause problems for that start?
>>
>> We would also need to ensure that service-enable is called on start
>> when necessary. Perhaps we could track the enable/disable state in a
>> local temporary file, and if the file indicates that we've previously
>> done service-disable, we know to run service-enable on start. This
>> would avoid calling service-enable on every single start.
>
> feels like an over-optimization
> in fact, the whole thing feels like that if i'm honest.

Today the stars aligned :-) http://xkcd.com/1691/

>
> why are we trying to optimise the projected performance impact when
> the system is in terrible shape already?
>
>>
>>> > Yes that would be nice, but this proposal was never intended to
>>> > address that. I guess we'd need an entirely different mechanism in
>>> > Pacemaker for that. But let's not allow perfection to become the
>>> > enemy of the good ;-)
>>>
>>> The ultimate concern is that this will encourage people to write RAs
>>> that leave services in a dangerous state after stop is called.
>>
>> I don't see why it would.
>
> Previous experience suggests it definitely will.
>
> People will do exactly what you're thinking but with something important.
> They'll see it behaves as they expect in best-case testing and never
> think about the corner cases.
> Then they'll start thinking about optimising their start operations,
> write some "optimistic" state recording code and break those too.
>
> Imagine a bug in your state recording code (maybe you forget to handle
> a missing state file after reboot) that means the 'enable' doesn't get
> run. The service is up, but nova will never use it.
>
>> The new feature will be obscure enough that
>> no one would be able to use it without reading the corresponding
>> documentation first anyway.
>
> I like your optimism.
>
>>
>>> I think with naming and documenting it properly, I'm fine to provide the
>>> option, but I'm on the fence. Beekhof needs a little more convincing :-)
>>
>> Can you provide an example of a potential real-world situation where
>> an RA author would end up accidentally abusing the feature?
>
> You want a real-world example of how someone could accidentally
> misuse a feature that doesn't exist yet?
>
> Um... if we knew all the weird and wonderful ways people break our
> code we'd be able to build a better mouse trap.
>
>>
>> Thanks a lot for your continued attention on this!
>>
>> Adam
>>
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
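
For anyone following the state-file idea in the thread, here is a rough Python sketch of what the bookkeeping might look like, written so that the "missing state file after reboot" case fails safe. Everything in it is an assumption made for illustration: start_expected is the proposed attribute and does not exist in Pacemaker, the function names and state-file layout are hypothetical, and the enable/disable callables stand in for whatever actually performs the nova service-enable and service-disable. A real RA would most likely be a shell script.

```python
import os
import tempfile
from typing import Callable

ENABLED = "enabled"
DISABLED = "disabled"


def read_state(path: str) -> str:
    """Return the recorded state.

    A missing, unreadable, or unrecognised file counts as DISABLED, so the
    next start always runs 'enable' when in doubt (e.g. after a reboot).
    """
    try:
        with open(path, encoding="utf-8") as f:
            value = f.read().strip()
    except OSError:
        return DISABLED
    return value if value == ENABLED else DISABLED


def write_state(path: str, state: str) -> None:
    """Record the state atomically: write a temp file, then rename it."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(state)
    os.replace(tmp, path)


def on_start(path: str, enable: Callable[[], None]) -> None:
    """Run 'enable' unless the file positively says we are already enabled."""
    if read_state(path) != ENABLED:
        enable()
        # Only record ENABLED after the call has returned without raising.
        write_state(path, ENABLED)


def on_stop(path: str, start_expected: bool, disable: Callable[[], None]) -> None:
    """Run 'disable' only on what is believed to be the final stop."""
    if not start_expected:
        # Record DISABLED first: if 'disable' then fails, the worst case is
        # one redundant 'enable' on the next start.
        write_state(path, DISABLED)
        disable()
```

The ordering is the point of the sketch: the file is only trusted when it says "enabled", so every failure mode costs at most a redundant enable call and never leaves a running service that nova will not use. It does not address the broader concern raised above about RAs leaving services in an unsafe state after stop.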