Andrew Beekhof <abeek...@redhat.com> wrote:
> On Fri, Jun 24, 2016 at 1:01 AM, Adam Spiers <aspi...@suse.com> wrote:
> > Andrew Beekhof <abeek...@redhat.com> wrote:
> >
> >> > Well, if you're OK with bending the rules like this then that's good
> >> > enough for me to say we should at least try it :)
> >>
> >> I still say you shouldn't only do it on error.
> >
> > When else should it be done?
>
> I was thinking whenever a stop() happens.

OK, seems we are agreed finally :)
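Just so we're definitely talking about the same thing, here's a very
rough sketch of the shape I have in mind, in RA-style shell (the
function names, the systemd unit name and the credentials handling are
all illustrative, not the final code):

    # Minimal sketch only; the real RA would check exit codes and map
    # them to OCF return codes, and this assumes OS_* credentials for
    # the nova CLI are already in the environment.

    nova_compute_stop() {
        # Unconditionally flag this host as unschedulable *before*
        # stopping the process, so nova stops placing new instances here.
        nova service-disable --reason "Pacemaker stop" "$(hostname)" nova-compute
        systemctl stop openstack-nova-compute   # unit name varies by distro
    }

    nova_compute_start() {
        systemctl start openstack-nova-compute
        # (Almost) always re-enable on start; the exception (an
        # operator's manual service-disable) is discussed below.
        nova service-enable "$(hostname)" nova-compute
    }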
> > IIUC, disabling/enabling the service is independent of the up/down
> > state which nova tracks automatically, and which, based on slightly
> > more than a skim of the code, is dependent on the state of the RPC
> > layer.
> >
> >> > But how would you avoid repeated consecutive invocations of "nova
> >> > service-disable" when the monitor action fails, and ditto for "nova
> >> > service-enable" when it succeeds?
> >>
> >> I don't think you can. Not ideal but I'd not have thought a deal breaker.
> >
> > Sounds like a massive deal-breaker to me! With op monitor
> > interval="10s" and 100 compute nodes, that would mean 10 pointless
> > calls to nova-api every second. Am I missing something?
>
> I was thinking you would only call it for the "I detected a failure
> case" and service-enable would still be on start().
> So the number of pointless calls per second would be capped at one
> tenth of the number of failed compute nodes.
>
> One would hope that all of them weren't dead.

Oh OK - yeah, that wouldn't be nearly as bad.

> > Also I don't see any benefit to moving the API calls from start/stop
> > actions to the monitor action. If there's a failure, Pacemaker will
> > invoke the stop action, so we can do service-disable there.
>
> I agree. Doing it unconditionally at stop() is my preferred option; I
> was only trying to provide a path that might be close to the behaviour
> you were looking for.
>
> > If the start action is invoked and we successfully initiate startup
> > of nova-compute, the RA can undo any service-disable it previously
> > did (although it should not reverse a service-disable done elsewhere,
> > e.g. manually by the cloud operator).
>
> Agree

Trying to adjust to this new sensation of agreement ;-)

> >> > Earlier in this thread I proposed the idea of a tiny temporary
> >> > file in /run which tracks the last known state and optimizes away
> >> > the consecutive invocations, but IIRC you were against that.
> >>
> >> I'm generally not a fan, but sometimes state files are a necessity.
> >> Just make sure you think through what a missing file might mean.
> >
> > Sure. A missing file would mean the RA's never called
> > service-disable before,
>
> And that is why I generally don't like state files.
> The default location for state files doesn't persist across reboots.
>
> t1. stop (i.e. disable)
> t2. reboot
> t3. start with no state file
> t4. WHY WONT NOVA USE THE NEW COMPUTE NODE STUPID CLUSTERS

Well then we simply put the state file somewhere which does persist
across reboots.

> > which means that it shouldn't call service-enable on startup.
> >
> >> Unless.... use the state file to store the date at which the last
> >> start operation occurred?
> >>
> >> If we're calling stop() and date - start_date > threshold, then, if
> >> you must, be optimistic, skip service-disable and assume we'll get
> >> started again soon.
> >>
> >> Otherwise if we're calling stop() and date - start_date <= threshold,
> >> always call service-disable, because we're in a restart loop which is
> >> not worth optimising for.
> >>
> >> ( And always call service-enable at start() )
> >>
> >> No Pacemaker feature or Beekhof approval required :-)
> >
> > Hmm ... it's possible I just don't understand this proposal fully,
> > but it sounds a bit woolly to me, e.g. how would you decide a suitable
> > threshold?
>
> roll a dice?
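(For the record, my reading of the proposal is roughly the sketch
below - the state file path and the threshold value are invented
purely for illustration:)

    STATE_FILE=/var/lib/nova/ra-last-start   # somewhere that survives reboots
    THRESHOLD=300                            # seconds - the arbitrary bit

    threshold_start() {
        date +%s > "$STATE_FILE"             # record when we last started
        systemctl start openstack-nova-compute
        nova service-enable "$(hostname)" nova-compute   # always, per above
    }

    threshold_stop() {
        now=$(date +%s)
        last_start=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
        if [ $(( now - last_start )) -le "$THRESHOLD" ]; then
            # Stopped again soon after starting: a restart loop, which
            # is not worth optimising for, so always disable.
            nova service-disable --reason "Pacemaker stop" "$(hostname)" nova-compute
        fi
        # Otherwise be optimistic: skip service-disable on the
        # assumption that we'll be started again soon.
        systemctl stop openstack-nova-compute
    }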
> > I think I preferred your other suggestion of just skipping the
> > optimization, i.e. calling service-disable on the first stop, and
> > service-enable on (almost) every start.
>
> good :)
>
> And the use of force-down from your subsequent email sounds excellent

OK great! We finally got there :-)

Now I guess I just have to write the spec and the actual code ;-)
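P.S. For reference, the force-down call I mean is just the following
(assuming a novaclient new enough to speak compute API microversion
2.11):

    # Immediately mark the compute service as down, without waiting
    # for nova's servicegroup timeout to notice the node is gone:
    nova --os-compute-api-version 2.11 service-force-down "$(hostname)" nova-compute

    # ... and undo it once the host has recovered:
    nova --os-compute-api-version 2.11 service-force-down --unset "$(hostname)" nova-compute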