Andrew Beekhof <abeek...@redhat.com> wrote:
> On Wed, Jun 15, 2016 at 10:42 PM, Adam Spiers <aspi...@suse.com> wrote:
> > Andrew Beekhof <abeek...@redhat.com> wrote:
> >> On Mon, Jun 13, 2016 at 9:34 PM, Adam Spiers <aspi...@suse.com> wrote:
> >> > Andrew Beekhof <abeek...@redhat.com> wrote:
> >> >> On Wed, Jun 8, 2016 at 6:23 PM, Adam Spiers <aspi...@suse.com> wrote:
> >> >> > Andrew Beekhof <abeek...@redhat.com> wrote:
> >> >> >> On Wed, Jun 8, 2016 at 12:11 AM, Adam Spiers <aspi...@suse.com> wrote:
> >> >> >> > We would also need to ensure that service-enable is called on start
> >> >> >> > when necessary.  Perhaps we could track the enable/disable state in
> >> >> >> > a local temporary file, and if the file indicates that we've
> >> >> >> > previously done service-disable, we know to run service-enable on
> >> >> >> > start.  This would avoid calling service-enable on every single
> >> >> >> > start.
> >> >> >>
> >> >> >> feels like an over-optimization
> >> >> >> in fact, the whole thing feels like that if i'm honest.
> >> >> >
> >> >> > Huh ... You didn't seem to think that when we discussed automating
> >> >> > service-disable at length in Austin.
> >> >>
> >> >> I didn't feel the need to push back because RH uses the systemd agent
> >> >> instead so you're only hanging yourself, but more importantly because
> >> >> the proposed implementation to facilitate it wasn't leading RA writers
> >> >> down a hazardous path :-)
> >> >
> >> > I'm a bit confused by that statement, because the only proposed
> >> > implementation we came up with in Austin was adding this new feature
> >> > to Pacemaker.
> >>
> >> _A_ new feature, not _this_ new feature.
> >> The one we discussed was far less prone to being abused but, as it
> >> turns out, also far less useful for what you were trying to do.
>
> > Was there really that much significant change since the original idea?
> > IIRC the only thing which really changed was the type, from "number of
> > retries remaining" to a boolean "there are still some retries left".
>
> The new implementation has nothing to do with retries.  Like the new
> name, it is based on "is a start action expected".
Oh yeah, I remember now.

> That's why I got an attack of the heebie-jeebies.

I'm not sure why, but at least now I understand your change of
position :-)

> > I'm not sure why the integer approach would be far less open to abuse,
> > or even why it would have been far less useful.  I'm probably missing
> > something.
> >
> > [snipped]
> >
> >> >> >> why are we trying to optimise the projected performance impact
> >> >> >
> >> >> > It's not really "projected"; we know exactly what the impact is.  And
> >> >> > it's not really a performance impact either.  If nova-compute (or a
> >> >> > dependency) is malfunctioning on a compute node, there will be a
> >> >> > window (bounded by nova.conf's rpc_response_timeout value, IIUC) in
> >> >> > which nova-scheduler could still schedule VMs onto that compute node,
> >> >> > and then of course they'll fail to boot.
> >> >>
> >> >> Right, but that window exists regardless of whether the node is or is
> >> >> not ever coming back.
> >> >
> >> > Sure, but the window's a *lot* bigger if we don't do service-disable.
> >> > Although perhaps your question "why are we trying to optimise the
> >> > projected performance impact" was actually "why are we trying to avoid
> >> > extra calls to service-disable" rather than "why do we want to call
> >> > service-disable" as I initially assumed.  Is that right?
> >>
> >> Exactly.  I assumed it was to limit the noise we'd be generating in
> >> doing so.
> >
> > Sort of - not just the noise, but the extra delay introduced by
> > calling service-disable, restarting nova-compute, and then calling
> > service-enable again when it succeeds.
>
> Ok, but restarting nova-compute is not optional, and the bits that are
> optional are all but completely asynchronous* - so the overhead should
> be negligible.
>
> * Like most API calls, they are Ack'd when the request has been
>   received, not processed.

Yes, fair points.
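(As an aside, for anyone wanting to see where that scheduling window
comes from: per the discussion above, it is bounded (IIUC) by the
RPC reply timeout, which oslo.messaging exposes in nova.conf.  The
value below is just the common default; check your deployment:)

```ini
# nova.conf - illustrative fragment only
[DEFAULT]
# Seconds to wait for an RPC response before giving up; per the
# discussion above (IIUC), this bounds how long nova-scheduler can
# keep picking a failed compute node before noticing it is down.
rpc_response_timeout = 60
```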
> >> >> > The masakari folks have a lot of operational experience in this space,
> >> >> > and they found that this was enough of a problem to justify calling
> >> >> > nova service-disable whenever the failure is detected.
> >> >>
> >> >> If you really want it whenever the failure is detected, call it from
> >> >> the monitor operation that finds it broken.
> >> >
> >> > Hmm, that appears to violate what I assume would be a fundamental
> >> > design principle of Pacemaker: that the "monitor" action never changes
> >> > the system's state (assuming there are no Heisenberg-like side effects
> >> > of monitoring, of course).
> >>
> >> That has traditionally been considered a good idea, and in the vast
> >> majority of cases I still think it is a good idea, but it's also a
> >> guideline that has been broken because there is no other way for the
> >> agent to work *cough* rabbit *cough*.
> >>
> >> In this specific case, I think it could be forgivable because you're
> >> not strictly altering the service but something that sits in front of
> >> it.  start/stop/monitor would all continue to do TheRightThing(tm).
> >>
> >> > I guess you could argue that in this case, the nova server's
> >> > internal state could be considered outside the system which
> >> > Pacemaker is managing.
> >>
> >> Right.
> >
> > Well, if you're OK with bending the rules like this then that's good
> > enough for me to say we should at least try it :)
>
> I still say you shouldn't only do it on error.

When else should it be done?  IIUC, disabling/enabling the service is
independent of the up/down state which nova tracks automatically, and
which, based on slightly more than a skim of the code, is dependent on
the state of the RPC layer.

> > But how would you avoid repeated consecutive invocations of "nova
> > service-disable" when the monitor action fails, and ditto for "nova
> > service-enable" when it succeeds?
>
> I don't think you can.  Not ideal but I'd not have thought a deal breaker.
Sounds like a massive deal-breaker to me!  With op monitor
interval="10s" and 100 compute nodes, that would mean 10 pointless
calls to nova-api every second.  Am I missing something?

Also I don't see any benefit to moving the API calls from the
start/stop actions to the monitor action.  If there's a failure,
Pacemaker will invoke the stop action, so we can do service-disable
there.  If the start action is invoked and we successfully initiate
startup of nova-compute, the RA can undo any service-disable it
previously did (although it should not reverse a service-disable done
elsewhere, e.g. manually by the cloud operator).

> > Earlier in this thread I proposed the idea of a tiny temporary file
> > in /run which tracks the last known state and optimizes away the
> > consecutive invocations, but IIRC you were against that.
>
> I'm generally not a fan, but sometimes state files are a necessity.
> Just make sure you think through what a missing file might mean.

Sure.  A missing file would mean the RA has never called
service-disable before, which means that it shouldn't call
service-enable on startup.

> Unless.... use the state file to store the date at which the last
> start operation occurred?
>
> If we're calling stop() and date - start_date > threshold, then, if
> you must, be optimistic, skip service-disable and assume we'll get
> started again soon.
>
> Otherwise if we're calling stop() and date - start_date <= threshold,
> always call service-disable because we're in a restart loop which is
> not worth optimising for.
>
> ( And always call service-enable at start() )
>
> No Pacemaker feature or Beekhof approval required :-)

Hmm ... it's possible I just don't understand this proposal fully, but
it sounds a bit woolly to me, e.g. how would you decide a suitable
threshold?  I think I preferred your other suggestion of just skipping
the optimization, i.e. calling service-disable on the first stop, and
service-enable on (almost) every start.
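For concreteness, here is how I read the threshold proposal, as a
rough sketch.  The threshold value, the state-file path, and all the
helper names are invented for illustration; a real RA would persist
the start time somewhere under /run:

```shell
#!/bin/sh
# Sketch of the "date - start_date" heuristic discussed above.
# THRESHOLD, STATE_FILE, and the helpers are hypothetical.

THRESHOLD=120                            # seconds; how to choose this is the open question
STATE_FILE=/run/nova-compute-last-start  # written by start()

record_start_time() {
    # start() would call this after successfully starting nova-compute.
    date +%s > "$STATE_FILE"
}

# Pure decision helper: takes timestamps as arguments so the policy
# can be exercised without touching the filesystem.  Prints "disable"
# if stop() should call "nova service-disable", "skip" otherwise.
stop_disable_policy() {
    now=$1
    last_start=$2
    if [ $(( now - last_start )) -le "$THRESHOLD" ]; then
        # Started recently: probably a restart loop, which is not
        # worth optimising for, so always disable.
        echo disable
    else
        # Long-lived service being stopped: optimistically assume
        # we'll be started again soon and skip the API call.
        echo skip
    fi
}
```

stop() would then act on `stop_disable_policy "$(date +%s)"
"$(cat "$STATE_FILE")"` - but note this inherits exactly the problem
above: there is no obviously right value for THRESHOLD.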
[snipped]

> > OK, so by "unnecessary recovery optimization of an unrecoverable
> > system" you meant "unnecessary recovery optimization of a system which
> > we already tried and failed to recover",
>
> right
>
> > whereas I thought you meant "unnecessary recovery optimization of a
> > failed system which we haven't tried to recover yet".  But the
> > optimization we are talking about (i.e. skipping "nova
> > service-disable") would only happen during recovery attempts, before
> > we finally give up, so it only applies to the latter meaning, not the
> > former.
>
> Strictly speaking, I would disagree.
>
> The first time we call stop() is the only point at which the "we
> haven't tried to recover yet" clause is true.
>
> Yet even then we do not and can not know if start() will ever be
> called, nor if it would be here or on another node, nor if any future
> invocation will succeed.  So there is no real justification for the
> existence of an optimistic code path in stop() - which is why I keep
> banging on about correctness trumping optimisation.

That makes sense.

> [This is the point I came up with the (date - start_date) stuff above]
>
> If you're hell-bent on some kind of first-time optimisation though,

No, I'm not.  As I said, I'm happy to try this.  So, in untested
pseudo-code:

    stop () {
        # "Best effort" attempt to tell nova the service is
        # unavailable; we cannot rely on this to succeed (since if the
        # stop is due to a failure, that failure might also be
        # creating issues with nova-api), so it has to be
        # asynchronous.
        #
        # N.B. At this point the service might have already been
        # disabled externally, e.g. manually by the cloud operator for
        # planned maintenance.
        attempt_nova_service_disable

        ensure_nova_compute_stopped
    }

    start () {
        if ! start_nova_compute; then
            # Start-up failed, so we don't want to re-enable.
            return
        fi

        # In contrast to attempt_nova_service_disable, this has to be
        # synchronous, since if we can't enable the service, there's
        # no point running it and pretending everything's OK.
        #
        # FIXME: we don't want to do this if something other than the
        # stop() action previously called service-disable.
        ensure_nova_service_enabled
    }

    monitor () {
        # Here some application-level probe is better than a PID check.
    }

As per the FIXME, one remaining problem is dealing with this kind of
scenario:

- Cloud operator notices SMART warnings on the compute node which are
  not yet causing hard failures but signify that the hard disk might
  die soon.

- Operator manually runs "nova service-disable" with the intention of
  doing some maintenance soon, i.e. live-migrating instances away and
  replacing the dying hard disk.

- Before the operator gracefully shuts down nova-compute, an I/O
  error from the disk causes nova-compute to fail.

- Pacemaker invokes the monitor action, which spots the failure.

- Pacemaker invokes the stop action, which runs service-disable.

- Pacemaker attempts to restart nova-compute by invoking the start
  action.  Since the disk failure is currently intermittent, we get
  (un)lucky and nova-compute starts fine.  Then it calls
  service-enable - BAD!  This is now overriding the cloud operator's
  manual request for the service to be disabled.  If we're really
  unlucky, nova-scheduler will now start up new VMs on the node, even
  though the hard disk is dying.

However I can't see a way to defend against this :-/

Initially I thought that keeping track of when service-disable is
called by the RA would help distinguish from when it is called by the
cloud operator (or someone else), e.g.:

    stop () {
        if ! [ -e $service_disabled_state_file ]; then
            # "Best effort" attempt to tell nova the service is
            # unavailable; we cannot rely on this to succeed (since
            # if the stop is due to a failure, that failure might also
            # be creating issues with nova-api), so it has to be
            # asynchronous.
            #
            # N.B. At this point the service might have already
            # been disabled externally, e.g. manually by the cloud
            # operator for planned maintenance.
            attempt_nova_service_disable

            # Optimistically assume the service-disable succeeded,
            # so that even if something went wrong, we make sure
            # to do a service-enable when we restart it.
            touch $service_disabled_state_file
        fi

        ensure_nova_compute_stopped
    }

    start () {
        if ! start_nova_compute; then
            # Start-up failed, so we don't want to re-enable.
            return
        fi

        if [ -e $service_disabled_state_file ]; then
            # In contrast to attempt_nova_service_disable, this
            # has to be synchronous, since if we can't enable the
            # service, there's no point running it and pretending
            # everything's OK.
            #
            # FIXME: we don't want to do this if something other
            # than the stop() action previously called
            # service-disable.
            ensure_nova_service_enabled
            rm $service_disabled_state_file
        fi
    }

    monitor () {
        # Here some application-level probe is better than a PID check.
    }

But with further thought, I'm not sure that state file brings any
benefit, because it would only optimize away nova-api calls if either
start was called several times consecutively, or stop was, and that
won't happen.

Another problem is that if a stop was rapidly followed by a start
(which we can expect with, say, migration-threshold=1), the
asynchronous attempt_nova_service_disable might not complete until
after the synchronous ensure_nova_service_enabled completed, in which
case we'd have a running nova-compute which was accidentally disabled.

Thoughts?

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
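P.S. One conceivable way to attack the disable/enable race described
above: instead of fire-and-forget, stop() could remember a handle to
the in-flight service-disable, and start() could reap it before
enabling, so the two calls cannot land out of order.  A toy sketch,
with the nova API calls stubbed out as file writes and every name and
path invented:

```shell
#!/bin/sh
# Toy illustration of serialising an async service-disable against a
# later service-enable.  All names/paths are hypothetical; the real
# nova calls are replaced by writes to FAKE_STATE.

PID_FILE=/tmp/nova-ra-disable.pid
FAKE_STATE=/tmp/nova-ra-service-state

attempt_nova_service_disable() {
    # Fire-and-forget, but record the PID of the in-flight call.
    # The sleep simulates a slow "nova service-disable".
    ( sleep 1; echo disabled > "$FAKE_STATE" ) &
    echo $! > "$PID_FILE"
}

ensure_nova_service_enabled() {
    # Reap any disable still in flight so it cannot overtake the
    # enable and leave a running-but-disabled nova-compute.
    if [ -f "$PID_FILE" ]; then
        wait "$(cat "$PID_FILE")" 2>/dev/null
        rm -f "$PID_FILE"
    fi
    echo enabled > "$FAKE_STATE"   # stand-in for "nova service-enable"
}
```

The obvious catch: wait(1) only works within one shell process, and a
real RA's stop and start are separate invocations, so in practice the
start action would have to poll the service's status via the API
until the pending disable has landed.  The sketch only illustrates
the ordering requirement, not a drop-in fix.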