On Fri, Jun 24, 2016 at 1:26 AM, Adam Spiers <aspi...@suse.com> wrote:
> Adam Spiers <aspi...@suse.com> wrote:
>> As per the FIXME, one remaining problem is dealing with this kind of
>> scenario:
>>
>>   - Cloud operator notices SMART warnings on the compute node
>>     which is not yet causing hard failures but signifies that the
>>     hard disk might die soon.
>>
>>   - Operator manually runs "nova service-disable" with the intention
>>     of doing some maintenance soon, i.e. live-migrating instances away
>>     and replacing the dying hard disk.
>>
>>   - Before the operator gracefully shuts down nova-compute, an I/O
>>     error from the disk causes nova-compute to fail.
>>
>>   - Pacemaker invokes the monitor action which spots the failure.
>>
>>   - Pacemaker invokes the stop action which runs service-disable.
>>
>>   - Pacemaker attempts to restart nova-compute by invoking the start
>>     action.  Since the disk failure is currently intermittent, we
>>     get (un)lucky and nova-compute starts fine.
>>
>>     Then it calls service-enable - BAD!  This is now overriding the
>>     cloud operator's manual request for the service to be disabled.
>>     If we're really unlucky, nova-scheduler will now start up new VMs
>>     on the node, even though the hard disk is dying.
>>
>> However I can't see a way to defend against this :-/
>
> OK, I think I figured this out.  The answer is not to use
> service-disable at all, but to use force_down in the same way we
> already use it during fencing.  This means we don't mess with the
> intentions of the cloud operator which were manually specified via
> service-disable.
>
> I asked on #openstack-nova and got confirmation that this made sense.
> Hooray!  Dare I suggest we are finally coming close to a consensus?
I'm sure we can find more to argue over if we put our minds to it :-)
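
For anyone reading this in the archives, here is a toy Python model of why keeping the two flags separate helps. It is only a sketch: ComputeService, schedulable() and the two recovery functions are hypothetical names made up for illustration, not nova's API and not the resource agent's code. It also assumes that the scheduler skips a compute service that is either disabled or forced down.

```python
from dataclasses import dataclass


@dataclass
class ComputeService:
    """Toy stand-in for a nova-compute service record (hypothetical)."""
    disabled: bool = False      # operator's intent, set via service-disable
    forced_down: bool = False   # HA layer's view, set via force_down

    def schedulable(self) -> bool:
        # Assumption: the scheduler avoids a service if either flag is set.
        return not self.disabled and not self.forced_down


def recover_with_service_disable(svc: ComputeService) -> None:
    """Stop/start cycle that reuses the operator's flag."""
    svc.disabled = True     # stop action: service-disable
    svc.disabled = False    # start action: service-enable, clobbers operator intent


def recover_with_force_down(svc: ComputeService) -> None:
    """Stop/start cycle that only touches the HA-owned flag."""
    svc.forced_down = True   # stop action: mark the service as forced down
    svc.forced_down = False  # start action: clear forced-down only


def run_scenario(recover) -> bool:
    svc = ComputeService()
    svc.disabled = True      # operator disables the node ahead of maintenance
    recover(svc)             # nova-compute fails and the cluster restarts it
    return svc.schedulable()


old_result = run_scenario(recover_with_service_disable)
new_result = run_scenario(recover_with_force_down)
print("service-disable approach, node schedulable again:", old_result)
print("force_down approach, node schedulable again:", new_result)
```

In this model the first approach leaves the node schedulable after the restart, which is the "BAD!" case quoted above, while the second leaves the operator's disable in place because the recovery path never writes to that flag.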