On Thu, 2020-02-27 at 15:01 +0100, Jehan-Guillaume de Rorthais wrote:
> On Thu, 27 Feb 2020 12:24:46 +0100
> "Ulrich Windl" <ulrich.wi...@rz.uni-regensburg.de> wrote:
> 
> > > > > Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote on
> > > > > 27.02.2020 at 11:05 in message <20200227110502.3624cb87@firost>:
> > 
> > [...]
> > > What about something like "lock-location=bool" and
> > 
> > For "lock-location" I would assume the value is a "location". I guess
> > you wanted a "use-lock-location" Boolean value.
> 
> Mh, maybe "lock-current-location" would better reflect what I meant.
> The point is to lock the resource on the node currently running it.
Though it only applies to a clean node shutdown, so that has to be in the
name somewhere. The resource isn't locked during normal cluster operation
(it can move for resource or node failures, load rebalancing, etc.).

> > > "lock-location-timeout=duration" (for those who like automatic
> > > steps)? I imagine
> > 
> > I'm still unhappy with "lock-location": What is a "location", and what
> > does it mean to be "locked"?
> > Is that fundamentally different from "freeze/frozen" or "ignore" (all
> > those phrases exist already)?
> 
> A "location" defines where a resource is located in the cluster, i.e. on
> what node. E.g., a location constraint expresses where a resource
> //can// run:
> 
> «Location constraints tell the cluster which nodes a resource can run
> on.»
> 
> https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html
> 
> Here, the "constraint" applies to a location. So, if you remove the
> constraint part, the natural definition of a location would be:
> 
> «Location tells the cluster what node a resource is running on.»
> 
> > > it would lock the resource location (unique or clones) until the
> > > operator unlocks it or the "lock-location-timeout" expires, no
> > > matter what happens to the resource, maintenance mode or not.
> > > 
> > > At first look, it seems to pair nicely with maintenance-mode and
> > > avoids resource migration after node reboot.

Maintenance mode is useful if you're updating the cluster stack itself:
put the cluster in maintenance mode, stop the cluster services (leaving
the managed services still running), update the cluster services, start
the cluster services again, and take the cluster out of maintenance mode.

This feature, by contrast, is useful if you're rebooting the node for a
kernel update (for example): apply the update and reboot the node. The
cluster takes care of everything else for you (it stops the services
before shutting down and does not recover them until the node comes back).
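For concreteness, the maintenance-mode workflow described above could look
roughly like this with the pcs command-line tool. This is only a sketch:
"node1" and the package names are placeholders, and exact syntax varies by
pcs version.

```shell
# Sketch of the maintenance-mode workflow for updating the cluster stack
# itself ("node1" is a placeholder; syntax varies by pcs version).

# Tell the cluster to stop managing resources cluster-wide.
pcs property set maintenance-mode=true

# Stop the cluster services on the node; the managed services keep running.
pcs cluster stop node1

# Update the cluster stack packages, e.g. on an RPM-based system:
#   dnf update pacemaker corosync

# Start the cluster services again.
pcs cluster start node1

# Resume normal resource management.
pcs property set maintenance-mode=false
```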
> > I wonder: Where is it different from a time-limited "ban" (wording
> > also exists already)? If you ban all resources from running on a
> > specific node, resources would be moved away, and when booting the
> > node, resources won't come back.

It actually is equivalent to this process:

1. Determine what resources are active on the node about to be shut down.

2. For each of those resources, configure a ban (a location constraint
with a -INFINITY score) using a rule where the node name is not the node
being shut down.

3. Apply the updates and reboot the node. The cluster will stop the
resources (due to the shutdown) and not start them anywhere else (due to
the bans).

4. Wait for the node to rejoin and the resources to start on it again,
then remove all the bans.

The advantage is automation; in particular, the sysadmin applying the
updates doesn't even need to know that the host is part of a cluster.

> This is the standby mode.

Standby mode will stop all resources on a node, but it doesn't prevent
recovery elsewhere.

> Moreover, note that Ken explicitly wrote: «The cluster runs services
> that have a preferred node». So if the resource moved elsewhere, the
> resource **must** come back.

Right, the point of preventing recovery elsewhere is to avoid the extra
outage. Without shutdown lock:

1. When the node is stopped, the resource stops on that node and starts
on another node. (First outage)

2. When the node rejoins, the resource stops on the alternate node and
starts on the original node. (Second outage)

With shutdown lock, there's one outage when the node is rebooted, but the
resource then starts on the same node, so there is no second outage. If
the resource start time is much longer (e.g. half an hour for an
extremely large database) than the reboot time (a couple of minutes), the
feature becomes worthwhile.

> > But you want the resources to be down while the node boots, right?
> > How can that concept be "married with" the concept of high
> > availability?
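The manual, ban-based equivalent above could be sketched with the standard
cluster tools. Again a sketch only: "myrsc" and "node1" are placeholder
names, and syntax varies by pcs version.

```shell
# Sketch of the manual ban-based procedure above ("myrsc" and "node1" are
# placeholders; syntax varies by pcs version).

# 1. See which resources are currently active on node1 (one-shot status).
crm_mon -1

# 2. Ban each such resource from every node *except* node1, via a
#    -INFINITY location rule keyed on the node name.
pcs constraint location myrsc rule score=-INFINITY '#uname' ne node1

# 3. Apply the updates and reboot node1. The cluster stops myrsc for the
#    shutdown, and the rule keeps it from starting anywhere else.

# 4. Once node1 has rejoined and myrsc is running on it again, look up
#    the constraint's id and remove it.
pcs constraint --full
pcs constraint remove <constraint-id>

# For comparison, standby mode stops resources on the node but does not
# prevent recovery elsewhere:
#   pcs node standby node1
#   pcs node unstandby node1
```

For reference, the automated version of this discussed in the thread later
shipped as the shutdown-lock (and shutdown-lock-limit) cluster properties
in Pacemaker 2.0.4.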
> 
> The point here is to avoid moving resources during planned
> maintenance/downtime, as that would require a longer maintenance
> duration (thus longer downtime) than a simple reboot with no resource
> migration.
> 
> Even a resource in HA can have planned maintenance :)

Right. I jokingly call this feature "medium availability", but really it
is just another way to set a planned maintenance window.

> > "We have a HA cluster and HA resources, but when we boot a node those
> > HA-resources will be down while the node boots." How is that
> > different from not having a HA cluster, or taking those resources
> > temporarily away from the HA cluster? (That was my initial objection:
> > Why not simply ignore resource failures for some time?)

HA recovery is still done for resource failures and node failures, just
not for clean node shutdowns. A clean node shutdown is one where the node
notifies the DC that it wants to leave the cluster (which is what happens
in the background when you stop the cluster services on a node).

Also, all other cluster resource management features in use -- utilization
attributes, placement strategies, node health attributes, time-based
rules, etc. -- are still in effect.

> Unless I'm wrong, maintenance mode does not secure the current location
> of resources after reboots.
-- 
Ken Gaillot <kgail...@redhat.com>