On Wed, 2020-02-26 at 10:33 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot <kgail...@redhat.com> wrote on 25.02.2020 at 23:30 in message
> <29058_1582669837_5E55A00B_29058_3341_1_f8e8426d0c2cf098f88fb6330e8a80586f03043aca...@redhat.com>:
> > Hi all,
> >
> > We are a couple of months away from starting the release cycle for
> > Pacemaker 2.0.4. I'll highlight some new features between now and
> > then.
> >
> > First we have shutdown locks. This is a narrow use case that I don't
> > expect a lot of interest in, but it helps give pacemaker feature
> > parity with proprietary HA systems, which can help users feel more
> > comfortable switching to pacemaker and open source.
> >
> > The use case is a large organization with few cluster experts and
> > many junior system administrators who reboot hosts for OS updates
> > during planned maintenance windows, without any knowledge of what
> > the host does. The cluster runs services that have a preferred node
> > and take a very long time to start.
> >
> > In this scenario, pacemaker's default behavior of moving the service
> > to a failover node when the node shuts down, and moving it back when
> > the node comes back up, results in needless downtime compared to
> > just leaving the service down for the few minutes needed for a
> > reboot.
> >
> > The goal could be accomplished with existing pacemaker features.
> > Maintenance mode wouldn't work because the node is being rebooted.
> > But you could figure out what resources are active on the node, and
> > use a location constraint with a rule to ban them on all other nodes
> > before shutting down. That's a lot of work for something the cluster
> > can figure out automatically.
> >
> > Pacemaker 2.0.4 will offer a new cluster property, shutdown-lock,
> > defaulting to false to keep the current behavior. If shutdown-lock
> > is set to true, any resources active on a node when it is cleanly
> > shut down will be "locked" to the node (kept down rather than
> > recovered elsewhere). Once the node comes back up and rejoins the
> > cluster, they will be "unlocked" (free to move again if
> > circumstances warrant).
>
> I'm not very happy with the wording: what about a per-resource feature
> "tolerate-downtime" that specifies how long this resource may be down
> without causing actions from the cluster? I think it would be more
> useful than some global setting. Maybe complement that per-resource
> feature with a per-node feature using the same name.
I considered a per-resource and/or per-node setting, but the target
audience is someone who wants things as simple as possible. A per-node
setting would mean that newly added nodes don't have it by default,
which could easily be overlooked.

(As an aside, I would someday like to see a "node defaults" section
that would provide default values for node attributes. That could
potentially replace several current cluster-wide options. But it's a
low priority.)

I didn't mention this in the announcements, but certain resource types
are excluded: Stonith resources and Pacemaker Remote connection
resources are never locked. That makes sense because they are more a
sort of internal pseudo-resource than an actual end-user service:
Stonith resources are just monitors of the fence device, and a
connection resource starts a (remote) node rather than a service.

Also, with the current implementation, clone and bundle instances are
not locked. This would only matter for unique clones, and for
clones/bundles with clone-max/replicas set below the total number of
nodes. If there turns out to be high demand, we could add it in the
future. The same goes for the master role of promotable clones.

Given those limitations, I think a per-resource option would have more
potential to confuse than to help. But it should be relatively simple
to extend this as a per-resource option, with the global option as a
backward-compatible default, if the demand arises.

> I think it's very important to specify and document that mode
> comparing it to maintenance mode.

The proposed documentation is in the master branch if you want to
proofread it and make suggestions. If you have the prerequisites
installed, you can run "make -C doc" and view it locally; otherwise you
can browse the source (search for "shutdown-lock"):

https://github.com/ClusterLabs/pacemaker/blob/master/doc/Pacemaker_Explained/en-US/Ch-Options.txt

There is currently no explicit comparison with maintenance-mode,
because maintenance-mode still behaves according to its documentation
("Should the cluster refrain from monitoring, starting and stopping
resources?"). However, I can see the value in adding a section
somewhere (probably in "Pacemaker Administration") comparing all the
various "don't touch" settings -- maintenance-mode, the maintenance
node/resource attributes, standby, is-managed, shutdown-lock, and the
monitor enable option. The current "Monitoring Resources When
Administration is Disabled" section in Pacemaker Explained could be a
good starting point for this. Another item for the to-do list ...

> Regards,
> Ulrich
>
> > An additional cluster property, shutdown-lock-limit, allows you to
> > set a timeout for the locks so that if the node doesn't come back
> > within that time, the resources are free to be recovered elsewhere.
> > This defaults to no limit.
> >
> > If you decide while the node is down that you need the resource to
> > be recovered, you can manually clear a lock with "crm_resource
> > --refresh" specifying both --node and --resource.
> >
> > There are some limitations using shutdown locks with Pacemaker
> > Remote nodes, so I'd avoid that with the upcoming release, though
> > it is possible.
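Since people will probably want to see what this looks like in
practice, here is a rough sketch of the commands involved once 2.0.4
is out. The resource name "bigdb", the node name "node1", and the
30-minute limit are made up for illustration, and I haven't run these
exact lines, so double-check them against the documentation before
relying on them:

    # Enable the feature cluster-wide, and cap how long resources stay
    # locked to a node that is down (the default is no limit)
    crm_attribute --type crm_config --name shutdown-lock --update true
    crm_attribute --type crm_config --name shutdown-lock-limit --update 30min

    # If the node stays down longer than planned and you need the
    # resource recovered elsewhere now, clear its lock manually
    crm_resource --refresh --resource bigdb --node node1

Higher-level tools should work as well; I'd expect something like
"pcs property set shutdown-lock=true" to do the same, though an older
pcs may not recognize the new property yet.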
--
Ken Gaillot <kgail...@redhat.com>