On Thu, Jul 21, 2016 at 2:47 AM, Adam Spiers <aspi...@suse.com> wrote:
> Ken Gaillot <kgail...@redhat.com> wrote:
>> Hello all,
>>
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>
> [snipped]
>
> I don't want to comment directly on any of the excellent points which
> have been raised in this thread, but it seems like a good time to make
> a plea for easier reload/restart of individual instances of cloned
> services, one node at a time.  Currently, if nodes are all managed by
> a configuration management system (such as Chef in our case),
Puppet creates the same kinds of issues.  Both seem designed for a
magical world full of unrelated servers that require no co-ordination
to update, particularly when the timing of an update to some central
store (CIB, database, whatever) needs to be carefully ordered.

When you say "restart", though, do you mean a traditional stop/start
cycle in Pacemaker, one that also results in all the dependencies
being stopped?  I'm guessing you really want the "atomic reload" kind,
where nothing else is affected, since we already have the other style
covered by crm_resource --restart.

I propose that we introduce a --force-restart option for crm_resource
which:

1. disables any recurring monitor operations
2. calls a native restart action directly on the resource if it
   exists, otherwise calls the native stop+start actions
3. re-enables the recurring monitor operations regardless of whether
   the restart succeeds, fails, times out, etc.

No maintenance mode required, and whatever state the resource ends up
in is re-detected by the cluster in step 3.

> when the system wants to perform a configuration run on that node
> (e.g. when updating a service's configuration file from a template),
> it is necessary to place the entire node in maintenance mode before
> reloading or restarting that service on that node.  It works OK, but
> can result in ugly effects such as the node getting stuck in
> maintenance mode if the chef-client run failed, without any easy way
> to track down the original cause.
>
> I went through several design iterations before settling on this
> approach, and they are detailed in a lengthy comment here, which may
> help you better understand the challenges we encountered:
>
>   https://github.com/crowbar/crowbar-ha/blob/master/chef/cookbooks/crowbar-pacemaker/providers/service.rb#L61
>
> Similar challenges are posed during upgrades of Pacemaker-managed
> OpenStack infrastructure.
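For concreteness, here is a rough dry-run sketch of the sequence the
proposed --force-restart would run through internally, built only from
existing primitives (--force-stop/--force-start).  The cibadmin-based
monitor disabling and the "<rsc>-monitor" op id are assumptions for
illustration; the real implementation would live inside crm_resource
itself, and op ids vary per configuration:

```shell
#!/bin/sh
# Sketch of the proposed crm_resource --force-restart sequence.
# Set DRY_RUN=1 to print the commands instead of executing them,
# so the ordering can be inspected without a live cluster.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

force_restart() {
    rsc=$1

    # 1. disable the recurring monitor operation
    #    (assumption: via enabled=false on the op definition; the
    #    "<rsc>-monitor" id is hypothetical)
    run cibadmin --modify --xml-text \
        "<op id=\"${rsc}-monitor\" enabled=\"false\"/>"

    # 2. a native 'restart' agent action would be invoked here when the
    #    agent implements one; otherwise fall back to calling the
    #    native stop+start actions directly on the resource
    run crm_resource --resource "$rsc" --force-stop
    run crm_resource --resource "$rsc" --force-start

    # 3. re-enable the monitor unconditionally; the cluster then
    #    re-detects whatever state the resource ended up in
    run cibadmin --modify --xml-text \
        "<op id=\"${rsc}-monitor\" enabled=\"true\"/>"
}
```

Since step 3 runs regardless of what step 2 did, no maintenance mode
is needed and a failed restart surfaces as a normal monitor failure.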
>
> Cheers,
> Adam

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org