On 11/10/2016 11:34 AM, Kostiantyn Ponomarenko wrote:
> Ulrich Windl,
>
>> You want your resources to move to their preferred location after
>> some problem.
> It is not about that. It is about wanting to control when fail-back
> happens, and being sure that I have full control over it at all times.
>
> Klaus Wenninger,
>
> You are right. That is exactly what I want and what I am concerned
> about. The other example, with the "move" operation, is 100% correct.
>
> I've been thinking about another possible approach since yesterday,
> and I have an idea which actually seems to satisfy my needs, at least
> until a proper solution is available.
> My set-up is a two-node cluster.
> I will modify my script to:
>
> 1. issue a command to lower "resource-stickiness" on the local node;
> 2. on the other node, trigger a script which waits for the cluster to
>    finish all transitions (crm_resource --wait) and sets
>    "resource-stickiness" back to its original value;
> 3. on this node, wait for the cluster to finish all transitions
>    (crm_resource --wait) and set "resource-stickiness" back to its
>    original value.
>
> This way I can be sure the original value of "resource-stickiness" is
> back in place immediately after fail-back.
> Though, I am still thinking about the best way for a local script to
> trigger the script on the other node and pass an argument to it.
> If you have any thoughts, I would like to hear them =)
>
> I was also thinking about a more general approach to this.
> Maybe it is time for the higher-level cluster configuration tools to
> evolve and provide this robustness?
> So that they can take a sequence of commands and guarantee that it
> will be executed in the intended order even if the node on which the
> sequence was initiated goes down.
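Regarding your steps 1-3 and how to trigger the remote part: a minimal,
untested sketch could look like the following. The node name, the use of
passwordless ssh and the 50/150 stickiness values are just assumptions
taken from your earlier mails, so adjust them to your setup:

  #!/bin/sh
  # Untested sketch; node name and stickiness values are only examples.
  PEER=node-b   # assumption: the other node, reachable via passwordless ssh
  LOW=50        # temporarily lowered stickiness
  ORIG=150      # value to restore afterwards

  # step 1: lower stickiness on the local node so resources may fail back
  crm configure rsc_defaults resource-stickiness=$LOW

  # step 2: have the peer restore the value once the cluster has settled,
  # so the rollback still happens if this node goes down mid-script
  ssh "$PEER" "crm_resource --wait; \
      crm configure rsc_defaults resource-stickiness=$ORIG" &

  # step 3: the same thing locally
  crm_resource --wait
  crm configure rsc_defaults resource-stickiness=$ORIG

Since rsc_defaults lives in the CIB, both nodes writing the same value
back is harmless. One might even want to trigger the peer before lowering
the value, so the rollback is already armed if this node dies right after
step 1. But the script can still fail in other ways, which is the general
point: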
yep, either that or - especially for things where the success of your
cib-modification is very special to your cluster - you script it.
But in either case the high-level-tool or your script can fail, the node
it is running on can be fenced, or whatever else you can think of ...
So I wanted to think about simple, not very invasive things that could
be done within the core of pacemaker to enable a predictable fallback in
such cases.

> Or maybe pacemaker can expand its functionality to handle a command
> sequence?
>
> Or this special tagging which you mentioned. Could you please
> elaborate on this one, as I am curious how it should work?

That is what the high-level-tools are doing at the moment.
You can recognize the constraints they have created by their names
(prefix).
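Just to make the prefix thing concrete (the resource and node names below
are made up, and the exact constraint id can differ a bit between tools
and versions): a move done via the shell inserts a location constraint
whose id starts with something like "cli-prefer-", and it sits in the CIB
until somebody clears it:

  # made-up resource/node names, just to show the tool-generated prefix
  crm resource move dummy node-b           # inserts a location constraint,
                                           # id something like cli-prefer-dummy
  cibadmin -Q -o constraints | grep cli-   # the prefix makes such
                                           # constraints easy to recognize
  crm resource unmove dummy                # manual cleanup once you are done

So the tagging exists, but nothing guarantees the cleanup actually happens
if the shell or the node running it goes away - which is the whole point
of this discussion.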
>
>> some mechanism that makes the constraints somehow magically
>> disappear or be disabled when they have achieved what they were
>> intended to.
> You mean something like time-based constraints, but instead of a
> duration they are event based?

Something in that direction, yes ...

> Thank you,
> Kostia
>
> On Thu, Nov 10, 2016 at 11:17 AM, Klaus Wenninger
> <kwenn...@redhat.com> wrote:
>
> On 11/10/2016 08:27 AM, Ulrich Windl wrote:
> >>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 09.11.2016 at
> >>>> 17:42 in message <80c65564-b299-e504-4c6c-afd0ff86e...@redhat.com>:
> >> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
> >>> When one problem seems to be solved, another one appears.
> >>> Now my script looks this way:
> >>>
> >>> crm --wait configure rsc_defaults resource-stickiness=50
> >>> crm configure rsc_defaults resource-stickiness=150
> >>>
> >>> While now I am sure that transitions caused by the first command
> >>> won't be aborted, I see another possible problem here.
> >>> With a minimal load in the cluster it took 22 sec for this script
> >>> to finish.
> >>> I see a weakness here.
> >>> If the node on which this script is called goes down for any
> >>> reason, then "resource-stickiness" is not set back to its original
> >>> value, which is very bad.
>
> I don't quite understand: You want your resources to move to their
> preferred location after some problem. When the node goes down with
> the lower stickiness, there is no problem while the other node is
> down; when it comes up, resources might be moved, but isn't that what
> you wanted?
>
> I guess this is about the general problem with features like e.g.
> 'move' as well, that are so much against how pacemaker is working.
> They are implemented inside the high-level-tooling.
> They temporarily modify the CIB, and if something happens that makes
> this controlling high-level-tool go away, it stays as is - or the CIB
> even stays modified and the user has to know that a manual cleanup is
> needed.
> So we could actually derive a general discussion from that: how to
> handle these issues in a way that makes it less likely for artefacts
> to persist after some administrative action.
> At the moment, e.g. special tagging for the constraints that are
> automatically created to trigger a move is one approach.
> But when would you issue an automated cleanup? Is there anything
> implemented in high-level-tooling? pcsd I guess would be a candidate;
> for crmsh I don't know of a persistent instance that could take care
> of that ...
>
> If we say we won't implement these features in the core of pacemaker,
> I definitely agree. But is there anything we could do to make it
> easier for high-level-tools?
> I'm thinking of some mechanism that makes the constraints somehow
> magically disappear or be disabled when they have achieved what they
> were intended to, if the connection to some administrative shell is
> lost, or ...
> I could imagine a dependency on some token given to a shell, something
> like a suicide-timeout, ...
> Maybe the usual habit when configuring a switch/router can trigger
> some ideas: issue a reboot in x minutes; do a non-persistent
> config-change; check if everything is fine afterwards; make it
> persistent; disable the timed reboot.
>
> >>> So, now I am thinking of how to solve this problem. I would
> >>> appreciate any thoughts about it.
> >>>
> >>> Is there a way to ask Pacemaker to do these commands sequentially,
> >>> so there is no need to wait in the script?
> >>> If that is possible, then I think my concern from above goes away.
> >>>
> >>> Another thing which comes to my mind is to use time-based rules.
> >>> This way, when I need to do a manual fail-back, I simply set (or
> >>> update) a time-based rule from the script.
> >>> And the rule will basically say: set "resource-stickiness" to 50
> >>> right now and expire in 10 min.
> >>> This looks good at first glance, but there is no reliable way to
> >>> pick a minimum sufficient time for it; at least none that I am
> >>> aware of.
> >>> And the thing is, it is important to me that "resource-stickiness"
> >>> is set back to its original value as soon as possible.
> >>>
> >>> Those are my thoughts. As I said, I appreciate any ideas here.
> >> Have never tried --wait with crmsh, but I would guess that the
> >> delay you are observing is really the time your resources are
> >> taking to stop and start somewhere else.
> >>
> >> Actually you would need the reduced stickiness just during the stop
> >> phase - right.
> >>
> >> So as there is no command like "wait till all stops are done", you
> >> could still do the 'crm_simulate -Ls' and check that it doesn't
> >> want to stop anything anymore.
> >> So you can save the time the starts would take.
> >> Unfortunately you have to repeat that and thus put additional load
> >> on pacemaker, possibly slowing things down if your poll-cycle is
> >> too short.
>
> >>> Thank you,
> >>> Kostia
> >>>
> >>> On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
> >>> <deja...@fastmail.fm> wrote:
> >>>
> >>> On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
> >>> > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
> >>> > > Hi,
> >>> > >
> >>> > > I need a way to do a manual fail-back on demand.
> >>> > > To be clear, I don't want it to be ON/OFF; I want it to be
> >>> > > more like "one shot".
> >>> > > So far I have found that the most reasonable way to do it is
> >>> > > to set "resource-stickiness" to a different value, and then
> >>> > > set it back to what it was.
> >>> > > To do that I created a simple script with two lines:
> >>> > >
> >>> > > crm configure rsc_defaults resource-stickiness=50
> >>> > > crm configure rsc_defaults resource-stickiness=150
> >>> > >
> >>> > > There are no timeouts before setting the original value back.
> >>> > > If I call this script, I get what I want - Pacemaker moves
> >>> > > resources to their preferred locations, and
> >>> > > "resource-stickiness" is set back to its original value.
> >>> > >
> >>> > > Despite the fact that it works, I still have a few concerns
> >>> > > about this approach.
> >>> > > Will I get the same behavior under a big load with delays on
> >>> > > the systems in the cluster (which is truly possible and a
> >>> > > normal case in my environment)?
> >>> > > How does Pacemaker treat a fast change of this parameter?
> >>> > > I am worried that if "resource-stickiness" is set back to its
> >>> > > original value too fast, then no fail-back will happen. Is
> >>> > > that possible, or shouldn't I worry about it?
> >>> >
> >>> > AFAIK pengine is interrupted when calculating a more complicated
> >>> > transition, and a transition that is just being executed is
> >>> > aborted if the input from pengine has changed.
> >>> > So I would definitely worry!
> >>> > What you could do is to issue 'crm_simulate -Ls' in between and
> >>> > grep for an empty transition.
> >>> > There might be more elegant ways but that should be safe.
> >>>
> >>> crmsh has an option (-w) to wait for the PE to settle after
> >>> committing configuration changes.
> >>>
> >>> Thanks,
> >>>
> >>> Dejan
> >>>
> >>> > > Thank you,
> >>> > > Kostia
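P.S. Since the 'crm_simulate -Ls' idea shows up again in the quoted part
above: a rough, untested sketch of that polling, using the 50/150 values
from the earlier mails. The output format of crm_simulate differs between
pacemaker versions, so the sed/grep pattern below is only a placeholder
that needs checking locally:

  crm configure rsc_defaults resource-stickiness=50
  # poll until the transition crm_simulate would run contains no actions
  while crm_simulate -Ls | sed -n '/Transition Summary:/,$p' \
        | grep -q '^ *\* '; do
      sleep 5    # modest poll cycle so pacemaker isn't loaded unnecessarily
  done
  crm configure rsc_defaults resource-stickiness=150

That only saves the time the starts would take; it does not remove the
window in which the node running the loop can die with the stickiness
still lowered.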
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org