On 11/10/2016 08:27 AM, Ulrich Windl wrote:
>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 09.11.2016 at 17:42 in
> message <80c65564-b299-e504-4c6c-afd0ff86e...@redhat.com>:
>> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
>>> When one problem seems to be solved, another one appears.
>>> Now my script looks like this:
>>>
>>> crm --wait configure rsc_defaults resource-stickiness=50
>>> crm configure rsc_defaults resource-stickiness=150
>>>
>>> While I am now sure that transitions caused by the first command
>>> won't be aborted, I see another possible problem here.
>>> With minimal load in the cluster it took 22 seconds for this script
>>> to finish.
>>> I see a weakness here.
>>> If the node on which this script is called goes down for any reason,
>>> then "resource-stickiness" is not set back to its original value,
>>> which is very bad.
> I don't quite understand: you want your resources to move to their preferred
> location after some problem. When the node goes down with the lower
> stickiness, there is no problem while the other node is down; when it comes
> up, resources might be moved, but isn't that what you wanted?
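One way to narrow the window described above is a shell trap: restore the stickiness from an EXIT handler, so the revert also runs when the script itself is interrupted or fails partway. This is only a sketch (the 50/150 values are the ones from the thread; the `crm` stub exists only so the sketch can be tried outside a cluster), and it still does not survive a crash of the whole node, which is exactly the remaining gap.

```shell
#!/bin/sh
# Sketch: lower stickiness for a one-shot fail-back and restore it from
# an EXIT trap, so the revert also runs on Ctrl-C or an error mid-script.

# Stand-in so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi

restore() {
    crm configure rsc_defaults resource-stickiness=150
}
# A duplicate restore (EXIT after INT/TERM) is harmless - same value.
trap restore EXIT INT TERM

crm --wait configure rsc_defaults resource-stickiness=50
# restore() runs automatically when the script exits.
```

The trap narrows the failure window to "the node itself dies", but does not close it; that part would need something outside the node, as discussed below.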
I guess this is about the general problem with features like 'move' that run
somewhat against how Pacemaker works: they are implemented inside the
high-level tooling. They temporarily modify the CIB, and if something happens
that makes the controlling high-level tool go away, the CIB stays modified
and the user has to know that a manual cleanup is needed.

So we could derive a general discussion from this: how to handle these cases
so that it is less likely for artifacts to persist after some administrative
action. At the moment, special tagging of the constraints that are
automatically created to trigger a move is one approach. But when would you
issue an automated cleanup? Is there anything implemented in the high-level
tooling? pcsd would be a candidate, I guess; for crmsh I don't know of a
persistent instance that could take care of that ...

If we say we won't implement these features in the core of Pacemaker, I
definitely agree. But is there anything we could do to make it easier for the
high-level tools? I'm thinking of some mechanism that makes such constraints
magically disappear or become disabled once they have achieved what they were
intended to do, or when the connection to some administrative shell is lost.
I could imagine a dependency on some token given to a shell, something like a
suicide timeout, ...

Maybe the usual habit when configuring a switch/router can trigger some
ideas: issue a reboot in x minutes; make a non-persistent config change;
check that everything is fine afterwards; make it persistent; disable the
timed reboot.

>>> So, now I am thinking of how to solve this problem. I would appreciate
>>> any thoughts about this.
>>>
>>> Is there a way to ask Pacemaker to do these commands sequentially so
>>> there is no need to wait in the script?
>>> If that is possible, then I think my concern from above goes away.
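The switch/router habit maps naturally onto the stickiness change: schedule the revert before making the temporary change, so the default comes back even if the controlling shell goes away. A hypothetical sketch, not an existing crmsh feature (the `crm` stub exists only so it can be tried outside a cluster; a failure of the node running the timer still defeats it, which is the open problem above):

```shell
#!/bin/sh
# Hypothetical "revert timer" for the stickiness change.
# Values 50/150 and the 10-minute window are taken from the thread.

# Stand-in so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi

TIMEOUT=600   # force the revert after 10 minutes at the latest

# 1. Detach the timer first, before touching the configuration.
( sleep "$TIMEOUT" && crm configure rsc_defaults resource-stickiness=150 ) \
    >/dev/null 2>&1 &
TIMER_PID=$!

# 2. Non-persistent change, wait for the PE to settle, then revert.
crm --wait configure rsc_defaults resource-stickiness=50
crm configure rsc_defaults resource-stickiness=150

# 3. Everything went fine: cancel the timed revert.
kill "$TIMER_PID" 2>/dev/null
```

Ordering matters here: the timer is armed before the temporary change, mirroring "issue a reboot in x minutes" before the non-persistent config change. A redundant revert (timer firing after a successful manual revert) would just re-set the same value.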
>>>
>>> Another thing that comes to my mind is to use time-based rules.
>>> This way, when I need to do a manual fail-back, I simply set (or
>>> update) a time-based rule from the script.
>>> The rule would basically say: set "resource-stickiness" to 50
>>> right now and expire in 10 min.
>>> This looks good at first glance, but there is no reliable way to
>>> pick a minimum sufficient time for it; at least none that I am
>>> aware of.
>>> And the thing is, it is important to me that "resource-stickiness"
>>> is set back to its original value as soon as possible.
>>>
>>> Those are my thoughts. As I said, I appreciate any ideas here.
>> I have never tried --wait with crmsh, but I would guess that the delay
>> you are observing is really the time your resources take to stop and
>> start somewhere else.
>>
>> Actually you only need the reduced stickiness during the stop
>> phase - right?
>>
>> So, as there is no command like "wait till all stops are done", you
>> could still run 'crm_simulate -Ls' and check that it doesn't want to
>> stop anything anymore.
>> That way you save the time the starts would take.
>> Unfortunately you have to repeat that, and thus put additional load
>> on Pacemaker, possibly slowing things down if your poll cycle is too
>> short.
>>
>>>
>>> Thank you,
>>> Kostia
>>>
>>> On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
>>> <deja...@fastmail.fm> wrote:
>>>
>>>     On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
>>>     > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
>>>     > > Hi,
>>>     > >
>>>     > > I need a way to do a manual fail-back on demand.
>>>     > > To be clear, I don't want it to be ON/OFF; I want it to be
>>>     > > more like "one shot".
>>>     > > So far I have found that the most reasonable way to do it is
>>>     > > to set "resource-stickiness" to a different value, and then
>>>     > > set it back to what it was.
>>>     > > To do that I created a simple script with two lines:
>>>     > >
>>>     > > crm configure rsc_defaults resource-stickiness=50
>>>     > > crm configure rsc_defaults resource-stickiness=150
>>>     > >
>>>     > > There are no timeouts before setting the original value back.
>>>     > > If I call this script, I get what I want - Pacemaker moves
>>>     > > resources to their preferred locations, and
>>>     > > "resource-stickiness" is set back to its original value.
>>>     > >
>>>     > > Although it works, I still have a few concerns about this
>>>     > > approach.
>>>     > > Will I get the same behavior under a big load with delays on
>>>     > > the systems in the cluster (which is truly possible and a
>>>     > > normal case in my environment)?
>>>     > > How does Pacemaker treat a fast change of this parameter?
>>>     > > I am worried that if "resource-stickiness" is set back to its
>>>     > > original value too fast, then no fail-back will happen. Is
>>>     > > that possible, or shouldn't I worry about it?
>>>     >
>>>     > AFAIK the pengine is interrupted when calculating a more
>>>     > complicated transition, and a transition that is just being
>>>     > executed is aborted if the input from the pengine changed.
>>>     > So I would definitely worry!
>>>     > What you could do is issue 'crm_simulate -Ls' in between and
>>>     > grep for an empty transition.
>>>     > There might be more elegant ways, but that should be safe.
>>>
>>>     crmsh has an option (-w) to wait for the PE to settle after
>>>     committing configuration changes.
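The "grep for an empty transition" suggestion quoted above could look roughly like this. A sketch only: the transition-summary format of `crm_simulate -Ls` differs between Pacemaker versions, so the grep pattern is an assumption to verify, and the stubs exist only so the sketch runs outside a cluster.

```shell
#!/bin/sh
# Sketch: lower stickiness, poll 'crm_simulate -Ls' until the pending
# transition schedules no more stop actions, then restore the default.
# The 'Stop' pattern is an assumption about the transition summary
# format; check it against your Pacemaker version's output first.

# Stand-ins so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi
if ! command -v crm_simulate >/dev/null 2>&1; then
    crm_simulate() { echo "Transition Summary:"; }
fi

crm configure rsc_defaults resource-stickiness=50

# Keep the poll cycle modest: each call costs the cluster a PE run.
while crm_simulate -Ls 2>/dev/null | grep -q 'Stop'; do
    sleep 2
done

crm configure rsc_defaults resource-stickiness=150
```

Compared with `crm --wait`, this restores the default as soon as the stops are scheduled away, without also waiting for all the starts - which is where most of the 22 seconds mentioned earlier likely goes.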
>>>
>>>     Thanks,
>>>
>>>     Dejan
>>>     >
>>>     > > Thank you,
>>>     > > Kostia

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org