On 11/10/2016 08:27 AM, Ulrich Windl wrote:
>>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 09.11.2016 at 17:42 in
> message <80c65564-b299-e504-4c6c-afd0ff86e...@redhat.com>:
>> On 11/09/2016 05:30 PM, Kostiantyn Ponomarenko wrote:
>>> When one problem seems to be solved, another one appears.
>>> Now my script looks like this:
>>>
>>> crm --wait configure rsc_defaults resource-stickiness=50
>>> crm configure rsc_defaults resource-stickiness=150
>>>
>>> While I am now sure that transitions caused by the first command
>>> won't be aborted, I see another possible problem here.
>>> With minimal load in the cluster it took 22 seconds for this script
>>> to finish.
>>> I see a weakness here.
>>> If the node on which this script is called goes down for any reason,
>>> then "resource-stickiness" is not set back to its original value,
>>> which is very bad.
> I don't quite understand: you want your resources to move to their preferred
> location after some problem. When the node goes down with the lower
> stickiness, there is no problem while the other node is down; when it comes
> up, resources might be moved, but isn't that what you wanted?
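One way to narrow the window described above is a shell trap: restore the stickiness from an EXIT handler, so the revert also runs when the script itself is interrupted or fails partway. This is only a sketch (the 50/150 values are the ones from the thread; the `crm` stub exists only so the sketch can be tried outside a cluster), and it still does not survive a crash of the whole node, which is exactly the remaining gap.

```shell
#!/bin/sh
# Sketch: lower stickiness for a one-shot fail-back and restore it from
# an EXIT trap, so the revert also runs on Ctrl-C or an error mid-script.

# Stand-in so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi

restore() {
    crm configure rsc_defaults resource-stickiness=150
}
# A duplicate restore (EXIT after INT/TERM) is harmless - same value.
trap restore EXIT INT TERM

crm --wait configure rsc_defaults resource-stickiness=50
# restore() runs automatically when the script exits.
```

The trap narrows the failure window to "the node itself dies", but does not close it; that part would need something outside the node, as discussed below.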
I guess this is about the general problem with features like 'move' that run
somewhat against how Pacemaker works: they are implemented inside the
high-level tooling. They temporarily modify the CIB, and if something happens
that makes the controlling high-level tool go away, the CIB stays modified
and the user has to know that a manual cleanup is needed.

So we could derive a general discussion from this: how to handle these cases
so that it is less likely for artifacts to persist after some administrative
action. At the moment, special tagging of the constraints that are
automatically created to trigger a move is one approach. But when would you
issue an automated cleanup? Is there anything implemented in the high-level
tooling? pcsd would be a candidate, I guess; for crmsh I don't know of a
persistent instance that could take care of that ...

If we say we won't implement these features in the core of Pacemaker, I
definitely agree. But is there anything we could do to make it easier for the
high-level tools? I'm thinking of some mechanism that makes such constraints
magically disappear or become disabled once they have achieved what they were
intended to do, or when the connection to some administrative shell is lost.
I could imagine a dependency on some token given to a shell, something like a
suicide timeout, ...

Maybe the usual habit when configuring a switch/router can trigger some
ideas: issue a reboot in x minutes; make a non-persistent config change;
check that everything is fine afterwards; make it persistent; disable the
timed reboot.

>>> So, now I am thinking of how to solve this problem. I would appreciate
>>> any thoughts about this.
>>>
>>> Is there a way to ask Pacemaker to do these commands sequentially so
>>> there is no need to wait in the script?
>>> If that is possible, then I think my concern from above goes away.
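The switch/router habit maps naturally onto the stickiness change: schedule the revert before making the temporary change, so the default comes back even if the controlling shell goes away. A hypothetical sketch, not an existing crmsh feature (the `crm` stub exists only so it can be tried outside a cluster; a failure of the node running the timer still defeats it, which is the open problem above):

```shell
#!/bin/sh
# Hypothetical "revert timer" for the stickiness change.
# Values 50/150 and the 10-minute window are taken from the thread.

# Stand-in so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi

TIMEOUT=600   # force the revert after 10 minutes at the latest

# 1. Detach the timer first, before touching the configuration.
( sleep "$TIMEOUT" && crm configure rsc_defaults resource-stickiness=150 ) \
    >/dev/null 2>&1 &
TIMER_PID=$!

# 2. Non-persistent change, wait for the PE to settle, then revert.
crm --wait configure rsc_defaults resource-stickiness=50
crm configure rsc_defaults resource-stickiness=150

# 3. Everything went fine: cancel the timed revert.
kill "$TIMER_PID" 2>/dev/null
```

Ordering matters here: the timer is armed before the temporary change, mirroring "issue a reboot in x minutes" before the non-persistent config change. A redundant revert (timer firing after a successful manual revert) would just re-set the same value.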
>>>
>>> Another thing that comes to my mind is to use time-based rules.
>>> This way, when I need to do a manual fail-back, I simply set (or
>>> update) a time-based rule from the script.
>>> The rule would basically say: set "resource-stickiness" to 50
>>> right now and expire in 10 min.
>>> This looks good at first glance, but there is no reliable way to
>>> pick a minimum sufficient time for it; at least none that I am
>>> aware of.
>>> And the thing is, it is important to me that "resource-stickiness"
>>> is set back to its original value as soon as possible.
>>>
>>> Those are my thoughts. As I said, I appreciate any ideas here.
>> I have never tried --wait with crmsh, but I would guess that the delay
>> you are observing is really the time your resources take to stop and
>> start somewhere else.
>>
>> Actually you only need the reduced stickiness during the stop
>> phase - right?
>>
>> So, as there is no command like "wait till all stops are done", you
>> could still run 'crm_simulate -Ls' and check that it doesn't want to
>> stop anything anymore.
>> That way you save the time the starts would take.
>> Unfortunately you have to repeat that, and thus put additional load
>> on Pacemaker, possibly slowing things down if your poll cycle is too
>> short.
>>
>>>
>>> Thank you,
>>> Kostia
>>>
>>> On Tue, Nov 8, 2016 at 10:19 PM, Dejan Muhamedagic
>>> <deja...@fastmail.fm> wrote:
>>>
>>>     On Tue, Nov 08, 2016 at 12:54:10PM +0100, Klaus Wenninger wrote:
>>>     > On 11/08/2016 11:40 AM, Kostiantyn Ponomarenko wrote:
>>>     > > Hi,
>>>     > >
>>>     > > I need a way to do a manual fail-back on demand.
>>>     > > To be clear, I don't want it to be ON/OFF; I want it to be
>>>     > > more like "one shot".
>>>     > > So far I have found that the most reasonable way to do it is
>>>     > > to set "resource-stickiness" to a different value, and then
>>>     > > set it back to what it was.
>>>     > > To do that I created a simple script with two lines:
>>>     > >
>>>     > > crm configure rsc_defaults resource-stickiness=50
>>>     > > crm configure rsc_defaults resource-stickiness=150
>>>     > >
>>>     > > There are no timeouts before setting the original value back.
>>>     > > If I call this script, I get what I want - Pacemaker moves
>>>     > > resources to their preferred locations, and
>>>     > > "resource-stickiness" is set back to its original value.
>>>     > >
>>>     > > Although it works, I still have a few concerns about this
>>>     > > approach.
>>>     > > Will I get the same behavior under a big load with delays on
>>>     > > the systems in the cluster (which is truly possible and a
>>>     > > normal case in my environment)?
>>>     > > How does Pacemaker treat a fast change of this parameter?
>>>     > > I am worried that if "resource-stickiness" is set back to its
>>>     > > original value too fast, then no fail-back will happen. Is
>>>     > > that possible, or shouldn't I worry about it?
>>>     >
>>>     > AFAIK the pengine is interrupted when calculating a more
>>>     > complicated transition, and a transition that is just being
>>>     > executed is aborted if the input from the pengine changed.
>>>     > So I would definitely worry!
>>>     > What you could do is issue 'crm_simulate -Ls' in between and
>>>     > grep for an empty transition.
>>>     > There might be more elegant ways, but that should be safe.
>>>
>>>     crmsh has an option (-w) to wait for the PE to settle after
>>>     committing configuration changes.
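The "grep for an empty transition" suggestion quoted above could look roughly like this. A sketch only: the transition-summary format of `crm_simulate -Ls` differs between Pacemaker versions, so the grep pattern is an assumption to verify, and the stubs exist only so the sketch runs outside a cluster.

```shell
#!/bin/sh
# Sketch: lower stickiness, poll 'crm_simulate -Ls' until the pending
# transition schedules no more stop actions, then restore the default.
# The 'Stop' pattern is an assumption about the transition summary
# format; check it against your Pacemaker version's output first.

# Stand-ins so the sketch runs outside a cluster; remove on a real node.
if ! command -v crm >/dev/null 2>&1; then
    crm() { echo "crm $*"; }
fi
if ! command -v crm_simulate >/dev/null 2>&1; then
    crm_simulate() { echo "Transition Summary:"; }
fi

crm configure rsc_defaults resource-stickiness=50

# Keep the poll cycle modest: each call costs the cluster a PE run.
while crm_simulate -Ls 2>/dev/null | grep -q 'Stop'; do
    sleep 2
done

crm configure rsc_defaults resource-stickiness=150
```

Compared with `crm --wait`, this restores the default as soon as the stops are scheduled away, without also waiting for all the starts - which is where most of the 22 seconds mentioned earlier likely goes.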
>>>
>>>     Thanks,
>>>
>>>     Dejan
>>>     >
>>>     > > Thank you,
>>>     > > Kostia

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org