[ClusterLabs] Antw: Re: Establishing Timeouts

Ulrich Windl Mon, 10 Oct 2016 23:34:24 -0700

>>> Klaus Wenninger <kwenn...@redhat.com> wrote on 10.10.2016 at 20:04 in
message <936e4d4b-df5c-246d-4552-5678653b3...@redhat.com>:
> On 10/10/2016 06:58 PM, Eric Robinson wrote:
>> Thanks for the clarification. So what's the easiest way to ensure that the
>> cluster waits a desired timeout before deciding that a re-convergence is
>> necessary?
> 
> By raising the token (lost) timeout I would say.


Somewhat off-topic:
I have always wished there were a kind of spreadsheet where you could play with
those parameters and, given the required constraints, be told what consequences
changing one parameter has. The interdependencies seem quite complex; some
restrictions seem hard, others soft, and some defaults seem to result from
"black magic". Default values are not always documented, either.

Why is it that "a new configuration takes about 50 milliseconds"? Where does
that figure come from?

"It is not recommended to alter this value without guidance from the corosync
community." (token_retransmit being 238 ms)

(Just two examples)
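
As a rough stand-in for such a spreadsheet, here is a minimal Python sketch that derives dependent values from token and token_retransmits_before_loss_const. The function name derived_timeouts is hypothetical, and both formulas are assumptions rather than anything confirmed in this thread: the retransmit interval is taken as token / (retransmits + 0.2), which happens to reproduce the 238 ms figure for token=1000 and 4 retransmits, and consensus is assumed to need at least 1.2 * token, as I read corosync.conf(5).

```python
from typing import Optional


def derived_timeouts(token_ms: int, retransmits: int,
                     consensus_ms: Optional[int] = None) -> dict:
    """Derive dependent totem timeouts (all values in milliseconds)."""
    if token_ms <= 0 or retransmits <= 0:
        raise ValueError("token_ms and retransmits must be positive")

    # ASSUMPTION: retransmit interval = token / (retransmits + 0.2), truncated.
    # Written with integers to avoid floating-point rounding surprises.
    retransmit_ms = (token_ms * 5) // (retransmits * 5 + 1)

    # ASSUMPTION: consensus must be at least 1.2 * token and defaults to that.
    min_consensus_ms = (token_ms * 6) // 5
    effective_consensus = min_consensus_ms if consensus_ms is None else consensus_ms

    return {
        "token_ms": token_ms,
        "token_retransmit_ms": retransmit_ms,
        "consensus_ms": effective_consensus,
        "consensus_ok": effective_consensus >= min_consensus_ms,
    }


if __name__ == "__main__":
    # Values taken from the thread: the 238 ms case and Eric's configuration.
    print(derived_timeouts(1000, 4))
    print(derived_timeouts(5000, 10, consensus_ms=7500))
```

For the configuration quoted further down (token 5000, 10 retransmits), this gives a retransmit interval of 490 ms, in line with the "approximately 500 ms, scaled slightly" explanation given there.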

Some defaults could use an explanation, e.g. why exactly 4 retransmits and not 2
or 3? Is the protocol expected to have a high loss rate?
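
To put rough numbers on the retransmit question, here is a small Python sketch under a deliberately simplistic assumption: every token transmission is lost independently with the same probability, and the token counts as lost only if all attempts within the token timeout fail. The function name p_token_declared_lost and the 1% loss rate are made up for illustration; real networks lose packets in bursts, so this is a lower bound at best.

```python
def p_token_declared_lost(loss_rate: float, attempts: int) -> float:
    """Chance that every one of `attempts` independent transmissions is lost."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # ASSUMPTION: losses are independent and identically distributed.
    return loss_rate ** attempts


if __name__ == "__main__":
    # Hypothetical 1% per-packet loss, compared for 2, 3 and 4 attempts.
    for attempts in (2, 3, 4):
        print(attempts, p_token_declared_lost(0.01, attempts))
```

Under this model each additional attempt multiplies the chance of a spurious token loss by the loss rate, so a count of 4 does not by itself imply that a lossy network is expected.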

Regards,
Ulrich

> 
> Please correct me (Chrissie), but I see the
> token (lost) timeout as resilience against
> static delays plus jitter on top, and
> token_retransmits_before_loss_const as resilience
> against packet loss.
> 
>>
>> --
>> Eric Robinson
>>    
>>
>> -----Original Message-----
>> From: Christine Caulfield [mailto:ccaul...@redhat.com] 
>> Sent: Monday, October 10, 2016 4:34 AM
>> To: users@clusterlabs.org 
>> Subject: Re: [ClusterLabs] Establishing Timeouts
>>
>> On 10/10/16 05:51, Eric Robinson wrote:
>>> I have about a dozen corosync+pacemaker clusters and I am just now getting
>>> around to understanding timeouts.
>>>
>>> Most of my corosync.conf files look something like this:
>>>
>>>         version:        2
>>>         token:          5000
>>>         token_retransmits_before_loss_const: 10
>>>         join:           1000
>>>         consensus:      7500
>>>         vsftype:        none
>>>         max_messages:   20
>>>         secauth:        off
>>>         threads:        0
>>>         clear_node_high_bit: yes
>>>         rrp_mode: active
>>>
>>> If I understand this correctly, this means the node will wait 50 seconds
>>> (5000 ms x 10) before deciding that a cluster reconfig is necessary (perhaps
>>> after a link failure). Is that correct?
>>>
>> No, that's not correct. The token timeout is 5 seconds in your example,
>> because token is 5000 ms. The token timeout is always whatever the value of
>> totem.token is.
>>
>> token_retransmits_before_loss_const affects the token hold timeout, which is
>> how long the token is held on a node that has no messages to send before
>> being forwarded on. So increasing token_retransmits_before_loss_const changes
>> the number of times per 'token' timeout that the token is actually sent.
>>
>> In the example above you will see that the token is sent approximately every
>> 5000/10 = 500 ms. That's approximate: the value is scaled slightly to make
>> actual timeouts less likely, and it is also affected by messages that may need
>> to be sent.
>>
>> Chrissie
>>
>>> I'm trying to understand how this works together with my bonded NIC's
>>> arp_interval settings. I normally set arp_interval=1000. My question is, how
>>> many ARP losses are required before the bonding driver decides to fail over to
>>> the other link? If arp_interval=1000, how many times does the driver send an
>>> ARP and fail to receive a reply before it decides that the link is dead?
>>>
>>> I think I need to know this so I can set my corosync.conf settings correctly
>>> to avoid "false positive" cluster failovers. In other words, if there is a
>>> link or switch failure, I want to make sure that the cluster allows plenty of
>>> time for link communication to recover before deciding that a node has
>>> actually died.
>>>
>>> --
>>> Eric Robinson
>>>
>>>
>>> _______________________________________________
>>> Users mailing list: Users@clusterlabs.org 
>>> http://clusterlabs.org/mailman/listinfo/users 
>>>
>>> Project Home: http://www.clusterlabs.org Getting started: 
>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>>> Bugs: http://bugs.clusterlabs.org 
>>>
>>




