Re: [ClusterLabs] temporary loss of quorum when member starts to rejoin

Sherrard Burton Tue, 07 Apr 2020 06:33:26 -0700



On 4/7/20 8:40 AM, Jan Friesse wrote:

Sherrard,
On 4/7/20 12:53 AM, Strahil Nikolov wrote:
Hi Sherrard,

Have you tried to increase the qnet timers in the corosync.conf ?
Strahil,
i have actually reduced the qnet timers in order to improve failover response time, per Jan's suggestion on the thread '[ClusterLabs] reducing corosync-qnetd "response time"'
This is actually a different problem, and reduced qnetd and qdevice timers will not help. This problem is really about a 2-node cluster which is split in half into two single-node memberships. Qnetd then gives its vote to the node with the lowest node id, in this case the newly restarted node.


Jan,

i bought into Strahil's question about increasing the timers, not because the timers are related to the tie-breaker, per se, but because the race condition seems to be triggered by (but not caused by) the fact that qnetd is able to establish communication before knet.

ie, if the timing could be adjusted so that qnetd connects only after knet, then the rebooted node would be able to see the running node before contacting the qdevice.

of course, none of that would represent a real fix, and would actually introduce a different set of problems. i just wanted to clarify my interpretation of Strahil's question.
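
to make the race concrete, here is a toy model in Python. it is only a sketch and not corosync or qnetd code: the function names are made up for illustration, and it assumes a 2-node cluster with one qdevice vote (3 expected votes, quorum at 2) and the "lowest node id" tie-breaker described above.

```python
# Toy model of the rejoin race. NOT corosync/qnetd code; all names here
# are hypothetical and exist only for illustration.

EXPECTED_VOTES = 3          # assumption: 2 nodes + 1 qdevice vote
QUORUM = EXPECTED_VOTES // 2 + 1


def qnetd_vote(partitions):
    """Return the partition that receives the qdevice vote.

    partitions: list of sets of node ids, as qnetd sees the memberships.
    The larger partition wins; on an even split the partition holding
    the lowest node id wins (the tie-breaker).
    """
    return max(partitions, key=lambda p: (len(p), -min(p)))


def has_quorum(node, partitions):
    """True if `node` is quorate given the memberships qnetd sees."""
    mine = next(p for p in partitions if node in p)
    votes = len(mine) + (1 if mine == qnetd_vote(partitions) else 0)
    return votes >= QUORUM


# Case 1: rebooted node 1 reaches qnetd before knet has formed a
# membership with node 2, so qnetd sees two single-node memberships.
split_view = [{1}, {2}]

# Case 2: knet forms the membership first, so qnetd sees one membership.
joined_view = [{1, 2}]

if __name__ == "__main__":
    for name, view in (("qnetd first", split_view), ("knet first", joined_view)):
        print(name, {n: has_quorum(n, view) for n in (1, 2)})
```

in the "qnetd first" case the running node (id 2) loses the qdevice vote to the freshly restarted node (id 1) and drops below quorum until the membership merges, which matches the temporary loss of quorum in the subject line. in the "knet first" case there is no split for qnetd to arbitrate.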

Regards,
   Honza

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
