On 4/7/20 8:40 AM, Jan Friesse wrote:
Sherrard,

On 4/7/20 12:53 AM, Strahil Nikolov wrote:

Hi Sherrard,

Have you tried increasing the qnet timers in corosync.conf?


Strahil,
I have actually reduced the qnet timers to improve failover response time, per Jan's suggestion in the thread '[ClusterLabs] reducing corosync-qnetd "response time"'.

This is actually a different problem, and reducing the qnetd and qdevice timers will not help. This problem is really about a 2-node cluster that is split in half into two single-node memberships. Qnetd then gives its vote to the node with the lowest node ID, which in this case is the newly restarted node.
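A minimal Python sketch of the tie-break Jan describes, assuming qnetd's ffsplit algorithm with `tie_breaker: lowest` and a simplified model in which the partition with more nodes wins outright and a 50:50 tie goes to the partition holding the lowest node ID. The function name and node IDs are hypothetical, and real qnetd also takes other state into account (e.g. heuristics), so this is an illustration rather than its implementation.

```python
from typing import FrozenSet, Iterable, Optional


def ffsplit_winner(partitions: Iterable[FrozenSet[int]]) -> Optional[FrozenSet[int]]:
    """Return the partition that would get the qnetd vote (simplified model)."""
    parts = [p for p in partitions if p]  # ignore empty memberships
    if not parts:
        return None
    largest = max(len(p) for p in parts)
    # Only the biggest partitions are candidates for the vote.
    candidates = [p for p in parts if len(p) == largest]
    # 50:50 tie: tie_breaker "lowest" favours the partition holding the lowest node ID.
    return min(candidates, key=min)


# Node 1 has just rebooted and node 2 kept running; each sees only itself.
print(ffsplit_winner([frozenset({1}), frozenset({2})]))  # frozenset({1})
```

With one vote per node plus one qdevice vote (3 expected, quorum 2), the partition that receives this vote reaches 2 votes and stays quorate, so in this scenario it is the running node 2 that loses quorum.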


Jan,
I bought into Strahil's question about increasing the timers, not because the timers are related to the tie-breaker per se, but because the race condition seems to be triggered (though not caused) by the fact that qnetd is able to establish communication before knet.

That is, if the timing could be adjusted so that the qnetd connection is established only after knet is up, then the rebooted node would be able to see the running node before contacting the qdevice.
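A hypothetical Python timing model of the race described above, assuming the only thing that matters is whether the rebooted node's knet link comes up before or after its qdevice registers with qnetd. The function name, parameters and timings are illustrative assumptions, not corosync settings or log values.

```python
from typing import FrozenSet, List


def partitions_seen_by_qnetd(t_qdevice_connect: float, t_knet_up: float,
                             rebooted: int, running: int) -> List[FrozenSet[int]]:
    """Hypothetical timing model: memberships qnetd sees when the rebooted
    node's qdevice first registers (times in seconds after boot)."""
    if t_knet_up <= t_qdevice_connect:
        # knet link came up first: the rebooted node has already rejoined,
        # so qnetd sees a single two-node membership and no tie-break is needed.
        return [frozenset({rebooted, running})]
    # qdevice registered first: the rebooted node is still alone, so qnetd
    # sees two single-node memberships and must break a 50:50 tie.
    return [frozenset({rebooted}), frozenset({running})]


print(partitions_seen_by_qnetd(1.0, 3.0, rebooted=1, running=2))  # split: two single-node partitions
print(partitions_seen_by_qnetd(3.0, 1.0, rebooted=1, running=2))  # joined: one two-node partition
```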

Of course, none of that would represent a real fix, and it would actually introduce a different set of problems. I just wanted to clarify my interpretation of Strahil's question.


Regards,
   Honza
