On 4/7/20 8:40 AM, Jan Friesse wrote:
Sherrard,
On 4/7/20 12:53 AM, Strahil Nikolov wrote:
Hi Sherrard,
Have you tried increasing the qnet timers in corosync.conf?
Strahil,
i have actually reduced the qnet timers in order to improve failover
response time, per Jan's suggestion on the thread '[ClusterLabs]
reducing corosync-qnetd "response time"'
This is actually a different problem, and reducing the qnetd and qdevice
timers will not help. The problem is really about a 2-node cluster that is
split in half into two single-node memberships. Qnetd then gives the vote to
the node with the lowest node id, which in this case is the newly restarted node.
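As a rough illustration of the tie-break described above, here is a simplified model in Python. It is not qnetd's actual code: the function name and the representation of memberships as sets of node ids are assumptions made only for this sketch.

```python
def pick_vote_winner(partitions):
    """Pick the membership that gets the vote in a split (simplified model).

    partitions: list of sets of node ids, one set per membership
    that the arbiter currently sees.
    """
    # Prefer the largest membership.
    largest = max(len(p) for p in partitions)
    candidates = [p for p in partitions if len(p) == largest]
    # On an even split, fall back to a "lowest node id" tie-breaker:
    # the membership containing the lowest node id wins.
    return min(candidates, key=min)


# 2-node cluster seen as two single-node memberships: node 2 has been
# running all along, node 1 has just rebooted and cannot see node 2 yet.
running_node = {2}
restarted_node = {1}
print(pick_vote_winner([running_node, restarted_node]))  # {1}
```

In this model the freshly restarted node 1 wins purely because of its id, regardless of which node was already running, which matches the behaviour reported in this thread.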
Jan,
i bought into Strahil's question about increasing the timers, not
because the timers are related to the tie-breaker per se, but because
the race condition seems to be triggered by (but not caused by) the fact
that qnetd is able to establish communication before knet.
i.e., if the timing could be adjusted so that qnetd connects only after
knet, then the rebooted node would be able to see the running node
before contacting the qdevice.
of course, none of that would represent a real fix, and would actually
introduce a different set of problems. i just wanted to clarify my
interpretation of Strahil's question.
Regards,
Honza