On April 7, 2020 12:21:50 AM GMT+03:00, Sherrard Burton <sb-clusterl...@allafrica.com> wrote:
>
>On 4/6/20 4:10 PM, Andrei Borzenkov wrote:
>> 06.04.2020 20:57, Sherrard Burton wrote:
>>>
>>> On 4/6/20 1:20 PM, Sherrard Burton wrote:
>>>>
>>>> On 4/6/20 12:35 PM, Andrei Borzenkov wrote:
>>>>> 06.04.2020 17:05, Sherrard Burton wrote:
>>>>>>
>>>>>> from the quorum node:
>>>> ...
>>>>>> Apr 05 23:10:17 debug Client ::ffff:192.168.250.50:54462 (cluster xen-nfs01_xen-nfs02, node_id 1) sent quorum node list.
>>>>>> Apr 05 23:10:17 debug msg seq num = 6
>>>>>> Apr 05 23:10:17 debug quorate = 0
>>>>>> Apr 05 23:10:17 debug node list:
>>>>>> Apr 05 23:10:17 debug node_id = 1, data_center_id = 0, node_state = member
>>>>>
>>>>> Oops. How come the node that was rebooted formed a cluster all by
>>>>> itself, without seeing the second node? Do you have two_node and/or
>>>>> wait_for_all configured?
>>>>>
>>>
>>> i never thought to check the logs on the rebooted server. hopefully
>>> someone can extract some further useful information here:
>>>
>>> https://pastebin.com/imnYKBMN
>>>
>>
>> It looks like some timing issue or race condition. After reboot, the node
>> manages to contact qnetd first, before the connection to the other node is
>> established. Qnetd behaves as documented: it sees two equal-size
>> partitions and favors the partition that includes the tie breaker (lowest
>> node id). So the existing node goes out of quorum. A second later both
>> nodes see each other and so quorum is regained.
>
>thank you for taking the time to trawl through my debugging output. your
>explanation seems to accurately describe what i am experiencing. of
>course i have no idea how to remedy it. :-)
>
>>
>> I cannot reproduce it, but I also do not use knet. From the documentation
>> I have the impression that knet has an artificial delay before it
>> considers links operational, so maybe that is the reason.
>
>i will do some reading on how knet factors into all of this and respond
>with any questions or discoveries.
>
>>
>>>>
>>>> BTW, great eyes. i had not picked up on that little nuance. i had
>>>> pored over this particular log a number of times, but it was very
>>>> hard for me to discern the starting and stopping points for each
>>>> logical group of messages. the indentation made some of it clear, but
>>>> when you have a series of lines beginning in the left-most column, it
>>>> is not clear whether they belong to the previous group, the next
>>>> group, or they are their own group.
>>>>
>>>> just wanted to note my confusion in case the relevant maintainer
>>>> happens across this thread.
>>>>
>>>> thanks again
>>>> _______________________________________________
>>>> Manage your subscription:
>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>
>>>> ClusterLabs home: https://www.clusterlabs.org/

Hi Sherrard,

Have you tried increasing the qnet timers in corosync.conf?

Best Regards,
Strahil Nikolov
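
For anyone following the thread, the tie-breaker decision Andrei describes (equal-size partitions, with the vote going to the partition that contains the lowest node id) can be modeled in a few lines of Python. This is only a simplified sketch of the behavior as explained in the thread, not qnetd's actual code. The function name `pick_quorate_partition` is hypothetical, and node id 2 for the node that stayed up is an assumption, since the quoted log only shows node_id 1.

```python
from typing import Iterable, List, Set


def pick_quorate_partition(partitions: Iterable[Set[int]]) -> Set[int]:
    """Return the partition that gets the vote in this simplified model.

    Hypothetical helper: the largest partition wins; if several partitions
    share the largest size, the one containing the lowest node id wins.
    """
    parts: List[Set[int]] = [set(p) for p in partitions if p]
    if not parts:
        raise ValueError("at least one non-empty partition is required")

    largest = max(len(p) for p in parts)
    candidates = [p for p in parts if len(p) == largest]

    # Tie breaker as described in the thread: lowest node id.
    return min(candidates, key=min)


# Scenario from the thread: node 1 has just rebooted and reaches qnetd
# before it has rejoined the other node (assumed here to be node 2),
# so qnetd briefly sees two one-node partitions.
winner = pick_quorate_partition([{1}, {2}])
print(sorted(winner))
```

In this model the freshly rebooted node 1 wins the tie, which matches the observed symptom: the node that was already running loses quorum until the two nodes see each other again.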