On Fri, May 3, 2024 at 8:59 PM <ale...@pavlyuts.ru> wrote:

> Hi,
>
> > > Also, I've done a Wireshark capture and found a great mess in TCP; it
> > > seems like the connection between qdevice and qnetd really stops for
> > > some time and packets aren't delivered.
> >
> > Could you check UDP? I guess there are a lot of UDP packets sent by
> > corosync, which probably keeps TCP from getting through.
>
> Very improbable. UDP itself can't prevent TCP from working, and 1 Gbps
> links seem too wide for corosync to overload them.
> Also, overload usually leads to SOME packets being dropped, but this is
> an entirely different case: NO TCP packets pass. I have two captures from
> the two sides, and I can see that for some time each party sends TCP
> packets but the other party does not receive them at all.
>
> > > My guess is that it matches corosync syncing activity, and I suspect
> > > that corosync prevents any other traffic on the interface it uses for
> > > its rings.
> > >
> > > When I switch qnetd and qdevice to a different interface, it seems to
> > > work fine.
> >
> > Actually, having a dedicated interface just for corosync/knet traffic is
> > the optimal solution. qdevice+qnetd, on the other hand, should be as
> > close to the "customer" as possible.
>
> I am sure qnetd is not intended as a proof of network reachability; it is
> only an arbiter to provide quorum resolution. Therefore, in my view it is
> better to keep it on the intra-cluster network with a high-priority
> transport. If we need a solution based on network reachability, there are
> other ways to provide it.
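As a side note on the capture discussion above: one quick way to test the
"UDP crowds out TCP" theory is to count corosync UDP traffic against qnetd
TCP traffic in the same capture. Below is a minimal sketch using pyshark (a
Python wrapper around tshark, so tshark must be installed); it assumes the
default ports (UDP 5405 for corosync/knet, TCP 5403 for qnetd) and a
hypothetical capture file name:

import pyshark

def count_packets(pcap_file, display_filter):
    # Count packets in the capture that match a Wireshark display filter.
    cap = pyshark.FileCapture(pcap_file, display_filter=display_filter)
    try:
        return sum(1 for _ in cap)
    finally:
        cap.close()

# "node1.pcap" is a placeholder for one of the two captures mentioned above.
corosync_udp = count_packets('node1.pcap', 'udp.port == 5405')
qnetd_tcp = count_packets('node1.pcap', 'tcp.port == 5403')
qnetd_retrans = count_packets(
    'node1.pcap', 'tcp.port == 5403 && tcp.analysis.retransmission')

print(f'corosync UDP packets: {corosync_udp}')
print(f'qnetd TCP packets: {qnetd_tcp} ({qnetd_retrans} retransmissions)')

If one side keeps sending qnetd TCP packets while the other side's TCP
counts stay flat even though corosync UDP keeps flowing, that points at
something in between dropping the TCP flow rather than at bandwidth
saturation.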
This is an example of how you could use network reachability to give
preference to the node with better reachability in a 2-node fencing race.
There is text in the code that should give you an idea of how it is
supposed to work.
https://github.com/ClusterLabs/fence-agents/blob/main/agents/heuristics_ping/fence_heuristics_ping.py
If you think of combining it with priority fencing ...
Of course this idea can be applied to other ways of evaluating a running
node. I implemented fence_heuristics_ping back then both for an explicit
use case and to convey the basic idea - keeping in mind that others might
come up with different examples. (A condensed sketch of the principle is
included below the quoted text.)

I guess the main idea of having qdevice+qnetd outside each of the two
data centers (if we're talking about a scenario of this kind) is to be
able to cover the case where one of these data centers becomes
disconnected for whatever reason. Please correct me if there is more to
it! In this scenario you could use, e.g., SBD watchdog fencing to safely
recover resources from a disconnected data center (or site of any kind).

Klaus

> > So if you could have two interfaces (one just for corosync, the second
> > for qnetd+qdevice+publicly accessible services), it might be a solution?
>
> Yes, this way it works, but I wish to know WHY it won't work on the
> shared interface.
>
> > > So, the question is: does corosync really temporarily block any other
> > > traffic on the interface it uses? Or is it just a coincidence? If it
> > > blocks, is
> >
> > Nope, no "blocking". But it sends quite a few UDP packets and I guess
> > it can really use all the available bandwidth, so no TCP goes through.
>
> Use all of the available 1 Gbps? Impossible.
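For readers who don't want to follow the link above: the core of the idea
behind fence_heuristics_ping can be condensed to a few lines. This is only
a sketch of the principle, not the real agent's option handling; the target
IP and helper names here are made up:

import subprocess
import sys

def target_reachable(target_ip, count=3, timeout_s=2):
    # True if at least one ICMP echo to target_ip is answered (Linux ping).
    result = subprocess.run(
        ['ping', '-c', str(count), '-W', str(timeout_s), target_ip],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == '__main__':
    # Used as the first device of a fencing-topology level: exit 0 lets
    # fencing proceed to the real fence agent on the next device; exit 1
    # fails the whole level, so the node that lost reachability also loses
    # the fence race. 192.0.2.1 stands in for a meaningful reference IP.
    sys.exit(0 if target_reachable('192.0.2.1') else 1)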