Hi David and Mark,

I compiled my own DLM kernel module with the "return -EINVAL;" line removed, then ran some tests with the new module on a two-ring cluster with "protocol=tcp" set in /etc/dlm/dlm.conf.

1) With both networks up, all the tests passed.
2) When I broke the second ring's network, all the tests still passed (no effect, since the TCP protocol only uses the first ring's IP address).
3) When I broke the first ring's network (e.g. ifconfig eth0 down on node3), the tests hung on the other nodes (e.g. node1 and node2) until node3 was rebooted manually or node3's network came back (e.g. ifconfig eth0 up on node3).
4) I switched the two-ring cluster to a one-ring cluster (by editing /etc/corosync/corosync.conf) and broke the network on one node; that node was fenced immediately.
5) So why was node3 not fenced in case 3)? It looks like a bug: since the tests hang, we have to reboot that node manually.
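To illustrate the behaviour I tested, here is a minimal user-space sketch, not the actual kernel patch. The function name pick_tcp_addr, the plain string addresses and the warning text are all hypothetical; only the idea (use the first address instead of returning an error when a second one exists) comes from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the check on dlm_local_addr[]:
 * addrs[] holds one local address per ring, in ring order.
 * Instead of failing when a second address is present, warn
 * and fall back to the first one.
 */
static const char *pick_tcp_addr(const char *const addrs[], size_t n)
{
    assert(addrs != NULL);

    if (n == 0 || addrs[0] == NULL)
        return NULL;    /* no local address configured at all */

    if (n > 1 && addrs[1] != NULL)
        fprintf(stderr, "TCP: more than one local address, "
                        "using only the first (%s)\n", addrs[0]);

    /* TCP only ever uses the first ring's address. */
    return addrs[0];
}
```

With this logic the second ring's address is never used for DLM traffic, which matches case 2) (no effect when ring 2 is down) and case 3) (no alternative path when ring 1 is down).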
Thanks
Gang

>>>
> On Thu, Apr 12, 2018 at 09:31:49PM -0600, Gang He wrote:
>> During this period, could we allow tcp protocol work (rather than return error directly) under two-ring cluster?
>> If the user sets using TCP protocol in command-line or dlm configuration file, could we use the first ring IP address to work?
>> I do not know why we return error directly in this case? there was any concern before?
>
> You're talking about this:
>
>         /* We don't support multi-homed hosts */
>         if (dlm_local_addr[1] != NULL) {
>                 log_print("TCP protocol can't handle multi-homed hosts, "
>                           "try SCTP");
>                 return -EINVAL;
>         }
>
> I think that should be ok to remove, and just use the first addr.
> Mark, do you see any reason to avoid that?
>
> Dave