I can confirm that doing an ifdown is not the source of my corosync issues. My cluster is in another state, so I can't pull a cable, but I can down a port on a switch. That had exactly the same effect as doing an ifdown: two machines got fenced when only one should have been.
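In case it helps anyone compare their own logs against the ones quoted below, here is a minimal Python sketch that pulls the corosync TOTEM messages out of a syslog-style file and flags the "network interface is down" ones. It assumes the line layout shown in the quoted logs; the function names are made up for illustration, and the exact message text may differ between corosync versions.

```python
import re

# Matches syslog-style corosync lines like the ones quoted in this thread, e.g.
#   Mar 24 16:35:13 b014 corosync[2166]: notice [TOTEM ] The network interface is down.
# The log-level word ("notice") is optional because some lines omit it.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"corosync\[\d+\]:\s+(?:\w+\s+)?\[TOTEM\s*\]\s+(?P<msg>.*)$"
)


def totem_events(lines):
    """Yield (timestamp, host, message) for every corosync TOTEM line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.group("ts"), m.group("host"), m.group("msg")


def interface_down_events(lines):
    """Return only the TOTEM events that report the interface going down."""
    return [
        event for event in totem_events(lines)
        if "network interface is down" in event[2].lower()
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python totem_scan.py /var/log/syslog
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for ts, host, msg in interface_down_events(handle):
            print(f"{ts} {host}: {msg}")
```

If this prints nothing for a node that was fenced after a switch port went down, that would at least suggest corosync on that node never saw its interface disappear.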
-------
Seth Reid
System Operations Engineer
Vendini, Inc.
415.349.7736
sr...@vendini.com
www.vendini.com

On Fri, Mar 31, 2017 at 4:12 AM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:

> Hi,
>
> On Fri, Mar 31, 2017 at 02:39:02AM -0400, Digimer wrote:
> > On 31/03/17 02:32 AM, Jan Friesse wrote:
> > >> The original message has the logs from nodes 1 and 3. Node 2, the one
> > >> that got fenced in this test, doesn't really show much. Here are the
> > >> logs from it:
> > >>
> > >> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0, 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0, active_time=3253 secs
> > >> Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #7 enp6s0f0, fe80::a236:9fff:fe8a:6500%6#123, interface stats: received=0, sent=0, dropped=0, active_time=3253 secs
> > >> Mar 24 16:35:13 b014 corosync[2166]: notice [TOTEM ] A processor failed, forming new configuration.
> > >> Mar 24 16:35:13 b014 corosync[2166]: [TOTEM ] A processor failed, forming new configuration.
> > >> Mar 24 16:35:13 b014 corosync[2166]: notice [TOTEM ] The network interface is down.
> > >
> > > This is problem. Corosync handles ifdown really badly. If this was not
> > > intentional it may be caused by NetworkManager. Then please install
> > > equivalent of NetworkManager-config-server package (it's actually one
> > > file called 00-server.conf so you can extract it from, for example,
> > > Fedora package
> > > https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/n/NetworkManager-config-server-1.8.0-0.1.fc27.noarch.html)
> >
> > ifdown'ing corosync's interface happens a lot, intentionally or
> > otherwise.
>
> I'm not sure, but I think that it can happen only intentionally,
> i.e. through a human intervention. If there's another problem
> with the interface it doesn't disappear from the system.
>
> Thanks,
>
> Dejan
>
> > I think it is reasonable to expect corosync to handle this
> > properly. How hard would it be to make corosync resilient to this fault
> > case?
> >
> > --
> > Digimer
> > Papers and Projects: https://alteeve.com/w/
> > "I am, somehow, less interested in the weight and convolutions of
> > Einstein’s brain than in the near certainty that people of equal talent
> > have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
> >
> > _______________________________________________
> > Users mailing list: Users@clusterlabs.org
> > http://lists.clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org