Digimer wrote:
On 31/03/17 02:32 AM, Jan Friesse wrote:
The original message has the logs from nodes 1 and 3. Node 2, the one that got fenced in this test, doesn't really show much. Here are the logs from it:

Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #5 enp6s0f0, 192.168.100.14#123, interface stats: received=0, sent=0, dropped=0, active_time=3253 secs
Mar 24 16:35:10 b014 ntpd[2318]: Deleting interface #7 enp6s0f0, fe80::a236:9fff:fe8a:6500%6#123, interface stats: received=0, sent=0, dropped=0, active_time=3253 secs
Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] A processor failed, forming new configuration.
Mar 24 16:35:13 b014 corosync[2166]:  [TOTEM ] A processor failed, forming new configuration.
Mar 24 16:35:13 b014 corosync[2166]: notice  [TOTEM ] The network interface is down.

This is the problem. Corosync handles ifdown really badly. If the ifdown was not intentional, it may have been caused by NetworkManager. In that case, please install the equivalent of the NetworkManager-config-server package (it's actually just one file called 00-server.conf, so you can extract it from, for example, the Fedora package:
https://www.rpmfind.net/linux/RPM/fedora/devel/rawhide/x86_64/n/NetworkManager-config-server-1.8.0-0.1.fc27.noarch.html)
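
For reference, if I recall correctly that package is nothing more than the following conf.d snippet; the path below is an assumption based on the Fedora packaging, so double-check against the extracted file:

    # 00-server.conf -- drop into /etc/NetworkManager/conf.d/
    # (the Fedora package installs it under
    # /usr/lib/NetworkManager/conf.d/, if memory serves)
    [main]
    # do not generate automatic "Wired connection" profiles
    # for interfaces that have no explicit configuration
    no-auto-default=*
    # keep an interface's configuration in place even when its
    # carrier drops, instead of tearing the connection down
    ignore-carrier=*

After dropping the file in place, restart NetworkManager (systemctl restart NetworkManager) for it to take effect.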

Ifdown'ing corosync's interface happens a lot, intentionally or otherwise. I think it is reasonable to expect corosync to handle this properly. How hard would it be to make corosync resilient to this fault case?

Really hard. Knet (that is, whatever becomes corosync 3.x) should solve this issue.
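
To give an idea of why knet helps: it can carry totem traffic over several independent links per node, so a single dead interface no longer takes the membership layer down with it. A rough sketch of what a two-link corosync 3.x config might look like follows; the 10.0.0.x addresses and cluster name are invented for illustration, and option names may still change before release:

    totem {
        version: 2
        cluster_name: mycluster      # illustrative name
        transport: knet              # knet transport (corosync 3.x)
    }

    nodelist {
        node {
            nodeid: 2
            name: b014
            ring0_addr: 192.168.100.14   # link 0 (the interface that went down above)
            ring1_addr: 10.0.0.14        # link 1, ideally on separate hardware
        }
        # ...remaining nodes defined the same way, one ringX_addr per link
    }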

