Hi Jan,

Thanks for your super-quick response!
We do not use a network manager - it's all static on these Ubuntu 14.04 nodes (/etc/network/interfaces). I do not think we did an ifdown on a network interface manually. However, the IP addresses are assigned to bond0 and bond1 - we use 4 physical network interfaces, with 2 bonded into a public network (bond1) and 2 bonded into a private network (bond0). Could this have anything to do with it?

Regards,
Martin Schlegel

___________________

From /etc/network/interfaces, i.e.:

auto bond0
iface bond0 inet static
    #pre-up /sbin/ethtool -s bond0 speed 1000 duplex full autoneg on
    post-up ifenslave bond0 eth0 eth2
    pre-down ifenslave -d bond0 eth0 eth2
    bond-slaves none
    bond-mode 4
    bond-lacp-rate fast
    bond-miimon 100
    bond-downdelay 0
    bond-updelay 0
    bond-xmit_hash_policy 1
    address [...]

> Jan Friesse <jfrie...@redhat.com> wrote on 16 June 2016 at 17:55:
>
> Martin Schlegel wrote:
>
> > Hello everyone,
> >
> > we have been running a 3-node Pacemaker (1.1.14) / Corosync (2.3.5) cluster
> > successfully for a couple of months, and we have started seeing a faulty ring
> > with an unexpected 127.0.0.1 binding that we cannot reset via
> > "corosync-cfgtool -r".
>
> This is the problem. A bind to 127.0.0.1 means an ifdown happened, and
> with RRP that is a BIG problem.
>
> > We have had this once before, and only restarting Corosync (and everything
> > else) on the node showing the unexpected 127.0.0.1 binding made the problem
> > go away. However, in production we obviously would like to avoid this if
> > possible.
>
> Just don't do ifdown. Never. If you are using NetworkManager (which does
> an ifdown by default when a cable is disconnected), use something like the
> NetworkManager-config-server package (it's just a configuration change, so
> you can adapt it to whatever distribution you are using).
>
> Regards,
>   Honza
>
> > So, given the following description - how can I troubleshoot this issue,
> > and/or does anybody have a good idea what might be happening here?
> >
> > We run 2 passive RRP rings across different IP subnets via udpu, and we get
> > the following output (all IPs obfuscated) - please notice the unexpected
> > interface binding 127.0.0.1 for host pg2.
> >
> > If we reset via "corosync-cfgtool -r" on each node, heartbeat ring id 1
> > briefly shows "no faults" but goes back to "FAULTY" seconds later.
> >
> > Regards,
> > Martin Schlegel
> > _____________________________________
> >
> > root@pg1:~# corosync-cfgtool -s
> > Printing ring status.
> > Local node ID 1
> > RING ID 0
> >         id      = A.B.C1.5
> >         status  = ring 0 active with no faults
> > RING ID 1
> >         id      = D.E.F1.170
> >         status  = Marking ringid 1 interface D.E.F1.170 FAULTY
> >
> > root@pg2:~# corosync-cfgtool -s
> > Printing ring status.
> > Local node ID 2
> > RING ID 0
> >         id      = A.B.C2.88
> >         status  = ring 0 active with no faults
> > RING ID 1
> >         id      = 127.0.0.1
> >         status  = Marking ringid 1 interface 127.0.0.1 FAULTY
> >
> > root@pg3:~# corosync-cfgtool -s
> > Printing ring status.
> > Local node ID 3
> > RING ID 0
> >         id      = A.B.C3.236
> >         status  = ring 0 active with no faults
> > RING ID 1
> >         id      = D.E.F3.112
> >         status  = Marking ringid 1 interface D.E.F3.112 FAULTY
> >
> > _____________________________________
> >
> > /etc/corosync/corosync.conf from pg1 - other nodes use different subnets and
> > IPs, but are otherwise identical:
> > ===========================================
> > quorum {
> >     provider: corosync_votequorum
> >     expected_votes: 3
> > }
> >
> > totem {
> >     version: 2
> >
> >     crypto_cipher: none
> >     crypto_hash: none
> >
> >     rrp_mode: passive
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: A.B.C1.0
> >         mcastport: 5405
> >         ttl: 1
> >     }
> >     interface {
> >         ringnumber: 1
> >         bindnetaddr: D.E.F1.64
> >         mcastport: 5405
> >         ttl: 1
> >     }
> >     transport: udpu
> > }
> >
> > nodelist {
> >     node {
> >         ring0_addr: pg1
> >         ring1_addr: pg1p
> >         nodeid: 1
> >     }
> >     node {
> >         ring0_addr: pg2
> >         ring1_addr: pg2p
> >         nodeid: 2
> >     }
> >     node {
> >         ring0_addr: pg3
> >         ring1_addr: pg3p
> >         nodeid: 3
> >     }
> > }
> >
> > logging {
> >     to_syslog: yes
> > }
> >
> > ===========================================
> >
> > _______________________________________________
> > Users mailing list: Users@clusterlabs.org
> > http://clusterlabs.org/mailman/listinfo/users
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org
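P.S. Until the root cause is found, one way to catch this condition early is to watch for the loopback binding in the "corosync-cfgtool -s" output itself. Below is a rough sketch (not an official corosync tool - just a hypothetical helper, assuming Python 3.9+) that parses that output and reports any ring whose interface id has fallen back to 127.0.0.1:

```python
import re

def faulty_loopback_rings(cfgtool_output):
    """Return the ring IDs from `corosync-cfgtool -s` output whose bound
    interface id is 127.0.0.1 (i.e. the ring fell back to loopback)."""
    faulty = []
    ring = None
    for raw in cfgtool_output.splitlines():
        line = raw.strip()
        m = re.match(r"RING ID (\d+)", line)
        if m:
            ring = m.group(1)           # remember which ring we are inside
        elif line.startswith("id") and ring is not None:
            if "127.0.0.1" in line:     # loopback binding detected
                faulty.append(ring)
    return faulty

# Example using the (obfuscated) output from host pg2 above:
SAMPLE = """Printing ring status.
Local node ID 2
RING ID 0
        id      = A.B.C2.88
        status  = ring 0 active with no faults
RING ID 1
        id      = 127.0.0.1
        status  = Marking ringid 1 interface 127.0.0.1 FAULTY"""

print(faulty_loopback_rings(SAMPLE))    # -> ['1']
```

On a live node you would feed it the captured stdout of "corosync-cfgtool -s" (e.g. via subprocess) from a cron job and alert if the list is non-empty - a node reporting 127.0.0.1 is the one that needs a Corosync restart.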