>>> Martin Schlegel <mar...@nuboreto.org> wrote on 19.07.2016 at 00:51 in message
<301244266.332724.5ea3ddc5-55ea-43b0-9a1b-22ebb1dcafd2.open-xchange@email.1und1. e>:
> Thanks Jan !
>
> If anybody else is hitting the error of a ring being bound to 127.0.0.1
> instead of the configured IP and corosync-cfgtool -s showing "[...]
> interface 127.0.0.1 FAULTY [...]" ....
>
> We noticed an issue occasionally occurring at boot time that we believe
> to be a bug in Ubuntu 14.04. It causes Corosync to start before all
> bindnetaddr IPs are up and running.
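>
> A minimal sketch of the kind of pre-start wait we mention further below
> (not our exact code; it assumes the rings sit on bond0/bond1, that
> iproute2 is available, and it uses an arbitrary 30 s timeout) could be
> added to the runlevel script just before corosync is started:
>
>     # Wait until both ring interfaces have an IPv4 address, so that
>     # corosync does not fall back to binding 127.0.0.1.
>     # bond0/bond1 and the 30 s timeout are assumptions - adjust to
>     # your own interfaces and patience.
>     for dev in bond0 bond1; do
>         i=0
>         until ip -4 -o addr show dev "$dev" | grep -q "inet " || [ "$i" -ge 30 ]; do
>             sleep 1
>             i=$((i + 1))
>         done
>     done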

Would it also happen if someone does an "rcnetwork restart" while the cluster
is up? I think we had it once on SLES11 as well, but I was never sure how it
was triggered.

> What happens is that despite the $network dependency and the correct order
> for the corosync runlevel script, the corosync service might be started
> after only the bond0 interface was fully up, but before our bond1 interface
> was assigned its IP address.
>
> For now we have added some code to the Corosync runlevel scripts that waits
> a certain amount of time for whatever bindnetaddr IPs have been configured
> in /etc/corosync/corosync.conf.
>
> Cheers,
> Martin Schlegel
>
>
>> Jan Friesse <jfrie...@redhat.com> wrote on 16 June 2016 at 17:55:
>>
>> Martin Schlegel wrote:
>>
>> > Hello everyone,
>> >
>> > we have been running a 3-node Pacemaker (1.1.14) / Corosync (2.3.5)
>> > cluster successfully for a couple of months, and we have started seeing
>> > a faulty ring with an unexpected 127.0.0.1 binding that we cannot reset
>> > via "corosync-cfgtool -r".
>>
>> This is the problem. A bind to 127.0.0.1 = ifdown happened = problem, and
>> with RRP it means a BIG problem.
>>
>> > We have had this once before, and only restarting Corosync (and
>> > everything else) on the node showing the unexpected 127.0.0.1 binding
>> > made the problem go away. However, in production we obviously would
>> > like to avoid this if possible.
>>
>> Just don't do ifdown. Never. If you are using NetworkManager (which does
>> an ifdown by default if the cable is disconnected), use something like the
>> NetworkManager-config-server package (it's just a configuration change, so
>> you can adapt it to whatever distribution you are using).
>>
>> Regards,
>>   Honza
>>
>> > So from the following description - how can I troubleshoot this issue,
>> > and/or does anybody have a good idea what might be happening here?
>> >
>> > We run 2x passive rrp rings across different IP subnets via udpu and we
>> > get the following output (all IPs obfuscated) - please notice the
>> > unexpected interface binding 127.0.0.1 for host pg2.
>> >
>> > If we reset via "corosync-cfgtool -r" on each node, heartbeat ring id 1
>> > briefly shows "no faults" but goes back to "FAULTY" seconds later.
>> >
>> > Regards,
>> > Martin Schlegel
>> > _____________________________________
>> >
>> > root@pg1:~# corosync-cfgtool -s
>> > Printing ring status.
>> > Local node ID 1
>> > RING ID 0
>> >         id      = A.B.C1.5
>> >         status  = ring 0 active with no faults
>> > RING ID 1
>> >         id      = D.E.F1.170
>> >         status  = Marking ringid 1 interface D.E.F1.170 FAULTY
>> >
>> > root@pg2:~# corosync-cfgtool -s
>> > Printing ring status.
>> > Local node ID 2
>> > RING ID 0
>> >         id      = A.B.C2.88
>> >         status  = ring 0 active with no faults
>> > RING ID 1
>> >         id      = 127.0.0.1
>> >         status  = Marking ringid 1 interface 127.0.0.1 FAULTY
>> >
>> > root@pg3:~# corosync-cfgtool -s
>> > Printing ring status.
>> > Local node ID 3
>> > RING ID 0
>> >         id      = A.B.C3.236
>> >         status  = ring 0 active with no faults
>> > RING ID 1
>> >         id      = D.E.F3.112
>> >         status  = Marking ringid 1 interface D.E.F3.112 FAULTY
>> >
>> > _____________________________________
>> >
>> > /etc/corosync/corosync.conf from pg1 - other nodes use different subnets
>> > and IPs, but are otherwise identical:
>> > ===========================================
>> > quorum {
>> >     provider: corosync_votequorum
>> >     expected_votes: 3
>> > }
>> >
>> > totem {
>> >     version: 2
>> >
>> >     crypto_cipher: none
>> >     crypto_hash: none
>> >
>> >     rrp_mode: passive
>> >     interface {
>> >         ringnumber: 0
>> >         bindnetaddr: A.B.C1.0
>> >         mcastport: 5405
>> >         ttl: 1
>> >     }
>> >     interface {
>> >         ringnumber: 1
>> >         bindnetaddr: D.E.F1.64
>> >         mcastport: 5405
>> >         ttl: 1
>> >     }
>> >     transport: udpu
>> > }
>> >
>> > nodelist {
>> >     node {
>> >         ring0_addr: pg1
>> >         ring1_addr: pg1p
>> >         nodeid: 1
>> >     }
>> >     node {
>> >         ring0_addr: pg2
>> >         ring1_addr: pg2p
>> >         nodeid: 2
>> >     }
>> >     node {
>> >         ring0_addr: pg3
>> >         ring1_addr: pg3p
>> >         nodeid: 3
>> >     }
>> > }
>> >
>> > logging {
>> >     to_syslog: yes
>> > }
>> >
>> > ===========================================

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org