Hello, I'm going crazy over this problem, which I hope to resolve here with your help:
I have two nodes using the Corosync redundant ring feature. Each node has two identically connected and configured NICs, and the nodes are connected to each other by two crossover cables. I configured both nodes with rrp_mode: passive. Everything works well at this point, but when I shut down one node to test failover and that node comes back online, corosync marks the interface as FAULTY and RRP fails to recover the initial state:

1. Initial scenario:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.0.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.1.1
        status  = ring 1 active with no faults

2. When I shut down node 2, everything continues with no faults. Sometimes the ring IDs bind to 127.0.0.1 and then bind back to their respective heartbeat IPs.

3. When node 2 is back online:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.0.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.1.1
        status  = Marking ringid 1 interface 192.168.1.1 FAULTY

# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 1439 (corosync)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/corosync.service
           └─1439 /usr/sbin/corosync -f

Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.0.1] is now up.
Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.0.1] is now up.
Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice [TOTEM ] The network interface [192.168.1.1] is now up.
Aug 22 14:44:11 node1 corosync[1439]: [TOTEM ] The network interface [192.168.1.1] is now up.
Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
Aug 22 14:44:26 node1 corosync[1439]: [TOTEM ] A new membership ( 192.168.0.1:601760) was formed. Members
Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
Aug 22 14:44:32 node1 corosync[1439]: [TOTEM ] A new membership ( 192.168.0.1:601764) was formed. Members joined: 2
Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
Aug 22 14:44:34 node1 corosync[1439]: [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY

If I run corosync-cfgtool, the faulty state is cleared, but after a few seconds the ring is marked FAULTY again. The only thing that resolves the problem is restarting the service with service corosync restart.

Here are some of my configuration settings on node 1 (I have already tried changing rrp_mode):

*- corosync.conf*

totem {
        version: 2
        cluster_name: node
        token: 5000
        token_retransmits_before_loss_const: 10
        secauth: off
        threads: 0
        rrp_mode: passive
        nodeid: 1
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.0.0
                #mcastaddr: 226.94.1.1
                mcastport: 5405
                broadcast: yes
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.1.0
                #mcastaddr: 226.94.1.2
                mcastport: 5407
                broadcast: yes
        }
}

logging {
        fileline: off
        to_stderr: yes
        to_syslog: yes
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}

nodelist {
        node {
                nodeid: 1
                ring0_addr: 192.168.0.1
                ring1_addr: 192.168.1.1
        }
        node {
                nodeid: 2
                ring0_addr: 192.168.0.2
                ring1_addr: 192.168.1.2
        }
}

aisexec {
        user: root
        group: root
}

service {
        name: pacemaker
        ver: 1
}

*- /etc/hosts*

127.0.0.1       localhost
10.4.172.5      node1.upc.edu node1
10.4.172.6      node2.upc.edu node2

Thank you for your help in advance!
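P.S. In case it helps anyone reproduce this, here is a minimal sketch of how the ring status shown above could be checked from a script instead of by eye. It assumes the text layout that corosync-cfgtool -s prints in my output (a "RING ID n" line followed by "id = ..." and "status = ..." lines); the function names parse_ring_status and faulty_rings are my own, not part of corosync, and the layout may differ on other versions.

```python
import re


def parse_ring_status(output):
    """Parse `corosync-cfgtool -s` text into {ring_id: {"id": ..., "status": ...}}.

    Assumes the layout shown in this post: a "RING ID n" line followed by
    "key = value" lines for that ring.
    """
    rings = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = re.match(r"RING ID (\d+)$", line)
        if match:
            current = int(match.group(1))
            rings[current] = {}
            continue
        # Skip header lines and anything that is not "key = value".
        if current is None or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        rings[current][key] = value
    return rings


def faulty_rings(rings):
    """Return the sorted ring numbers whose status text mentions FAULTY."""
    return sorted(
        ring for ring, fields in rings.items()
        if "FAULTY" in fields.get("status", "")
    )


# Sample taken from the output after node 2 comes back online.
SAMPLE = """\
Printing ring status.
Local node ID 1
RING ID 0
        id      = 192.168.0.1
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.1.1
        status  = Marking ringid 1 interface 192.168.1.1 FAULTY
"""

if __name__ == "__main__":
    print(faulty_rings(parse_ring_status(SAMPLE)))  # prints [1]
```

On a live node, the SAMPLE text could be replaced with the output of subprocess.run(["corosync-cfgtool", "-s"], capture_output=True, text=True).stdout, so the check can be repeated while node 2 rejoins.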
--
*David Tolosa Martínez*
Customer Support & Infrastructure
UPCnet - Edifici Vèrtex
Plaça d'Eusebi Güell, 6, 08034 Barcelona
Tel: 934054555
<https://www.upcnet.es>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org