Hi everyone,

I have a 16-node asynchronous cluster configured with the Corosync redundant ring feature.
Each node has two similarly connected and configured NICs: one connected to the public network,
the other to our private VLAN. When I checked the state of the Corosync rings, I found:
    # corosync-cfgtool -s
    Printing ring status.
    Local node ID 1
    RING ID 0
            id      = 192.168.1.54
            status  = Marking ringid 0 interface 192.168.1.54 FAULTY
    RING ID 1
            id      = 111.11.11.1
            status  = ring 1 active with no faults

After some digging, I found that if I re-enable the failed ring with the command:
    # corosync-cfgtool -r

RING ID 0 is marked as "active" for a few minutes, but after that it is marked as faulty again, permanently.
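To catch the moment a ring flips back to FAULTY, the status output above can be checked from a script. The following is only a minimal sketch: the function name faulty_rings is hypothetical, and it assumes that corosync-cfgtool -s keeps the "RING ID <n>" / "status = ..." layout shown above.

```shell
#!/bin/sh
# faulty_rings: hypothetical helper (not part of Corosync).
# Reads `corosync-cfgtool -s` output on stdin and prints the number of
# every ring whose status line contains the word FAULTY.
faulty_rings() {
    awk '
        # A "RING ID <n>" line starts a new ring; remember its number.
        $1 == "RING" && $2 == "ID" { ring = $3 }
        # A status line mentioning FAULTY belongs to the ring seen last.
        $1 == "status" && /FAULTY/ { print ring }
    '
}
```

It could be run as `corosync-cfgtool -s | faulty_rings`; an empty result means no ring is currently reported as faulty.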
The log has no useful information, just a single message:

    corosync[21740]: [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY

and no message like:

    [TOTEM ] Automatically recovered ring 1

My corosync.conf looks like this:

    compatibility: whitetank
    totem {
        version: 2
        secauth: on
        threads: 4
        rrp_mode: passive
        interface {
            member {
                memberaddr: PRIVATE_IP_1
            }
            ...
            member {
                memberaddr: PRIVATE_IP_16
            }
            ringnumber: 0
            bindnetaddr: PRIVATE_NET_ADDR
            mcastaddr: 226.0.0.1
            mcastport: 5505
            ttl: 1
        }
        interface {
            member {
                memberaddr: PUBLIC_IP_1
            }
            ...
            member {
                memberaddr: PUBLIC_IP_16
            }
            ringnumber: 1
            bindnetaddr: PUBLIC_NET_ADDR
            mcastaddr: 224.0.0.1
            mcastport: 5405
            ttl: 1
        }
        transport: udpu
        logging {
            to_stderr: no
            to_logfile: yes
            logfile: /var/log/cluster/corosync.log
            logfile_priority: info
            to_syslog: yes
            syslog_priority: warning
            debug: on
            timestamp: on
        }

I tried changing rrp_mode and the mcastaddr/mcastport for ringnumber: 0, but the result was similar.
I checked multicast and unicast connectivity with the omping utility and didn't find any issues.
The network equipment also shows no errors on our private VLAN.

Why did Corosync decide to permanently disable the second ring? How can I debug the issue?
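To line up the ring transitions with timestamps, the log can be filtered for the two TOTEM messages quoted above. This is a rough sketch only: the function name ring_events is hypothetical, and it assumes the log lines carry exactly the wording shown in this message.

```shell
#!/bin/sh
# ring_events: hypothetical helper (not part of Corosync).
# Reads a corosync log on stdin and keeps only the TOTEM lines that
# report a ring being marked FAULTY or being automatically recovered.
ring_events() {
    grep -E 'TOTEM.*(Marking ringid [0-9]+ interface .* FAULTY|Automatically recovered ring [0-9]+)'
}
```

With the logfile path from the configuration above, it could be used as `ring_events < /var/log/cluster/corosync.log`. Comparing the timestamps of the matching lines with the time `corosync-cfgtool -r` was run should show how long the ring stays active before it is marked faulty again.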
Other properties:

    Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:

    cluster-infrastructure: cman
    cluster-recheck-interval: 5min
    dc-version: 1.1.14-8.el6-70404b0
    expected-quorum-votes: 3
    have-watchdog: false
    last-lrm-refresh: 1484068350
    maintenance-mode: false
    no-quorum-policy: ignore
    pe-error-series-max: 1000
    pe-input-series-max: 1000
    pe-warn-series-max: 1000
    stonith-action: reboot
    stonith-enabled: false
    symmetric-cluster: false

Thank you.

--
Regards
Denis Gribkov
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org