A few things really stand out from this report; I think the inconsistent ring_id is just a symptom.

It worries me that corosync-quorumtool behaves differently across the nodes - some show names, some just IP addresses. That could be one cause of the inconsistency.
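A quick way to check for that kind of drift (just a sketch, assuming the config should be identical on all five nodes and the names should resolve everywhere) is to compare the config and the lookups on every node:

  # the nodelist should hash identically on all five nodes
  md5sum /etc/corosync/corosync.conf
  # if I remember right, quorumtool resolves the ring address to a name and falls
  # back to the raw IP when that fails, so inconsistent /etc/hosts or DNS across
  # the nodes would explain the mixed output
  getent hosts 10.27.77.106 10.27.77.107 10.27.77.202 10.27.77.204 10.27.77.205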

Also the messages
"
Jan 26 02:10:45 [13191] destination-standby corosync warning [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly. Jan 26 02:10:47 [13191] destination-standby corosync warning [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly. Jan 26 02:10:48 [13191] destination-standby corosync debug [TOTEM ] The consensus timeout expired. Jan 26 02:10:48 [13191] destination-standby corosync debug [TOTEM ] entering GATHER state from 3(The consensus timeout expired.). Jan 26 02:10:48 [13191] destination-standby corosync warning [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly."

are a BAD sign. All of this is contributing to the problems, as is the timeout on reload (which is really not a good thing). Those messages are not caused by the reload; they are caused by some networking problem.

So what seems to be happening is that the cluster is being partitioned somehow (I can't tell why; that's something you'll need to investigate) and corosync isn't recovering very well from it. One of the things that can make this happen is doing "ifdown", which that old version of corosync doesn't cope with very well. Even if that's not exactly what you are doing (and I see no reason to believe you are), I do wonder if something similar is happening by other means - NetworkManager perhaps?
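If you want to rule that out, something along these lines (a sketch, assuming systemd and NetworkManager on EL7), run against the time window of the incident, is usually enough:

  nmcli device status
  journalctl -u NetworkManager --since "2021-01-26 01:30" --until "2021-01-26 02:15"
  # link flaps also show up in the kernel log
  journalctl -k --since "2021-01-26 01:30" | grep -iE 'link|carrier'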

So firstly, check the networking setup, make sure that all the nodes are consistently configured, and check that the network is not closing down interfaces or ports at the time of the incident.
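With udpu that means UDP port 5405 must be open in both directions between all five nodes. A rough checklist (assuming firewalld/iptables on EL7):

  sudo firewall-cmd --list-all      # or: sudo iptables -L -n -v
  sudo ss -unlp | grep corosync     # is corosync bound to the expected address/port?
  sudo corosync-cfgtool -s          # ring status as corosync itself sees it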

Oh and also, try to upgrade to corosync 2.4.5 at least. I'm sure that will help.
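On EL7 that should just be a package update, although which version you actually get depends on your repositories:

  sudo yum update corosync corosynclib
  corosync -v   # confirm the new version on every node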

Chrissie



On 26/01/2021 02:45, Igor Tverdovskiy wrote:
Hi All,

 > pacemakerd -$
Pacemaker 1.1.15-11.el7

 > corosync -v
Corosync Cluster Engine, version '2.4.0'

 > rpm -qi libqb
Name        : libqb
Version     : 1.0.1

Please assist. I recently faced a strange bug (I suppose), where one of the cluster nodes ends up with a "Ring ID" different from the others, for example after a corosync config reload, e.g.:


*Affected node:*
============
(target.standby)> sudo corosync-quorumtool
Quorum information
------------------
Date:             Tue Jan 26 01:58:54 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          5
Ring ID: *7/59268* <<<<<<<
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
     Nodeid      Votes Name
          7          1 dispatching-sbc
          8          1 dispatching-sbc-2-6
          3          1 10.27.77.202
          5          1 cassandra-3 (local)
          6          1 10.27.77.205

============

*OK nodes:*
 > sudo corosync-quorumtool
Quorum information
------------------
Date:             Tue Jan 26 01:59:13 2021
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          8
Ring ID: *7/59300* <<<<<<<
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      4
Quorum:           3
Flags:            Quorate

Membership information
----------------------
     Nodeid      Votes Name
          7          1 10.27.77.106
          8          1 10.27.77.107 (local)
          3          1 10.27.77.202
          6          1 10.27.77.205
============


Also strange is that *crm status shows only two of five nodes* on the affected node, but at the same time
*"sudo crm_node -l" shows all 5 nodes as members*.
============
(target.standby)> sudo crm_node -l
5 target.standby member
7 target.dsbc1 member
3 target.sip member
8 target.dsbc member
6 target.sec.sip member

-------

(target.standby)> sudo crm status
Stack: corosync
Current DC: target.sip (version 1.1.15-11.el7-e174ec8) - partition with quorum
Last updated: Tue Jan 26 02:08:02 2021          Last change: Mon Jan 25 14:27:18 2021 by root via crm_node on target.sec.sip

2 nodes and 7 resources configured

Online: [ target.sec.sip target.sip ] <<<<<<

Full list of resources:
============
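For reference, corosync's own view of the current membership (assuming corosync-cmapctl is available) can be dumped on the affected node and compared with the crm_node output above:

(target.standby)> sudo corosync-cmapctl | grep members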

The issue here is that crm configure operations fail with a timeout error:
============
(target.standby)> sudo crm configure property maintenance-mode=true
*Call cib_apply_diff failed (-62): Timer expired*
ERROR: could not patch cib (rc=62)
INFO: offending xml diff: <diff format="2">
  <change operation="modify" path="/cib/configuration/crm_config/cluster_property_set[@id=&apos;cib-bootstrap-options&apos;]/nvpair[@id=&apos;cib-bootstrap-options-maintenance-mode&apos;]">
     <change-list>
       <change-attr name="value" operation="set" value="true"/>
     </change-list>
     <change-result>
      <nvpair name="maintenance-mode" value="true" id="cib-bootstrap-options-maintenance-mode"/>
     </change-result>
   </change>
</diff>
============


In the log there are errors that totem is unable to form a cluster:
============
(target.standby)

*The first entries seem to be caused by the config reload (corosync-cfgtool -R):*
Jan 26 01:40:35 [13190] destination-standby corosync notice  [CFG   ] Config reload requested by node 7
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] removing dynamic member 10.27.77.106 for ring 0
Jan 26 01:40:35 [13190] destination-standby corosync notice  [TOTEM ] removing UDPU member {10.27.77.106}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] Closing socket to: {10.27.77.106}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] removing dynamic member 10.27.77.107 for ring 0
Jan 26 01:40:35 [13190] destination-standby corosync notice  [TOTEM ] removing UDPU member {10.27.77.107}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] Closing socket to: {10.27.77.107}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] removing dynamic member 10.27.77.204 for ring 0
Jan 26 01:40:35 [13190] destination-standby corosync notice  [TOTEM ] removing UDPU member {10.27.77.204}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] Closing socket to: {10.27.77.204}
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] Configuration reloaded. Dumping actual totem config.
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] Token Timeout (5000 ms) retransmit timeout (1190 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] token hold (942 ms) retransmits before loss (4 retrans)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] join (50 ms) send_join (0 ms) consensus (6000 ms) merge (200 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] downcheck (1000 ms) fail to recv const (2500 msgs)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] seqno unchanged const (30 rotations) Maximum network MTU 1369
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] missed count const (5 messages)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP token expired timeout (1190 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP token problem counter (2000 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP threshold (10 problem count)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP multicast threshold (100 problem count)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP automatic recovery check timeout (1000 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] RRP mode set to none.
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] heartbeat_failures_allowed (0)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [TOTEM ] max_network_delay (50 ms)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [VOTEQ ] Reading configuration (runtime: 1)
Jan 26 01:40:35 [13190] destination-standby corosync debug   [VOTEQ ] No nodelist defined or our node is not in the nodelist
Jan 26 01:40:35 [13190] destination-standby corosync debug   [VOTEQ ] ev_tracking=0, ev_tracking_barrier = 0: expected_votes = 2
Jan 26 01:40:35 [13190] destination-standby corosync debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] got nodeinfo message from cluster node 8
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] nodeinfo message[8]: votes: 1, expected: 5 flags: 1
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] total_votes=5, expected_votes=2
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] Sending expected votes callback
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] node 3 state=1, votes=1, expected=5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] node 5 state=1, votes=1, expected=5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] node 6 state=1, votes=1, expected=5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] node 7 state=1, votes=1, expected=5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] node 8 state=1, votes=1, expected=5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] lowest node id: 3 us: 5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] highest node id: 8 us: 5
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] got nodeinfo message from cluster node 8
Jan 26 01:40:35 [13191] destination-standby corosync debug   [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
Jan 26 01:40:38 [13191] destination-standby corosync error   [TOTEM ] FAILED TO RECEIVE
Jan 26 01:40:38 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 6(failed to receive).
Jan 26 01:40:44 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 0(consensus timeout).
Jan 26 01:40:48 [13240] destination-standby       crmd:    debug: throttle_cib_load:    cib load: 0.000333 (1 ticks in 30s)
Jan 26 01:40:48 [13240] destination-standby       crmd:    debug: throttle_load_avg:    Current load is 1.010000 (full: 1.01 0.52 0.45 2/471 10513)
Jan 26 01:40:48 [13240] destination-standby       crmd:    debug: throttle_io_load:     Current IO load is 0.000000
Jan 26 01:40:55 [13191] destination-standby corosync debug   [TOTEM ] The consensus timeout expired.
Jan 26 01:40:55 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Jan 26 01:41:06 [13191] destination-standby corosync debug   [TOTEM ] The consensus timeout expired.
Jan 26 01:41:06 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Jan 26 01:41:17 [13191] destination-standby corosync debug   [TOTEM ] The consensus timeout expired.
Jan 26 01:41:17 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Jan 26 01:41:18 [13191] destination-standby corosync debug   [QB    ] IPC credentials authenticated (13191-14633-16)
Jan 26 01:41:18 [13191] destination-standby corosync debug   [QB    ] connecting to client [14633]
Jan 26 01:41:18 [13191] destination-standby corosync debug   [QB    ] shm size:1048589; real_size:1052672; rb->word_size:263168
Jan 26 01:41:18 [13191] destination-standby corosync debug   [QB    ] shm size:1048589; real_size:1052672; rb->word_size:263168
Jan 26 01:41:18 [13191] destination-standby corosync debug   [QB    ] shm size:1048589; real_size:1052672; rb->word_size:263168
Jan 26 01:41:18 [13191] destination-standby corosync debug   [MAIN  ] connection created
Jan 26 01:41:18 [13240] destination-standby       crmd:    debug: throttle_cib_load:    cib load: 0.000000 (0 ticks in 30s)
Jan 26 01:41:18 [13240] destination-standby       crmd:    debug: throttle_load_avg:    Current load is 1.360000 (full: 1.36 0.64 0.49 1/475 14633)
Jan 26 01:41:18 [13240] destination-standby       crmd:    debug: throttle_io_load:     Current IO load is 0.000000
Jan 26 01:41:28 [13191] destination-standby corosync debug   [TOTEM ] The consensus timeout expired.
Jan 26 01:41:28 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Jan 26 01:41:28 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jan 26 01:41:30 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.

Jan 26 02:10:42 [13191] destination-standby corosync warning [MAIN  ] *Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.*
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: crm_client_new:       Connecting 0x560a4bef1f60 for uid=0 gid=0 pid=6916 id=5a414534-bc62-4544-a5d2-6deb772a6b49
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: handle_new_connection:        IPC credentials authenticated (13235-6916-13)
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_ipcs_shm_connect:  connecting to client [6916]
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: cib_acl_enabled:      CIB ACL is disabled
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_ipcs_dispatch_connection_request:  HUP conn (13235-6916-13)
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_ipcs_disconnect:   qb_ipcs_disconnect(13235-6916-13) state:2
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: crm_client_destroy:   Destroying 0 events
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-response-13235-6916-13-header
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-event-13235-6916-13-header
Jan 26 02:10:42 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-request-13235-6916-13-header
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: crm_client_new:       Connecting 0x55ce3bb0e6f0 for uid=0 gid=0 pid=6919 id=3a72b777-daa0-4b0e-acc2-fc58a07f31a6
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: handle_new_connection:        IPC credentials authenticated (13240-6919-13)
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: qb_ipcs_shm_connect:  connecting to client [6919]
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:42 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:44 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_ipcs_dispatch_connection_request:  HUP conn (13240-6125-14)
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_ipcs_disconnect:   qb_ipcs_disconnect(13240-6125-14) state:2
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: crm_client_destroy:   Destroying 0 events
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-response-13240-6125-14-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-event-13240-6125-14-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-request-13240-6125-14-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: crm_client_new:       Connecting 0x55ce3bb22960 for uid=0 gid=0 pid=6928 id=039044f3-e674-4afa-9857-47459d1f0d0a
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: handle_new_connection:        IPC credentials authenticated (13240-6928-14)
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_ipcs_shm_connect:  connecting to client [6928]
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_open_2: shm size:131085; real_size:135168; rb->word_size:33792
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: crm_client_new:       Connecting 0x560a4bef1f60 for uid=0 gid=0 pid=6928 id=45726483-e7b8-4ed5-8388-c6e8578d3366
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: handle_new_connection:        IPC credentials authenticated (13235-6928-13)
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_ipcs_shm_connect:  connecting to client [6928]
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_open_2: shm size:524301; real_size:528384; rb->word_size:132096
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: cib_acl_enabled:      CIB ACL is disabled
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_ipcs_dispatch_connection_request:  HUP conn (13235-6928-13)
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_ipcs_disconnect:   qb_ipcs_disconnect(13235-6928-13) state:2
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: crm_client_destroy:   Destroying 0 events
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-response-13235-6928-13-header
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-event-13235-6928-13-header
Jan 26 02:10:44 [13235] destination-standby        cib:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-cib_rw-request-13235-6928-13-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_ipcs_dispatch_connection_request:  HUP conn (13240-6928-14)
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_ipcs_disconnect:   qb_ipcs_disconnect(13240-6928-14) state:2
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: crm_client_destroy:   Destroying 0 events
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-response-13240-6928-14-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-event-13240-6928-14-header
Jan 26 02:10:44 [13240] destination-standby       crmd:    debug: qb_rb_close_helper:   Free'ing ringbuffer: /dev/shm/qb-crmd-request-13240-6928-14-header
Jan 26 02:10:45 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jan 26 02:10:47 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
Jan 26 02:10:48 [13191] destination-standby corosync debug   [TOTEM ] The consensus timeout expired.
Jan 26 02:10:48 [13191] destination-standby corosync debug   [TOTEM ] entering GATHER state from 3(The consensus timeout expired.).
Jan 26 02:10:48 [13191] destination-standby corosync warning [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
============

(target.standby)> cat /etc/corosync/corosync.conf
totem {
         token: 5000
         version: 2
         secauth: off
         threads: 0
         interface {
                 ringnumber: 0
                 mcastport: 5405
                 bindnetaddr: 10.27.77.204
         }
         transport: udpu
}

nodelist {
         node {
             ring0_addr: 10.27.77.202
             name: target.sip
             nodeid: 3

         }
         node {
             ring0_addr: 10.27.77.205
             name: target.sec.sip
             nodeid: 6

         }
         node {
             ring0_addr: 10.27.77.106
             name: target.dsbc1
             nodeid: 7

         }
         node {
             ring0_addr: 10.27.77.107
             name: target.dsbc
             nodeid: 8

         }
         node {
             ring0_addr: 10.27.77.204
             name: target.standby
             nodeid: 5

         }
}

logging {
         fileline: off
         to_stderr: no
         to_logfile: yes
         to_syslog: no
         logfile: /var/10.27.77.204/log/cluster-suite/corosync.log
         syslog_facility: local0
         debug: on
         timestamp: on
         logger_subsys {
                 subsys: QUORUM
                 debug: on
         }
}

quorum {
         provider: corosync_votequorum
         expected_votes: 5
}

===================

(target.standby)> ping 10.27.77.106 (target.dsbc1)
PING 10.27.77.106 (10.27.77.106) 56(84) bytes of data.
64 bytes from 10.27.77.106: icmp_seq=1 ttl=64 time=0.177 ms
64 bytes from 10.27.77.106: icmp_seq=2 ttl=64 time=0.173 ms

===================

 > sudo nmap -sU -p 5405 10.27.77.106

Starting Nmap 6.40 ( http://nmap.org ) at 2021-01-26 02:24 UTC
Nmap scan report for dispatching-sbc (10.27.77.106)
Host is up (0.00029s latency).
PORT     STATE         SERVICE
5405/udp open|filtered unknown

=========
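Since nmap can only report UDP ports as open|filtered, a more conclusive two-way test (a sketch, assuming omping is installed; by default it uses its own port, so it checks general UDP reachability between the hosts rather than port 5405 itself) is to run the same command on both nodes at the same time:

 > omping 10.27.77.106 10.27.77.204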

My observations show that it can be fixed by stopping all nodes and starting them one by one.
A restart of only the affected node may help as well.
Nodes see each other when the Ring ID is equal on all nodes.

Do you have any ideas?
Could you explain what the Ring ID is in terms of corosync-quorumtool (RRP is not configured)?
Why does it differ?
What additional info may I provide if reproduced again?

It reproduces sporadically. Sometimes the issue is combined with 100% CPU usage by corosync, which also requires restarting pacemaker/corosync on the affected node (and sometimes on all nodes). In this particular case CPU usage is normal.
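Would something like the following, collected right after the next occurrence, be useful (assuming crm_report and corosync-blackbox are available on the nodes)?

 > sudo corosync-blackbox > /tmp/corosync-blackbox.txt
 > sudo crm_report -f "2021-01-26 01:30:00" -t "2021-01-26 02:30:00" /tmp/ringid-incident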

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

