> ------------------------------
>
> Message: 2
> Date: Mon, 12 Oct 2015 10:06:01 +0200
> From: Jan Friesse <jfrie...@redhat.com>
> To: Cluster Labs - All topics related to open-source clustering
>     welcomed <users@clusterlabs.org>
> Subject: Re: [ClusterLabs] continous QUORUM messages in a 3-node cluster
> Message-ID: <561b69e9.1060...@redhat.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Illia,
>
>> Hi,
>>
>> We are using a 3-node pacemaker/corosync cluster on CentOs 7.
>> We have several identical setups in our QA/DEV orgs, and a couple of them
>> continuously spew the following messages on all 3 nodes:
>>
>> Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership (10.1.13.134:1553572) was formed. Members
>> Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
>> Oct 8 17:18:20 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service synchronization, ready to provide service.
>> Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership (10.1.13.134:1553576) was formed. Members
>> Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
>> Oct 8 17:18:22 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service synchronization, ready to provide service.
>> Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership (10.1.13.134:1553580) was formed. Members
>> Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
>> Oct 8 17:18:24 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service synchronization, ready to provide service.
>> Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] A new membership (10.1.13.134:1553584) was formed. Members
>> Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [QUORUM] Members[3]: 2 1 3
>> Oct 8 17:18:26 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] Completed service synchronization, ready to provide service.
>>
>> The cluster seems to be generally happy:
>>
>> [root@42-hw-rig4-L3-2 ~]# pcs cluster status
>> Cluster Status:
>>  Last updated: Thu Oct 8 17:24:02 2015
>>  Last change: Thu Oct 8 16:46:57 2015
>>  Stack: corosync
>>  Current DC: dq-ceph9.clearsky-data.net (3) - partition with quorum
>>  Version: 1.1.12-a14efad
>>  3 Nodes configured
>>  17 Resources configured
>>
>> PCSD Status:
>>   42-hw-back-1.clearsky-data.net: Online
>>   41-hw-back-1.clearsky-data.net: Online
>>   dq-ceph9.clearsky-data.net: Online
>>
>> The corosync config is:
>>
>> totem {
>>     version: 2
>>     secauth: off
>>     cluster_name: L3_cluster
>>     transport: udpu
>> }
>>
>> nodelist {
>>     node {
>>         ring0_addr: 42-hw-back-1.clearsky-data.net
>>         nodeid: 1
>>     }
>>     node {
>>         ring0_addr: 41-hw-back-1.clearsky-data.net
>>         nodeid: 2
>>     }
>>     node {
>>         ring0_addr: dq-ceph9.clearsky-data.net
>>         nodeid: 3
>>     }
>> }
>>
>> quorum {
>>     provider: corosync_votequorum
>> }
>>
>> logging {
>>     to_syslog: yes
>>     debug: off
>> }
>>
>> What do these messages mean, and how can we stop them?
>>
>> Any help would be very appreciated.
>>
>> Ilia Sokolinski
>>
>> PS
>>
>> I have tried to enable corosync debug and got the following logs:
>>
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [CFG ] Config reload requested by node 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] HUP conn (15105-46913-25)
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] qb_ipcs_disconnect(15105-46913-25) state:2
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] epoll_ctl(del): Bad file descriptor (9)
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] cs_ipcs_connection_closed()
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] cs_ipcs_connection_destroyed()
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-15105-46913-25-header
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-event-15105-46913-25-header
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-15105-46913-25-header
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[3]: votes: 1, expected: 3 flags: 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 2
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 2
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] total_votes=3, expected_votes=3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 1 state=1, votes=1, expected=3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 2 state=1, votes=1, expected=3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] node 3 state=1, votes=1, expected=3
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] lowest node id: 1 us: 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] highest node id: 3 us: 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] got nodeinfo message from cluster node 1
>> Oct 8 16:18:47 42-hw-rig4-L3-2 corosync[15105]: [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0
>> Oct 8 16:18:48 42-hw-rig4-L3-2 corosync[15105]: [TOTEM ] entering GATHER state from 9 (merge during operational state).
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] IPC credentials authenticated (15105-46923-25)
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] connecting to client [46923]
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] shm size:1048589; real_size:1052672; rb->word_size:263168
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] connection created
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [CMAP ] lib_init_fn: conn=0x560430414560
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] HUP conn (15105-46923-25)
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] qb_ipcs_disconnect(15105-46923-25) state:2
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] epoll_ctl(del): Bad file descriptor (9)
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] cs_ipcs_connection_closed()
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [CMAP ] exit_fn for conn=0x560430414560
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [MAIN ] cs_ipcs_connection_destroyed()
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-response-15105-46923-25-header
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-event-15105-46923-25-header
>> Oct 8 16:18:49 42-hw-rig4-L3-2 corosync[15105]: [QB ] Free'ing ringbuffer: /dev/shm/qb-cmap-request-15105-46923-25-header
>
> The line "entering GATHER state from 9 (merge during operational state)" is
> the interesting one. I believe there is a different cluster on the network
> that still has the IP address of one of this cluster's nodes in its
> configuration. As a simple test, you can try to add
>
> interface {
>     ringnumber: 0
>     mcastport: 5409
> }
>
> to the totem section of corosync.conf on all nodes of one of the affected
> clusters and see if the problem disappears.
>
> Honza
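For reference, the totem section quoted above with that interface block merged in would look roughly like this; 5409 is only the example value from Honza's suggestion, and any UDP port not already used by another corosync instance on the network should work (with transport: udpu the mcastport setting is still the UDP port corosync communicates on, so stray nodes left on the default port 5405 presumably can no longer merge with this cluster):

totem {
    version: 2
    secauth: off
    cluster_name: L3_cluster
    transport: udpu

    interface {
        ringnumber: 0
        mcastport: 5409
    }
}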
Thank you very much! I found a couple of old nodes that were still configured to be members of this cluster and destroyed the pcs configuration on them (pcs cluster destroy --force). The log messages stopped immediately.

Ilia

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
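A minimal sketch of the cleanup Ilia describes, assuming the stale hosts are still reachable over SSH; the hostname old-node-1 is hypothetical, the path is the CentOS 7 default, and pcs cluster destroy --force is the command from the reply above:

    # on each suspected stale host, check whether it still claims membership
    ssh old-node-1 'grep cluster_name /etc/corosync/corosync.conf'   # still says L3_cluster?
    ssh old-node-1 'systemctl is-active corosync pacemaker'          # is the stack still running?

    # if so, remove the stale cluster configuration from that host
    ssh old-node-1 'pcs cluster destroy --force'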