lejeczek,
On 15/10/2018 07:24, Jan Friesse wrote:
lejeczek,
hi guys,
I have a 3-node cluster (CentOS 7.5); two nodes seem fine, but the third (or
probably something else in between) is not right.
I see this:
$ pcs status --all
Cluster name: CC
Stack: corosync
Current DC: whale.private (version 1.1.18-11.el7_5.3-2b07d5c5a9) -
partition with quorum
Last updated: Fri Oct 12 15:40:39 2018
Last change: Fri Oct 12 15:14:57 2018 by root via crm_resource on
whale.private
3 nodes configured
8 resources configured (1 DISABLED)
Online: [ rental.private whale.private ]
OFFLINE: [ rider.private ]
and that third node logs:
[TOTEM ] FAILED TO RECEIVE
[TOTEM ] A new membership (10.5.6.100:2504344) was formed. Members
left: 2 4
[TOTEM ] Failed to receive the leave message. failed: 2 4
[QUORUM] Members[1]: 1
[MAIN ] Completed service synchronization, ready to provide service.
[TOTEM ] A new membership (10.5.6.49:2504348) was formed. Members
joined: 2 4
[TOTEM ] FAILED TO RECEIVE
and it just keeps going like that.
Sometimes a reboot (or stopping the services, waiting, and starting them
again) of that third node helps.
But I get this situation almost every time a node gets an orderly shutdown
or reboot.
Network-wise, connectivity seems okay. Where do I start?
A little more information would be helpful (corosync version, protocol used
(udpu or udp), corosync.conf, ...), but here are a few possible problems:
- If UDP (multicast) is used, try UDPU (see the sketch after this list)
- Check the firewall
- Try reducing the MTU used by corosync (the netmtu option in corosync.conf)
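To make the first two points concrete, a minimal sketch of a totem/nodelist
section using unicast UDP could look like the following; the address and
nodeid are placeholders, so adapt them to what your existing corosync.conf
already contains:

totem {
    version: 2
    cluster_name: CC
    # unicast UDP instead of multicast
    transport: udpu
}

nodelist {
    node {
        # placeholder address; use each node's real ring address
        ring0_addr: 10.5.6.49
        nodeid: 1
    }
    # ... one node {} block per cluster node
}

On CentOS 7 with firewalld, the firewall check usually comes down to allowing
the ready-made high-availability service, which covers corosync (UDP
5404/5405) and pcsd (TCP 2224):

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload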
Regards,
Honza
One thing I remember: could it be because, at the time of cluster formation
(and for some time after), one of the nodes had a different Ruby version
from the other nodes?
Probably not, because corosync itself does not have any dependency on ruby.
I cannot remember when this problem started to appear; whether it was there
from the beginning or only showed up later, I cannot say.
I'm on CentOS 7.6. I do not think I use UDP (other than the creation of some
resources and constraints, it's a "vanilla" cluster).
That's why I've asked for config files ;)
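All of that is quick to collect on each node; something along these lines is
usually enough, and corosync-cmapctl in particular shows the totem values
corosync is actually running with, not just what is in the file:

corosync -v                          # corosync version
cat /etc/corosync/corosync.conf      # transport, interfaces, netmtu if set
corosync-cmapctl | grep -i totem     # runtime totem settings
corosync-cfgtool -s                  # ring status as corosync sees it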
"non-default" MTU on the ifaces cluster uses, and also, those interfaces
are net-team devices. But still.. why it always be that one node (all
So it's probably really MTU, please try change option netmtu in
corosync.conf.
are virtually identical)
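As a sketch, the value below is only an example to test with; pick something
safely below the smallest MTU anywhere on the path between the nodes, and use
the same value on all three:

totem {
    ...
    # cap the size of corosync's frames
    netmtu: 1200
}

After changing /etc/corosync/corosync.conf the same way on every node,
corosync has to be restarted to pick it up (pcs cluster stop / pcs cluster
start, per node or with --all), and it is worth comparing the MTU each node
actually reports on its team interface:

ip link show    # check the mtu value on each node's cluster/team interface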
Evil is usually hidden in the details, so "virtually identical" may mean it
is not identical enough.
many thanks, L.
No problem, but I'm not sure whether the hints were useful for you or not.
Regards,
Honza
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org