>>> Ken Gaillot <kgail...@redhat.com> wrote on 24.06.2019 at 16:57 in message
<95f51b52283d05bbdcccc948e4508c406d7ccb64.ca...@redhat.com>:
> On Mon, 2019-06-24 at 08:52 +0200, Jan Friesse wrote:
>> Somanath,
>>
>> > Hi All,
>> >
>> > I have a two node cluster with multicast (udp) transport. The
>> > multicast IP used is 224.1.1.1.
>>
>> Would you mind giving UDPU (unicast) a try? For a two-node cluster
>> there is going to be no difference in terms of speed/throughput.
>>
>> >
>> > Whenever there is a CPU intensive task the pcs cluster goes into
>> > a split brain scenario and doesn't recover automatically. We have to
>
> In addition to others' comments: if fencing is enabled, split brain
> should not be possible. Automatic recovery should work as long as
---unless the fencing was caused by a persistent communication problem...
> fencing succeeds. With fencing disabled, split brain with no automatic
> recovery can definitely happen.
>
>> > do a manual restart of services to bring both nodes online again.
>> > Before the nodes go into split brain, the corosync log shows:
>> >
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>> > May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e
>>
>> This usually happens when:
>> - multicast is somehow rate-limited on the switch side
>>   (configuration/bad switch implementation/...)
>> - the MTU of the network is smaller than 1500 bytes and fragmentation
>>   is not allowed -> try reducing totem.netmtu
>>
>> Regards,
>> Honza
>>
>> > May 24 15:51:42 server1 corosync[4745]: [TOTEM ] A processor failed, forming new configuration.
>> > May 24 16:41:42 server1 corosync[4745]: [TOTEM ] A new membership (10.241.31.12:29276) was formed. Members left: 1
>> > May 24 16:41:42 server1 corosync[4745]: [TOTEM ] Failed to receive the leave message. failed: 1
>> >
>> > Is there any way we can overcome this, or may this be due to
>> > multicast issues on the network side?
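
To see whether the retransmits line up with the CPU-intensive periods, it may help to count them per timestamp before changing anything. Below is a minimal Python sketch, assuming the syslog-style line format quoted above. The function name summarize_retransmits and the regular expression are illustrative only and are not part of corosync; the sequence numbers are kept as plain text tokens.

```python
import re
from collections import Counter

# Matches corosync TOTEM retransmit lines in the format quoted in this thread.
# Assumption: syslog-style prefix "Mon DD HH:MM:SS host corosync[pid]:".
RETRANSMIT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"corosync\[\d+\]:\s+\[TOTEM\s*\]\s+Retransmit List:\s+(?P<seqs>[0-9a-f ]+)$"
)


def summarize_retransmits(lines):
    """Count retransmit-list lines per timestamp and per sequence token."""
    per_second = Counter()
    per_seq = Counter()
    for line in lines:
        match = RETRANSMIT.match(line.strip())
        if not match:
            continue  # ignore every other kind of log line
        per_second[match["stamp"]] += 1
        per_seq.update(match["seqs"].split())
    return per_second, per_seq


# Sample input copied from the log excerpt in this thread.
sample = [
    "May 24 15:10:02 server1 corosync[4745]: [TOTEM ] Retransmit List: 7c 7e",
] * 5 + [
    "May 24 15:51:42 server1 corosync[4745]: [TOTEM ] A processor failed, "
    "forming new configuration.",
]

per_second, per_seq = summarize_retransmits(sample)
for stamp, count in per_second.items():
    print(stamp, count)
```

Feeding it the real log file instead of the sample (for example via open(path) as the lines argument) would show whether the bursts coincide with the load spikes; a steady stream of retransmits regardless of load would point more toward the switch or MTU causes mentioned above.
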
>> >
>> > With Regards
>> > Somanath Thilak J
>> >
>> > _______________________________________________
>> > Manage your subscription:
>> > https://lists.clusterlabs.org/mailman/listinfo/users
>> >
>> > ClusterLabs home: https://www.clusterlabs.org/
>
> --
> Ken Gaillot <kgail...@redhat.com>