[ClusterLabs] Rolling upgrade from Corosync 2.3+ to Corosync 2.99+ or Corosync 3.0+?

2020-06-10 Thread Vitaly Zolotusky
Hello everybody. We are trying to do a rolling upgrade from Corosync 2.3.5-1 to Corosync 2.99+. It looks like they are not compatible and we are getting messages like: Jun 11 02:10:20 d21-22-left corosync[6349]: [TOTEM ] Message received from 172.18.52.44 has bad magic number (probably sent by

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-10 Thread Strahil Nikolov
What are your corosync.conf timeouts (especially token & consensus)? The last time I did a live migration of a RHEL 7 node with the default values, the cluster fenced it, so I set token to 10s and also raised consensus (check 'man corosync.conf') above the default. Also, start your inv
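The advice above turns on how token and consensus relate. corosync.conf(5) gives 1.2 * token as both the default and the minimum for consensus, with both values in milliseconds. The following is a minimal sketch of that rule, not corosync's own code. It models only that one constraint and ignores token_coefficient, which corosync.conf(5) applies only to nodelists with three or more nodes. The helper names and the 15 s consensus value are hypothetical and used for illustration.

```python
from typing import Optional


def min_consensus_ms(token_ms: int) -> int:
    """Smallest consensus allowed for a given token: 1.2 * token.

    Integer arithmetic (token * 6 // 5) avoids float rounding for
    millisecond values.
    """
    return token_ms * 6 // 5


def resolve_consensus_ms(token_ms: int, consensus_ms: Optional[int] = None) -> int:
    """Return the consensus timeout that would apply (hypothetical helper).

    Unset: fall back to the documented default of 1.2 * token.
    Set below that minimum: reject it, as the man page forbids it.
    """
    minimum = min_consensus_ms(token_ms)
    if consensus_ms is None:
        return minimum
    if consensus_ms < minimum:
        raise ValueError(
            f"consensus {consensus_ms} ms must be at least 1.2 * token ({minimum} ms)"
        )
    return consensus_ms


def totem_snippet(token_ms: int, consensus_ms: int) -> str:
    """Render the two options as a corosync.conf totem fragment."""
    return f"totem {{\n    token: {token_ms}\n    consensus: {consensus_ms}\n}}"


token = 10_000  # 10 s token, as suggested in the message above
consensus = resolve_consensus_ms(token, 15_000)  # illustrative value, not from the thread
print(totem_snippet(token, consensus))
```

Raising the token alone (for example, to 10000) moves the default consensus to 12000. An explicit consensus therefore only needs to be set when you want it higher than that.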

Re: [ClusterLabs] New user needs some help stabilizing the cluster

2020-06-10 Thread Howard
Hi everyone. As a follow-up, I found that the VMs were taking snapshot backups at the time of the disconnects, which I think freezes IO. We'll be addressing that. Is there anything else in the log that can be improved? Thanks, Howard

[ClusterLabs] New user needs some help stabilizing the cluster

2020-06-10 Thread Howard
Good morning. Thanks for reading. We have a requirement to provide high availability for PostgreSQL 10. I have built a two node cluster with a quorum device as the third vote, all running on RHEL 8. Here are the versions installed: [postgres@srv2 cluster]$ rpm -qa|grep "pacemaker\|pcs\|corosync

Re: [ClusterLabs] Redundant Ring Network failure

2020-06-10 Thread ROHWEDER-NEUBECK, MICHAEL (EXTERN)
Hi, yesterday we restarted the whole cluster and all rings were OK. Today, one ring is broken. ring 0 broken: 033 This is my cfg: [root@lvm-nfscpdata-05ct::~]# less /etc/corosync/corosync.conf totem { version: 2 transport: knet cluster_name:

Re: [ClusterLabs] Redundant Ring Network failure

2020-06-10 Thread ROHWEDER-NEUBECK, MICHAEL (EXTERN)
Jan, we are actually using these: [root@lvm-nfscpdata-05ct::~ 100 ]# apt show corosync Package: corosync Version: 3.0.1-2+deb10u1 [root@lvm-nfscpdata-05ct::~]# apt show libknet1 Package: libknet1 Version: 1.8-2 These are the newest versions provided on the mirror.

Re: [ClusterLabs] Redundant Ring Network failure

2020-06-10 Thread Jan Friesse
Michael, what version of knet are you using? We had quite a few problems with older versions of knet, so the current stable release (1.16) is recommended. The same applies to corosync, because 3.0.4 has a vastly improved display of link status. Hello, We have massive problems with the redundant ring operatio
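Jan's recommendation depends on libknet1 1.8 being older than 1.16. A plain string comparison gets that backwards, because "8" sorts after "1". The following is a minimal sketch of a numeric comparison for the upstream part of the versions shown by `apt show`, with the Debian revision such as "-2" stripped. The helpers `version_key` and `is_older` are hypothetical. They handle only plain dotted numeric versions. For full Debian version strings, `dpkg --compare-versions 1.8-2 lt 1.16` is the appropriate tool.

```python
from typing import Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "1.16" into (1, 16).

    Hypothetical helper: it does not handle Debian revisions or epochs
    such as "1.8-2" or "3.0.1-2+deb10u1".
    """
    return tuple(int(part) for part in version.split("."))


def is_older(installed: str, recommended: str) -> bool:
    """True if `installed` is an earlier release than `recommended`."""
    return version_key(installed) < version_key(recommended)


# Lexical string comparison gets this wrong: "8" sorts after "1".
print("1.8" < "1.16")              # False
print(is_older("1.8", "1.16"))     # True: knet 1.8 predates 1.16
print(is_older("3.0.1", "3.0.4"))  # True: corosync 3.0.1 predates 3.0.4
```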