Use fencing; after you have configured fencing, use iptables to test your cluster. With iptables you can block the corosync ports 5404 and 5405 instead of taking the interface down.
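A minimal sketch of that kind of iptables test (assuming corosync's default UDP ports 5404/5405; run as root on the node you want to isolate):

```shell
# Drop corosync totem traffic in both directions (UDP 5404/5405 by default)
iptables -A INPUT  -p udp -m multiport --dports 5404,5405 -j DROP
iptables -A OUTPUT -p udp -m multiport --dports 5404,5405 -j DROP

# ... watch the cluster react, e.g. with: crm_mon -Afr ...

# Restore communication by deleting the same rules
iptables -D INPUT  -p udp -m multiport --dports 5404,5405 -j DROP
iptables -D OUTPUT -p udp -m multiport --dports 5404,5405 -j DROP
```

Unlike `ifconfig eth0 down`, this keeps the interface and its address up, so corosync still has a usable socket and sees a genuine loss of its peers rather than losing its bind address.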
2016-02-14 14:09 GMT+01:00 Debabrata Pani <debabrata.p...@mobileum.com>:
> Hi,
>
> We ran into some problems when we pulled down the Ethernet interface using
> `ifconfig eth0 down`.
>
> Our cluster has the following configuration and resources:
>
> Two network interfaces: eth0 and lo(cal)
> 3 nodes, with one node put in maintenance mode
> no-quorum-policy=stop
> stonith-enabled=false
> PostgreSQL Master/Slave
> VIP master and VIP replication IPs
> VIPs run on the node where the PostgreSQL Master is running
>
> The two test cases we executed are as follows:
>
> 1. Introduce delay on the Ethernet interface of the PostgreSQL PRIMARY node
>    (command: `tc qdisc add dev eth0 root netem delay 8000ms`)
> 2. `ifconfig eth0 down` on the PostgreSQL PRIMARY node
>
> We expected both test cases to exercise network problems in the cluster.
>
> In the first case (Ethernet interface delay):
>
> The cluster is divided into a "partition WITH quorum" and a "partition
> WITHOUT quorum"
> The partition WITHOUT quorum shuts down all its services
> The partition WITH quorum takes over the PostgreSQL PRIMARY and the VIPs
> Everything as expected. Wow!
>
> In the second case (Ethernet interface down):
>
> We see lots of errors like the following on the node:
>
> Feb 12 14:09:48 corosync [MAIN ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
> Feb 12 14:09:49 corosync [MAIN ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
> Feb 12 14:09:51 corosync [MAIN ] Totem is unable to form a cluster because
> of an operating system or network fault. The most common cause of this
> message is that the local firewall is configured improperly.
>
> But `crm_mon -Afr` (on the node whose eth0 is down) always shows the
> cluster to be fully formed.
>
> It shows all the nodes as UP
> It shows itself as the one running the PostgreSQL PRIMARY (as was the case
> before the Ethernet interface was taken down)
>
> `crm_mon -Afr` on the OTHER nodes shows a different story:
>
> They show the other node as down
> One of the other two nodes takes over as the PostgreSQL PRIMARY
>
> This leads to a split-brain situation, which was gracefully avoided in the
> test case where only a delay was introduced into the interface.
>
> Questions:
>
> Is it a known issue with Pacemaker when the Ethernet interface is pulled
> down?
> Is this an incorrect way of testing the cluster? There is some information
> regarding this in this thread:
> http://www.gossamer-threads.com/lists/linuxha/pacemaker/59738
>
> Regards,
> Deba
>
> _______________________________________________
> Users mailing list: Users@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
.~.
/V\
// \\
/( )\
^`~'^