Hi all,

Using Pacemaker 1.1.18-11 and the mysql resource agent (https://github.com/ClusterLabs/resource-agents/blob/RHEL6/heartbeat/mysql), I am running into unwanted behaviour. Unwanted from my point of view, at least; maybe it is expected to work this way, which is why I am asking.
# My test case is the following:

Everything is OK on my cluster; the crm_mon output is as below (no failed actions):

 Master/Slave Set: ms_mysql-master [ms_mysql]
     Masters: [ db-master ]
     Slaves: [ db-slave ]

1. I insert into a table on the master; no issue, the data is replicated.
2. I shut down the network interface on the master (a VM). Pacemaker correctly fails over to the other node: the master is seen as offline, and db-slave is now the master:

 Master/Slave Set: ms_mysql-master [ms_mysql]
     Masters: [ db-slave ]

3. I bring my network interface back up. Pacemaker sees the node online and sets the old master as the new slave:

 Master/Slave Set: ms_mysql-master [ms_mysql]
     Masters: [ db-slave ]
     Slaves: [ db-master ]

4. From this point on, my external monitoring bash script shows that the SQL and IO threads are not running, but I can't see any error in the pcs status / crm_mon output. The consequence is that I keep inserting on my newly promoted master, but the data is never consumed by my former master.

# Questions:

- Is this some kind of safety behaviour, to avoid data corruption when a node comes back online?
- When I try to start the slave manually, the way the OCF agent does, I get this error:

 mysql -h localhost -u user-repl -pmysqlreplpw -e "START SLAVE"
 ERROR 1200 (HY000) at line 1: Misconfigured slave: MASTER_HOST was not set; Fix in config file or with CHANGE MASTER TO

- I would expect the cluster to stop the slave and show a failed action. Am I wrong here?

# Other details (not sure it matters a lot)

No stonith enabled, no fencing or auto-failback. Symmetric cluster configured.
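For reference, the check my external monitoring script performs boils down to parsing the output of SHOW SLAVE STATUS. A minimal sketch of that check (the `check_repl` helper name is mine, and the host name and credentials in the comment are just the examples from above, not a recommendation):

```shell
#!/bin/sh
# Sketch of the external replication check: given the text output of
# `SHOW SLAVE STATUS\G`, report whether both replication threads run.
check_repl() {
    # $1: output of `mysql -e 'SHOW SLAVE STATUS\G'`
    io=$(printf '%s\n' "$1" | awk -F': ' '/Slave_IO_Running:/ {print $2}')
    sql=$(printf '%s\n' "$1" | awk -F': ' '/Slave_SQL_Running:/ {print $2}')
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "replication OK"
    else
        echo "replication BROKEN (IO=$io SQL=$sql)"
    fi
}

# On the live slave this would be fed from the server, e.g.:
#   status=$(mysql -h localhost -u user-repl -pmysqlreplpw \
#            -e 'SHOW SLAVE STATUS\G')
#   check_repl "$status"
```

In step 4 above this check reports both threads as not running, while pcs status stays green.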
Details of my Pacemaker resource configuration:

 Master: ms_mysql-master
  Meta Attrs: master-node-max=1 clone_max=2 globally-unique=false clone-node-max=1 notify=true
  Resource: ms_mysql (class=ocf provider=heartbeat type=mysql)
   Attributes: binary=/usr/bin/mysqld_safe config=/etc/my.cnf.d/server.cnf datadir=/var/lib/mysql evict_outdated_slaves=false max_slave_lag=15 pid=/var/lib/mysql/mysql.pid replication_passwd=mysqlreplpw replication_user=user-repl socket=/var/lib/mysql/mysql.sock test_passwd=mysqlrootpw test_user=root
   Operations: demote interval=0s timeout=120 (ms_mysql-demote-interval-0s)
               monitor interval=20 timeout=30 (ms_mysql-monitor-interval-20)
               monitor interval=10 role=Master timeout=30 (ms_mysql-monitor-interval-10)
               monitor interval=30 role=Slave timeout=30 (ms_mysql-monitor-interval-30)
               notify interval=0s timeout=90 (ms_mysql-notify-interval-0s)
               promote interval=0s timeout=120 (ms_mysql-promote-interval-0s)
               start interval=0s timeout=120 (ms_mysql-start-interval-0s)
               stop interval=0s timeout=120 (ms_mysql-stop-interval-0s)

Is there anything I'm missing here? I did not find a clearly similar use case when googling around network outages and Pacemaker.

Thanks
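In case it helps: my understanding from reading the RHEL6 mysql agent source is that it records the current master's coordinates in a cluster property named <resource>_REPL_INFO, as a "host|binlog_file|position" triple, and replays them via CHANGE MASTER TO when configuring a slave. The ERROR 1200 above suggests that statement was never (re)issued on the rejoining node. A sketch of rebuilding the statement from that property (the property name, value format, and `repl_info_to_sql` helper are my assumptions from the agent source, not something the docs guarantee):

```shell
#!/bin/sh
# Rebuild the CHANGE MASTER TO statement from the REPL_INFO value the
# mysql agent stores in the CIB (assumed format: "host|binlog_file|pos").
repl_info_to_sql() {
    # $1: REPL_INFO value, e.g. "db-slave|mysql-bin.000042|154"
    IFS='|' read -r host file pos <<EOF
$1
EOF
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$host" "$file" "$pos"
}

# On the live cluster the value would come from the CIB, e.g.:
#   info=$(crm_attribute --type crm_config --name ms_mysql_REPL_INFO --query -q)
#   repl_info_to_sql "$info"
# (the real agent also passes MASTER_USER/MASTER_PASSWORD, omitted here)
```

Checking whether that property still holds sane coordinates after the outage is one of the things I plan to look at next.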
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org