[ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

Félix Díaz de Rada Thu, 16 Feb 2017 00:30:07 -0800


Hi all,

We are currently setting up a MySQL cluster (Master-Slave) on this platform:

- Two nodes, on RHEL 7.0
- pacemaker-1.1.10-29.el7.x86_64
- corosync-2.3.3-2.el7.x86_64
- pcs-0.9.115-32.el7.x86_64

There is also an IP address resource to be used as a "virtual IP".

This is the cluster configuration:

Cluster Name: webmobbdprep
Corosync Nodes:
 webmob1bdprep-ges webmob2bdprep-ges
Pacemaker Nodes:
 webmob1bdprep-ges webmob2bdprep-ges

Resources:
 Group: G_MySQL_M
  Meta Attrs: priority=100
  Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
   Attributes: binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep log=/data/webmob_prep/webmob_prep.err pid=/data/webmob_prep/webmob_rep.pid socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql test_table=replica.pacemaker_test test_user=root
   Meta Attrs: resource-stickiness=1000
   Operations: promote interval=0s timeout=120 (MySQL_M-promote-timeout-120)
               demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
               start interval=0s timeout=120s on-fail=restart (MySQL_M-start-timeout-120s-on-fail-restart)
               stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
               monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 (MySQL_M-monitor-interval-60s-timeout-30s)
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
   Meta Attrs: target-role=Started migration-threshold=3 failure-timeout=60s
   Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
               stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
               monitor interval=60s (ClusterIP-monitor-interval-60s)
 Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
  Meta Attrs: resource-stickiness=0
  Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
              demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
              start interval=0s timeout=120s on-fail=restart (MySQL_S-start-timeout-120s-on-fail-restart)
              stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
              monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1 (MySQL_S-monitor-interval-60s-timeout-30s)


Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:

  start MySQL_M then start ClusterIP (Mandatory) (id:order-MySQL_M-ClusterIP-mandatory)
  start G_MySQL_M then start MySQL_S (Mandatory) (id:order-G_MySQL_M-MySQL_S-mandatory)

Colocation Constraints:
  G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)

Cluster Properties:
 cluster-infrastructure: corosync
 dc-version: 1.1.10-29.el7-368c726
 last-lrm-refresh: 1487148812
 no-quorum-policy: ignore
 stonith-enabled: false
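
A side note on the colocation constraint listed above: its id ends in "INFINITY", but the score actually shown is -100. In Pacemaker, a colocation score of +/-INFINITY is mandatory, while any finite score is only a preference. The following is a minimal Python sketch that reads that one constraint line and reports how it would be classified. The regular expression and the function name `describe_colocation` are hypothetical helpers written for this line shape only, not part of pcs or Pacemaker, and the sketch does not try to model how Pacemaker places resources.

```python
import re

# Line copied from the "Colocation Constraints" section of the configuration.
CONSTRAINT = "G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)"

# Hypothetical pattern for this one line shape; it is not a general pcs parser.
PATTERN = re.compile(r"^\s*(\S+) with (\S+) \(([^)]+)\) \(id:([^)]+)\)\s*$")


def describe_colocation(line):
    """Split a colocation line into its parts and classify the score."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognised colocation line: %r" % line)
    source, target, score, constraint_id = match.groups()
    # Only +/-INFINITY is mandatory; finite scores are preferences.
    mandatory = score.lstrip("+-").upper() == "INFINITY"
    # A leading minus sign means "keep apart"; otherwise "keep together".
    direction = "apart" if score.startswith("-") else "together"
    return {
        "source": source,
        "target": target,
        "score": score,
        "id": constraint_id,
        "kind": "mandatory" if mandatory else "advisory",
        "direction": direction,
    }


info = describe_colocation(CONSTRAINT)
```

With the line from this configuration, `info["kind"]` is "advisory" and `info["direction"]` is "apart", so the name embedded in the id and the effective score do not agree.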

Pacemaker works as expected in most situations, but there is one scenario that we really cannot understand. I will try to describe it:

a - The Master resource (and the Cluster IP address) is active on node 1 and the Slave resource is active on node 2.

b - We force the Master resource to move to node 2.

c - Pacemaker stops all resources: Master, Slave and Cluster IP.

d - The Master resource and the Cluster IP are started on node 2 (this is OK), but the Slave also tries to start there (??). It fails (logically, because the Master resource has already been started on the same node), it logs an "unknown error" and its state is marked as "failed". This is a capture of 'pcs status' at that point:


OFFLINE: [ webmob1bdprep-ges ]
Online: [ webmob2bdprep-ges ]

Full list of resources:

Resource Group: G_MySQL_M
    MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
    ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges

Failed actions:
    MySQL_M_monitor_60000 on webmob2bdprep-ges 'master' (8): call=62, status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms, exec=0ms
    MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78, status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms, exec=0ms


PCSD Status:
webmob1bdprep-ges: Offline
webmob2bdprep-ges: Online

e - Pacemaker moves the Slave resource to node 1 and starts it. Now we have both resources started again, Master on node 2 and Slave on node 1.

f - One minute later, Pacemaker restarts both resources (???).
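
To make the two "Failed actions" entries from the capture in step d easier to read, here is a minimal Python sketch that splits each entry into resource, operation and node, and maps the numeric code to its standard OCF return-code name. The regular expression and the function name `decode` are hypothetical helpers for this output shape only. The table lists just a few of the OCF codes, and the sketch says nothing about why the agents returned them.

```python
import re

# A subset of the standard OCF resource agent return codes.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# The two entries copied from the 'pcs status' capture.
FAILED_ACTIONS = [
    "MySQL_M_monitor_60000 on webmob2bdprep-ges 'master' (8): call=62, "
    "status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms, exec=0ms",
    "MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78, "
    "status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms, exec=0ms",
]

# Hypothetical pattern: "<resource>_<action>_<interval> on <node> '<text>' (<rc>):"
PATTERN = re.compile(r"^(?P<op>\S+) on (?P<node>\S+) '(?P<text>[^']*)' \((?P<rc>\d+)\):")


def decode(line):
    """Return the main fields of one 'Failed actions' entry as a dict."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognised failed-action line: %r" % line)
    # The operation key ends in "_<action>_<interval in ms>".
    resource, action, interval_ms = match.group("op").rsplit("_", 2)
    rc = int(match.group("rc"))
    return {
        "resource": resource,
        "action": action,
        "interval_ms": int(interval_ms),
        "node": match.group("node"),
        "rc": rc,
        "rc_name": OCF_RC.get(rc, "unknown"),
    }


decoded = [decode(line) for line in FAILED_ACTIONS]
```

For this capture, the first entry decodes to the 60-second recurring monitor of MySQL_M returning OCF_RUNNING_MASTER (8), and the second to the start of MySQL_S returning OCF_ERR_GENERIC (1), both on webmob2bdprep-ges.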

So we are wondering:

- After the migration of the Master resource, why does Pacemaker try to start the Slave resource on the same node where the Master resource has just been started? Why does it try to do so even with the colocation constraint that everyone can see in our configuration?
- Once the Slave resource has been pushed out to the other node (and started), why are both resources restarted again one minute later?

Maybe something in our configuration is causing this, but we cannot figure out what it could be. Also, we have been working with another Pacemaker cluster (also for MySQL) for the last three years, using a very similar configuration, and we have never seen behaviour like that described above. The only difference between the former cluster and this new one is that the older one is on a RHEL 6 platform and runs cman and not corosync.

So we would appreciate any help or remarks about why our cluster is doing this. Although it is not a critical problem, it could become an issue in a production environment.


Best Regards,





--
Félix Díaz de Rada
fd...@gfi.es
Gfi Norte | www.gfi.es
