On Wed, 2018-10-10 at 12:18 +0200, Simon Bomm wrote:
> On Sat, Oct 6, 2018 at 06:13, Andrei Borzenkov <arvidj...@gmail.com> wrote:
> > On 05.10.2018 15:00, Simon Bomm wrote:
> > > Hi all,
> > >
> > > Using pacemaker 1.1.18-11 and the mysql resource agent
> > > (https://github.com/ClusterLabs/resource-agents/blob/RHEL6/heartbeat/mysql),
> > > I run into unwanted behaviour. That is my point of view, of course;
> > > maybe this behaviour is expected, which is why I ask.
> > >
> > > # My test case is the following:
> > >
> > > Everything is OK on my cluster; crm_mon output is as below (no failed
> > > actions):
> > >
> > > Master/Slave Set: ms_mysql-master [ms_mysql]
> > >     Masters: [ db-master ]
> > >     Slaves: [ db-slave ]
> > >
> > > 1. I insert into a table on the master; no issue, data is replicated.
> > > 2. I shut down the net int on the master (VM),
>
> First, thanks for taking the time to answer me.
>
> > What exactly does it mean? How do you shut down net?
>
> I disconnect the network card from the VMware vSphere Console.
>
> > > pacemaker correctly starts on the other node. The master is seen as
> > > offline, and db-slave is now master:
> > >
> > > Master/Slave Set: ms_mysql-master [ms_mysql]
> > >     Masters: [ db-slave ]
> > >
> > > 3. I bring my net int back up; pacemaker sees the node online and sets
> > > the old master as the new slave:
> > >
> > > Master/Slave Set: ms_mysql-master [ms_mysql]
> > >     Masters: [ db-slave ]
> > >     Slaves: [ db-master ]
> > >
> > > 4. From this point, my external monitoring bash script shows that the
> > > SQL and IO threads are not running, but I can't see any error in the
> > > pcs status/crm_mon outputs.
> >
> > Pacemaker just shows what resource agents claim. If the resource agent
> > claims the resource is started, there is nothing pacemaker can do. You
> > need to debug what the resource agent does.
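[Editor's note: as background to Andrei's point, pacemaker only sees the agent's monitor exit status, so a slave whose replication threads are dead still looks "Started" as long as mysqld answers. A minimal sketch of the OCF monitor exit-code convention; the state labels here are illustrative, not the agent's actual variables.]

```shell
#!/bin/sh
# OCF monitor exit codes (per the OCF resource agent API):
#   0 = OCF_SUCCESS (running, e.g. as slave), 7 = OCF_NOT_RUNNING,
#   8 = OCF_RUNNING_MASTER, 1 = OCF_ERR_GENERIC.
# The heartbeat/mysql agent's monitor reports success when the mysqld
# process responds, even if the replication threads are stopped.
monitor_sketch() {
    case "$1" in
        master)  return 8 ;;   # promoted instance
        slave)   return 0 ;;   # demoted instance, process alive
        stopped) return 7 ;;   # cleanly stopped
        *)       return 1 ;;   # anything else is a hard error
    esac
}

monitor_sketch slave  && echo "slave -> OCF_SUCCESS ($?)"
monitor_sketch master || echo "master -> OCF_RUNNING_MASTER ($?)"
```

A slave with broken replication still falls in the first case, which is why crm_mon shows no failed action in the scenario above.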
>
> I've debugged it quite a lot, and that's what drove me to isolate the
> error below:
>
> mysql -h localhost -u user-repl -pmysqlreplpw -e "START SLAVE"
> ERROR 1200 (HY000) at line 1: Misconfigured slave: MASTER_HOST was not set;
> Fix in config file or with CHANGE MASTER TO
>
> > > Consequence is that I continue inserting on my newly promoted master,
> > > but the data is never consumed by my former master.
> > >
> > > # Questions:
> > >
> > > - Is this some kind of safety behaviour to avoid data corruption when
> > >   a node comes back online?
> > > - When I manually start it the way ocf does, it returns this error:
> > >
> > > mysql -h localhost -u user-repl -pmysqlreplpw -e "START SLAVE"
> > > ERROR 1200 (HY000) at line 1: Misconfigured slave: MASTER_HOST was not set;
> > > Fix in config file or with CHANGE MASTER TO
> > >
> > > - I would expect the cluster to stop the slave and show a failed
> > >   action; am I wrong here?
> >
> > I am not familiar with the specific application and its structure. From
> > quick browsing, the monitor action mostly checks for a running process.
> > Is the mysql process running?
>
> Yes it is; as you mentioned previously, the config wants pacemaker to
> start the mysql resource, so no problem there.
>
> > > # Other details (not sure it matters a lot)
> > >
> > > No stonith enabled, no fencing or auto-failback.
> >
> > How are you going to resolve split-brain without stonith? "Stopping
> > net" sounds exactly like split brain, in which case further
> > investigation is rather pointless.
>
> You have a point. As I'm not very familiar with stonithd, I first
> disabled it to avoid unwanted behaviour, but I'll definitely follow your
> advice and dig around.
>
> > Anyway, to give some non-hypothetical answer, full configuration and
> > logs from both systems are needed.
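[Editor's note: for context on that ERROR 1200, MySQL/MariaDB refuses START SLAVE until replication coordinates have been set with CHANGE MASTER TO; the agent normally does the equivalent internally when repointing a slave. A sketch of the statement that would be needed, built from the "host|binlog-file|position" format of the ms_mysql_REPL_INFO cluster property shown later in the thread; the function name is mine, not the agent's.]

```shell
#!/bin/sh
# Build the CHANGE MASTER TO statement a slave needs before START SLAVE
# can succeed. $1 is a "master-host|binlog-file|position" string in the
# same format as the ms_mysql_REPL_INFO cluster property; $2/$3 are the
# replication user and password.
repl_info_to_sql() {
    IFS='|' read -r host binlog pos <<EOF
$1
EOF
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$host" "$2" "$3" "$binlog" "$pos"
}

# Example with the value from the thread's cluster properties:
repl_info_to_sql 'app-db-master|mysql-bin.000012|327' user-repl mysqlreplpw
```

This only illustrates why START SLAVE fails in isolation: without such a statement having been run first, the server has no MASTER_HOST and returns error 1200.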
>
> Sure, please find the full configuration.
>
> Cluster Name: app_cluster
> Corosync Nodes:
>  app-central-master app-central-slave app-db-master app-db-slave app-quorum
> Pacemaker Nodes:
>  app-central-master app-central-slave app-db-master app-db-slave app-quorum
>
> Resources:
>  Master: ms_mysql-master
>   Meta Attrs: master-node-max=1 clone_max=2 globally-unique=false clone-node-max=1 notify=true
>   Resource: ms_mysql (class=ocf provider=heartbeat type=mysql-app)
>    Attributes: binary=/usr/bin/mysqld_safe config=/etc/my.cnf.d/server.cnf datadir=/var/lib/mysql evict_outdated_slaves=false max_slave_lag=15 pid=/var/lib/mysql/mysql.pid replication_passwd=mysqlreplpw replication_user=app-repl socket=/var/lib/mysql/mysql.sock test_passwd=mysqlrootpw test_user=root
>    Operations: demote interval=0s timeout=120 (ms_mysql-demote-interval-0s)
>                monitor interval=20 timeout=30 (ms_mysql-monitor-interval-20)
>                monitor interval=10 role=Master timeout=30 (ms_mysql-monitor-interval-10)
>                monitor interval=30 role=Slave timeout=30 (ms_mysql-monitor-interval-30)
>                notify interval=0s timeout=90 (ms_mysql-notify-interval-0s)
>                promote interval=0s timeout=120 (ms_mysql-promote-interval-0s)
>                start interval=0s timeout=120 (ms_mysql-start-interval-0s)
>                stop interval=0s timeout=120 (ms_mysql-stop-interval-0s)
>  Resource: vip_mysql (class=ocf provider=heartbeat type=IPaddr2-app)
>   Attributes: broadcast=10.30.255.255 cidr_netmask=16 flush_routes=true ip=10.30.3.229 nic=ens160
>   Operations: monitor interval=10s timeout=20s (vip_mysql-monitor-interval-10s)
>               start interval=0s timeout=20s (vip_mysql-start-interval-0s)
>               stop interval=0s timeout=20s (vip_mysql-stop-interval-0s)
>  Group: app
>   Resource: misc_app (class=ocf provider=heartbeat type=misc-app)
>    Attributes: crondir=/etc/app-failover/resources/cron/,/etc/cron.d/
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (misc_app-monitor-interval-5s)
>                start interval=0s timeout=20s (misc_app-start-interval-0s)
>                stop interval=0s timeout=20s (misc_app-stop-interval-0s)
>   Resource: cbd_central_broker (class=ocf provider=heartbeat type=cbd-central-broker)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (cbd_central_broker-monitor-interval-5s)
>                start interval=0s timeout=90s (cbd_central_broker-start-interval-0s)
>                stop interval=0s timeout=90s (cbd_central_broker-stop-interval-0s)
>   Resource: centcore (class=ocf provider=heartbeat type=centcore)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (centcore-monitor-interval-5s)
>                start interval=0s timeout=90s (centcore-start-interval-0s)
>                stop interval=0s timeout=90s (centcore-stop-interval-0s)
>   Resource: apptrapd (class=ocf provider=heartbeat type=apptrapd)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (apptrapd-monitor-interval-5s)
>                start interval=0s timeout=90s (apptrapd-start-interval-0s)
>                stop interval=0s timeout=90s (apptrapd-stop-interval-0s)
>   Resource: app_central_sync (class=ocf provider=heartbeat type=app-central-sync)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (app_central_sync-monitor-interval-5s)
>                start interval=0s timeout=90s (app_central_sync-start-interval-0s)
>                stop interval=0s timeout=90s (app_central_sync-stop-interval-0s)
>   Resource: snmptrapd (class=ocf provider=heartbeat type=snmptrapd)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (snmptrapd-monitor-interval-5s)
>                start interval=0s timeout=90s (snmptrapd-start-interval-0s)
>                stop interval=0s timeout=90s (snmptrapd-stop-interval-0s)
>   Resource: http (class=ocf provider=heartbeat type=apacheapp)
>    Meta Attrs: target-role=started
>    Operations: monitor interval=5s timeout=20s (http-monitor-interval-5s)
>                start interval=0s timeout=40s (http-start-interval-0s)
>                stop interval=0s timeout=60s (http-stop-interval-0s)
>   Resource: vip_app (class=ocf provider=heartbeat type=IPaddr2-app)
>    Attributes: broadcast=10.30.255.255 cidr_netmask=16 flush_routes=true ip=10.30.3.230 nic=ens160
>    Meta Attrs: target-role=started
>    Operations: monitor interval=10s timeout=20s (vip_app-monitor-interval-10s)
>                start interval=0s timeout=20s (vip_app-start-interval-0s)
>                stop interval=0s timeout=20s (vip_app-stop-interval-0s)
>   Resource: centengine (class=ocf provider=heartbeat type=centengine)
>    Meta Attrs: multiple-active=stop_start target-role=started
>    Operations: monitor interval=5s timeout=20s (centengine-monitor-interval-5s)
>                start interval=0s timeout=90s (centengine-start-interval-0s)
>                stop interval=0s timeout=90s (centengine-stop-interval-0s)
>
> Stonith Devices:
> Fencing Levels:
>
> Location Constraints:
>   Resource: app
>     Disabled on: app-db-master (score:-INFINITY) (id:location-app-app-db-master--INFINITY)
>     Disabled on: app-db-slave (score:-INFINITY) (id:location-app-app-db-slave--INFINITY)
>   Resource: ms_mysql
>     Disabled on: app-central-master (score:-INFINITY) (id:location-ms_mysql-app-central-master--INFINITY)
>     Disabled on: app-central-slave (score:-INFINITY) (id:location-ms_mysql-app-central-slave--INFINITY)
>   Resource: vip_mysql
>     Disabled on: app-central-master (score:-INFINITY) (id:location-vip_mysql-app-central-master--INFINITY)
>     Disabled on: app-central-slave (score:-INFINITY) (id:location-vip_mysql-app-central-slave--INFINITY)
> Ordering Constraints:
> Colocation Constraints:
>   vip_mysql with ms_mysql-master (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
>   ms_mysql-master with vip_mysql (score:INFINITY) (rsc-role:Master) (with-rsc-role:Started)
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  resource-stickiness: INFINITY
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: app_cluster
>  dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
>  have-watchdog: false
>  last-lrm-refresh: 1538740285
>  ms_mysql_REPL_INFO: app-db-master|mysql-bin.000012|327
>  stonith-enabled: false
>  symmetric-cluster: true
> Node Attributes:
>  app-quorum: standby=on
>
> Quorum:
>   Options:
>   Device:
>    votes: 1
>    Model: net
>     algorithm: ffsplit
>     host: app-quorum
>
> Logs are below.
>
> SLAVE when I disconnect the interface (node is isolated), with the
> associated crm_mon; this looks good to me and matches the expected
> behaviour:
>
> Oct 10 09:20:07 app-db-slave corosync[1055]: [TOTEM ] A processor failed, forming new configuration.
> Oct 10 09:20:11 app-db-slave corosync[1055]: [TOTEM ] A new membership (10.30.3.245:196) was formed. Members left: 3
> Oct 10 09:20:11 app-db-slave corosync[1055]: [TOTEM ] Failed to receive the leave message. failed: 3
> Oct 10 09:20:11 app-db-slave corosync[1055]: [QUORUM] Members[4]: 1 2 4 5
> Oct 10 09:20:11 app-db-slave corosync[1055]: [MAIN  ] Completed service synchronization, ready to provide service.
> Oct 10 09:20:11 app-db-slave cib[1168]: notice: Node app-db-master state is now lost
> Oct 10 09:20:11 app-db-slave attrd[1172]: notice: Node app-db-master state is now lost
> Oct 10 09:20:11 app-db-slave attrd[1172]: notice: Removing all app-db-master attributes for peer loss
> Oct 10 09:20:11 app-db-slave stonith-ng[1170]: notice: Node app-db-master state is now lost
> Oct 10 09:20:11 app-db-slave pacemakerd[1084]: notice: Node app-db-master state is now lost
> Oct 10 09:20:11 app-db-slave crmd[1175]: notice: Node app-db-master state is now lost
> Oct 10 09:20:11 app-db-slave cib[1168]: notice: Purged 1 peer with id=3 and/or uname=app-db-master from the membership cache
> Oct 10 09:20:11 app-db-slave stonith-ng[1170]: notice: Purged 1 peer with id=3 and/or uname=app-db-master from the membership cache
> Oct 10 09:20:11 app-db-slave attrd[1172]: notice: Purged 1 peer with id=3 and/or uname=app-db-master from the membership cache
> Oct 10 09:20:11 app-db-slave crmd[1175]: notice: Result of notify operation for ms_mysql on app-db-slave: 0 (ok)
> Oct 10 09:20:12 app-db-slave mysql-app(ms_mysql)[21165]: INFO: app-db-slave promote is starting
> Oct 10 09:20:12 app-db-slave IPaddr2-app(vip_mysql)[21134]: INFO: Adding inet address 10.30.3.229/16 with broadcast address 10.30.255.255 to device ens160
> Oct 10 09:20:12 app-db-slave IPaddr2-app(vip_mysql)[21134]: INFO: Bringing device ens160 up
> Oct 10 09:20:12 app-db-slave IPaddr2-app(vip_mysql)[21134]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -c 5 -I ens160 -s 10.30.3.229 10.30.255.255
> Oct 10 09:20:12 app-db-slave crmd[1175]: notice: Result of start operation for vip_mysql on app-db-slave: 0 (ok)
> Oct 10 09:20:12 app-db-slave lrmd[1171]: notice: ms_mysql_promote_0:21165:stderr [ Error performing operation: No such device or address ]
> Oct 10 09:20:12 app-db-slave crmd[1175]: notice: Result of promote operation for ms_mysql on app-db-slave: 0 (ok)
> Oct 10 09:20:12 app-db-slave mysql-app(ms_mysql)[21285]: INFO: app-db-slave This will be the new master, ignoring post-promote notification.
> Oct 10 09:20:12 app-db-slave crmd[1175]: notice: Result of notify operation for ms_mysql on app-db-slave: 0 (ok)
>
> Node app-quorum: standby
> Online: [ app-central-master app-central-slave app-db-slave ]
> OFFLINE: [ app-db-master ]
>
> Active resources:
>
> Master/Slave Set: ms_mysql-master [ms_mysql]
>     Masters: [ app-db-slave ]
> vip_mysql      (ocf::heartbeat:IPaddr2-app):   Started app-db-slave
>
> And logs from the master during its isolation:
>
> Oct 10 09:23:10 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:11 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:13 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:14 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:16 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:17 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:19 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:20 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:22 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:23 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:25 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:26 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:28 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:29 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:31 app-db-master kernel: vmxnet3 0000:03:00.0 ens160: NIC Link is Up 10000 Mbps
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1436] device (ens160): carrier: link connected
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1444] device (ens160): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1456] policy: auto-activating connection 'ens160'
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1470] device (ens160): Activation: starting connection 'ens160' (9fe36e64-13ca-40cb-a174-5b4e16b826f4)
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1473] device (ens160): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1474] manager: NetworkManager state is now CONNECTING
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1479] device (ens160): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.1485] device (ens160): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2214] device (ens160): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2235] device (ens160): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2238] device (ens160): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2240] manager: NetworkManager state is now CONNECTED_LOCAL
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2554] manager: NetworkManager state is now CONNECTED_SITE
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2555] policy: set 'ens160' (ens160) as default for IPv4 routing and DNS
> Oct 10 09:23:31 app-db-master systemd: Starting Network Manager Script Dispatcher Service...
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2556] device (ens160): Activation: successful, device activated.
> Oct 10 09:23:31 app-db-master NetworkManager[692]: <info> [1539156211.2564] manager: NetworkManager state is now CONNECTED_GLOBAL
> Oct 10 09:23:31 app-db-master dbus[686]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service'
> Oct 10 09:23:31 app-db-master dbus[686]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
> Oct 10 09:23:31 app-db-master systemd: Started Network Manager Script Dispatcher Service.
> Oct 10 09:23:31 app-db-master nm-dispatcher: req:1 'up' [ens160]: new request (3 scripts)
> Oct 10 09:23:31 app-db-master nm-dispatcher: req:1 'up' [ens160]: start running ordered scripts...
> Oct 10 09:23:31 app-db-master nm-dispatcher: req:2 'connectivity-change': new request (3 scripts)
> Oct 10 09:23:31 app-db-master nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
> Oct 10 09:23:31 app-db-master corosync[1029]: [MAIN  ] Totem is unable to form a cluster because of an operating system or network fault (reason: totem is continuously in gather state). The most common cause of this message is that the local firewall is configured improperly.
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] The network interface [10.30.3.247] is now up.
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] adding new UDPU member {10.30.3.245}
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] adding new UDPU member {10.30.3.246}
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] adding new UDPU member {10.30.3.247}
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] adding new UDPU member {10.30.3.248}
> Oct 10 09:23:31 app-db-master corosync[1029]: [TOTEM ] adding new UDPU member {10.30.3.249}
>
> As you can see, the node is back online and can communicate with the
> other nodes again, so pacemaker starts mysql as expected and brings it
> up as slave:
>
> Node app-quorum: standby
> Online: [ app-central-master app-central-slave app-db-master app-db-slave ]
>
> Active resources:
>
> Master/Slave Set: ms_mysql-master [ms_mysql]
>     Masters: [ app-db-slave ]
>     Slaves: [ app-db-master ]
>
> Resource-agents oriented logs are below.
>
> Master:
>
> Oct 10 09:24:01 app-db-master crmd[5177]: notice: Result of demote operation for ms_mysql on app-db-master: 0 (ok)
> Oct 10 09:24:02 app-db-master mysql-app(ms_mysql)[5592]: INFO: app-db-master Ignoring post-demote notification for my own demotion.
> Oct 10 09:24:02 app-db-master crmd[5177]: notice: Result of notify operation for ms_mysql on app-db-master: 0 (ok)
>
> Slave:
>
> Oct 10 09:24:01 app-db-slave crmd[1175]: notice: Result of notify operation for ms_mysql on app-db-slave: 0 (ok)
> Oct 10 09:24:02 app-db-slave mysql-app(ms_mysql)[22969]: INFO: app-db-slave Ignoring pre-demote notification execpt for my own demotion.
> Oct 10 09:24:02 app-db-slave crmd[1175]: notice: Result of notify operation for ms_mysql on app-db-slave: 0 (ok)
> Oct 10 09:24:03 app-db-slave mysql-app(ms_mysql)[22999]: INFO: app-db-slave post-demote notification for app-db-master.
> Oct 10 09:24:03 app-db-slave mysql-app(ms_mysql)[22999]: WARNING: Attempted to unset the replication master on an instance that is not configured as a replication slave
> Oct 10 09:24:03 app-db-slave crmd[1175]: notice: Result of notify operation for ms_mysql on app-db-slave: 0 (ok)
>
> So I expect to have a running replication at this point, but when I
> perform SHOW SLAVE STATUS on my new slave, I get an empty response:
>
> MariaDB [(none)]> SHOW SLAVE STATUS \G
> Empty set (0.00 sec)
>
> [root@app-db-master ~]# bash /etc/app-failover/mysql-exploit/mysql-check-status.sh
> Connection Status 'app-db-master' [OK]
> Connection Status 'app-db-slave' [OK]
> Slave Thread Status [KO]
> Error reports:
>     No slave (maybe because we cannot check a server).
> Position Status [SKIP]
> Error reports:
>     Skip because we can't identify a unique slave.
>
> From what I understand, the is_slave function from
> https://github.com/ClusterLabs/resource-agents/blob/RHEL6/heartbeat/mysql
> works as expected: since it gets an empty set when performing the
> monitor action, it does not consider the instance a replication slave.
> So I guess this stems from the problem already shown above, the query
> that failed with "ERROR 1200 (HY000) at line 1: Misconfigured slave:
> MASTER_HOST was not set;".
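[Editor's note: that reading matches the agent's logic: an empty SHOW SLAVE STATUS result means the server has no replication configuration at all, so the monitor treats it as a plain running instance rather than a broken slave, and no failure is ever reported. A self-contained sketch of that distinction, parsing sample output instead of querying a live server; the function and labels are mine, not the agent's.]

```shell
#!/bin/sh
# Classify a node from its "SHOW SLAVE STATUS\G" output:
#   empty output  -> not configured as a slave (is_slave is false, so the
#                    monitor reports plain success and no failed action)
#   Slave_IO_Running and Slave_SQL_Running both "Yes" -> healthy slave
#   anything else -> broken replication
classify_slave() {
    out=$1
    if [ -z "$out" ]; then
        echo "not-a-slave"
        return 0
    fi
    io=$(printf '%s\n' "$out"  | sed -n 's/.*Slave_IO_Running: *//p')
    sql=$(printf '%s\n' "$out" | sed -n 's/.*Slave_SQL_Running: *//p')
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "healthy-slave"
    else
        echo "broken-replication"
    fi
}

# The old master in the thread returns "Empty set", i.e. no rows at all:
classify_slave ""                      # -> not-a-slave
classify_slave "Slave_IO_Running: Yes
Slave_SQL_Running: Yes"                # -> healthy-slave
```

The first case is exactly the reported situation: the instance is running but was never re-pointed at the new master, so nothing ever looks "failed" to pacemaker.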
That looks accurate. I think this would be worth opening an issue at
https://github.com/ClusterLabs/resource-agents/issues so it is at least
documented.

Not directly helpful, but maybe worth mentioning: there seems to be more
community activity around the pgsql and galera agents. You may have to
do more of your own code-diving with mysql. It may be worth comparing
the various agents to see how they handle various situations. (Note
there is the https://github.com/ClusterLabs/PAF agent as well as the
heartbeat pgsql agent.)

> I may be missing something obvious. Please tell me if I can bring more
> information around my issue.
>
> Rgds
>
> > > Symmetric cluster configured.
> > >
> > > Details of my pacemaker resource configuration:
> > >
> > > Master: ms_mysql-master
> > >  Meta Attrs: master-node-max=1 clone_max=2 globally-unique=false clone-node-max=1 notify=true
> > >  Resource: ms_mysql (class=ocf provider=heartbeat type=mysql)
> > >   Attributes: binary=/usr/bin/mysqld_safe config=/etc/my.cnf.d/server.cnf datadir=/var/lib/mysql evict_outdated_slaves=false max_slave_lag=15 pid=/var/lib/mysql/mysql.pid replication_passwd=mysqlreplpw replication_user=user-repl socket=/var/lib/mysql/mysql.sock test_passwd=mysqlrootpw test_user=root
> > >   Operations: demote interval=0s timeout=120 (ms_mysql-demote-interval-0s)
> > >               monitor interval=20 timeout=30 (ms_mysql-monitor-interval-20)
> > >               monitor interval=10 role=Master timeout=30 (ms_mysql-monitor-interval-10)
> > >               monitor interval=30 role=Slave timeout=30 (ms_mysql-monitor-interval-30)
> > >               notify interval=0s timeout=90 (ms_mysql-notify-interval-0s)
> > >               promote interval=0s timeout=120 (ms_mysql-promote-interval-0s)
> > >               start interval=0s timeout=120 (ms_mysql-start-interval-0s)
> > >               stop interval=0s timeout=120 (ms_mysql-stop-interval-0s)
> > >
> > > Anything I'm missing on this? Did not find a clearly similar use case
> > > when googling around network outage and pacemaker.
> > >
> > > Thanks
--
Ken Gaillot <kgail...@redhat.com>
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org