Thanks, that seems to have been the problem in my case. (For some reason the attribute did not reappear on its own, but adding it manually with crm_attribute did work.)
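
For reference, the kind of invocation I mean is below. Treat it as a sketch only: the attribute name master-MariaDB is simply what crm_master reports for this resource on a healthy node, and the score value is just an example, not something I can vouch for on other setups.

  # check whether the promotion score survived the restart
  crm_master --query --resource MariaDB --node testras3

  # if it is gone, re-add it as a transient (reboot-lifetime) attribute
  # (100 is only an example score; use whatever the agent normally sets)
  crm_attribute --node testras3 --lifetime reboot --name master-MariaDB --update 100

A rough sketch of the full unmanage-first sequence Ken suggested is at the bottom of this mail, below the quoted thread.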
I assume that this happened since I didn't have another node that could
become the DC while restarting pacemaker? If I do add another node then
the problem doesn't seem to appear.

Dirk

On Wed, Jun 5, 2019 at 3:17 PM Ken Gaillot <kgail...@redhat.com> wrote:
> On Wed, 2019-06-05 at 13:28 -0700, Dirk Gassen wrote:
> > Thanks for your quick reply. I should have been a bit more verbose in
> > my problem description.
> >
> > After starting up pacemaker again and before "crm node testras3
> > ready" I did actually monitor the cluster with "crm_mon" and waited
> > until it indicated that it knew about the states of the resources.
> >
> > Here is actually the excerpt from syslog:
> > * crm node maintenance testras3
> > > 16:14:50 On loss of CCM Quorum: Ignore
> > > 16:14:50 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > > 16:14:50 Calculated Transition 12: /var/lib/pacemaker/pengine/pe-input-72.bz2
> > * systemctl stop pacemaker
> > > 16:15:29 On loss of CCM Quorum: Ignore
> > > 16:15:29 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
>
> Ah, there is no master score for MariaDB, so when the node leaves
> maintenance mode, the resource must be demoted.
>
> Restarting pacemaker clears all transient node attributes (including
> the master score). The next monitor would set it again, but maintenance
> mode cancels monitors, so it won't run until it comes out of
> maintenance mode, at which point it wants to do the demote.
>
> A good way around this would be to unmanage the MariaDB resource before
> putting the node in maintenance. When you take the node out of
> maintenance, the monitor will start up again, but it won't take any
> actions. Once the monitor runs and sets the master score (which you can
> confirm with crm_master --query --resource MariaDB --node <node>), you
> can manage the resource.
>
> > > 16:15:29 Scheduling Node testras3 for shutdown
> > > 16:15:29 Calculated Transition 13: /var/lib/pacemaker/pengine/pe-input-73.bz2
> > > 16:15:29 Invoking handler for signal 15: Terminated
> > * systemctl start pacemaker
> > > 16:15:57 Additional logging available in /var/log/pacemaker.log
> > > 16:16:20 On loss of CCM Quorum: Ignore
> > > 16:16:20 Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-74.bz2
> > > 16:16:20 On loss of CCM Quorum: Ignore
> > > 16:16:20 Forcing unmanaged master MariaDB:0 to remain promoted on testras3
> > > 16:16:20 Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-75.bz2
> > * crm node ready testras3
> > > 16:18:01 On loss of CCM Quorum: Ignore
> > > 16:18:01 Stop AppserverIP#011(testras3)
> > > 16:18:01 Demote MariaDB:0#011(Master -> Slave testras3)
> > > 16:18:01 Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-76.bz2
> > > 16:18:01 On loss of CCM Quorum: Ignore
> > > 16:18:01 Start AppserverIP#011(testras3)
> > > 16:18:01 Promote MariaDB:0#011(Slave -> Master testras3)
> > > 16:18:01 Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-77.bz2
> > > 16:18:02 On loss of CCM Quorum: Ignore
> > > 16:18:02 Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-78.bz2
> >
> > So it looks like to me that the cluster is demoting ms_MariaDB from
> > Master to Slave. I'm not sure if I should have waited for something
> > else to occur?
> >
> > I have attached pe-input-76.bz2.
> >
> > Dirk
> >
> > On Wed, Jun 5, 2019 at 10:22 AM Ken Gaillot <kgail...@redhat.com> wrote:
> > > On Wed, 2019-06-05 at 07:40 -0700, Dirk Gassen wrote:
> > > > Hi,
> > > >
> > > > I have the following CIB:
> > > > > primitive AppserverIP IPaddr \
> > > > >     params ip=10.1.8.70 cidr_netmask=255.255.255.192 nic=eth0 \
> > > > >     op monitor interval=30s
> > > > > primitive MariaDB mysql \
> > > > >     params binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" replication_user=repl replication_passwd="r3plic@tion" max_slave_lag=15 evict_outdated_slaves=false test_user=repl test_passwd="r3plic@tion" config="/etc/mysql/my.cnf" user=mysql group=mysql datadir="/opt/mysql" \
> > > > >     op monitor interval=27s role=Master OCF_CHECK_LEVEL=1 \
> > > > >     op monitor interval=35s timeout=30 role=Slave OCF_CHECK_LEVEL=1 \
> > > > >     op start interval=0 timeout=130 \
> > > > >     op stop interval=0 timeout=130
> > > > > ms ms_MariaDB MariaDB \
> > > > >     meta master-max=1 master-node-max=1 clone-node-max=1 notify=true globally-unique=false target-role=Started is-managed=true
> > > > > colocation colo_sm_aip inf: AppserverIP:Started ms_MariaDB:Master
> > > >
> > > > When I do "crm node testras3 maintenance && systemctl stop pacemaker
> > > > && systemctl start pacemaker && crm node testras3 ready" the cluster
> > > > decides to demote ms_MariaDB and (because of the colocation) to stop
> > > > AppserverIP. it then follows up immediately with promoting ms_MariaDB
> > > > and starting AppserverIP again.
> > > >
> > > > If I leave out restarting pacemaker the cluster does not demote
> > > > ms_MariaDB and AppserverIP is left running.
> > > >
> > > > Why is the demotion happening and is there a way to avoid this?
> > >
> > > It looks like there isn't enough time between starting pacemaker and
> > > taking the node out of maintenance for pacemaker to re-detect the
> > > state of all resources. It's best to do that manually, i.e. wait for
> > > the status output to show all the resources again, but you could
> > > automate it with a fixed sleep or maybe a brief sleep plus
> > > crm_resource --wait.
> > >
> > > > Corosync 2.3.5-3ubuntu2.3 and Pacemaker 1.1.14-2ubuntu1.6
> > > >
> > > > Sincerely,
> > > > Dirk
> > > > --
> > > > Dirk Gassen
> > > > Senior Software Engineer | GetWellNetwork
> > > > o: 240.482.3146
> > > > e: dgas...@getwellnetwork.com
> > > > To help people take an active role in their health journey
> > > --
> > > Ken Gaillot <kgail...@redhat.com>
> --
> Ken Gaillot <kgail...@redhat.com>
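
For the archives, the unmanage-first sequence Ken describes above would look roughly like this with my resource and node names (a sketch only; I have not exercised every step, and the exact crmsh syntax may differ between versions):

  # keep the cluster from acting on the ms resource while pacemaker is restarted
  crm resource unmanage ms_MariaDB
  crm node maintenance testras3

  systemctl stop pacemaker
  systemctl start pacemaker

  # wait until the cluster has re-detected resource state
  # (watch crm_mon, or as Ken mentions, a brief sleep plus crm_resource --wait)
  crm_resource --wait

  crm node ready testras3

  # confirm the monitor has set the master score again before re-managing
  crm_master --query --resource MariaDB --node testras3
  crm resource manage ms_MariaDB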
--
Dirk Gassen
Senior Software Engineer | GetWellNetwork
o: 240.482.3146
e: dgas...@getwellnetwork.com
To help people take an active role in their health journey

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/