On 12.01.2018 01:15, Lars Ellenberg wrote:
>
> To understand some weird behavior we observed,
> I dumbed down a production config to three dummy resources,
> while keeping some descriptive resource ids (ip, drbd, fs).
>
> For some reason, the constraints are:
> stuff, more stuff, IP -> DRBD -> FS -> other stuff.
> (In the actual real-world config it makes somewhat more sense,
> but it reproduces with just these three resources.)
>
> All is running just fine.
>
> Online: [ ava emma ]
>  virtual_ip   (ocf::pacemaker:Dummy): Started ava
>  Master/Slave Set: ms_drbd_r0 [p_drbd_r0]
>      Masters: [ ava ]
>  p_fs_drbd1   (ocf::pacemaker:Dummy): Started ava
>
> If I simulate a monitor failure on IP:
> # crm_simulate -L -i virtual_ip_monitor_30000@ava=1
>
> Transition Summary:
>  * Recover virtual_ip    (Started ava)
>  * Restart p_drbd_r0:0   (Master ava)
>
> Which in real life will obviously fail,
> because we cannot "restart" (demote) a DRBD
> while it is still in use (mounted, in this case).
>
> Only if I add a stupid intra-resource order constraint that explicitly
> states to first start, then promote, on the DRBD itself,
> do I get the result I would have expected:
>
> Transition Summary:
>  * Recover virtual_ip    (Started ava)
>  * Restart p_drbd_r0:0   (Master ava)
>  * Restart p_fs_drbd1    (Started ava)
>
> Interestingly enough, if I simulate a monitor failure on "DRBD" directly,
> the result is in both cases the expected one:
>
> Transition Summary:
>  * Recover p_drbd_r0:0   (Master ava)
>  * Restart p_fs_drbd1    (Started ava)
>
> What am I missing?
>
> Do we have to "annotate" somewhere that you must not demote something
> if it is still "in use" by something else?
>
> Did I just screw up the constraints somehow?
> What would the constraints need to look like to get the expected result,
> without explicitly adding the first-start-then-promote constraint?
>
> Is (was?) this a pengine bug?
>
>
> How to reproduce:
> =================
>
> crm shell style dummy config:
> ------------------------------
> node 1: ava
> node 2: emma
> primitive p_drbd_r0 ocf:pacemaker:Stateful \
>     op monitor interval=29s role=Master \
>     op monitor interval=31s role=Slave
> primitive p_fs_drbd1 ocf:pacemaker:Dummy \
>     op monitor interval=20 timeout=40
> primitive virtual_ip ocf:pacemaker:Dummy \
>     op monitor interval=30s
> ms ms_drbd_r0 p_drbd_r0 \
>     meta master-max=1 master-node-max=1 clone-max=1 clone-node-max=1
> colocation c1 inf: ms_drbd_r0 virtual_ip
> colocation c2 inf: p_fs_drbd1:Started ms_drbd_r0:Master
> order o1 inf: virtual_ip:start ms_drbd_r0:start
> order o2 inf: ms_drbd_r0:promote p_fs_drbd1:start
> ------------------------------
>
> crm_simulate -x bad.xml -i virtual_ip_monitor_30000@ava=1
>
> trying to demote DRBD before umount :-((
>
> adding the stupid constraint:
>
> order first-start-then-promote inf: ms_drbd_r0:start ms_drbd_r0:promote
>
> crm_simulate -x good.xml -i virtual_ip_monitor_30000@ava=1
>
> yay, first umount, then demote...
>
> (tested with 1.1.15 and 1.1.16, not yet with more recent code base)
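For reference, the two crmsh order constraints above should end up in the CIB as roughly the following rsc_order entries (just the expected crmsh-to-XML mapping, with the ids taken from the snippet):

  <rsc_order id="o1" score="INFINITY" first="virtual_ip" first-action="start" then="ms_drbd_r0" then-action="start"/>
  <rsc_order id="o2" score="INFINITY" first="ms_drbd_r0" first-action="promote" then="p_fs_drbd1" then-action="start"/>

Note that neither of them orders start against promote within ms_drbd_r0 itself.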
Same with pacemaker 1.1.18. It really looks like there is no implicit
ordering between "start" and "promote", even though we can only promote a
clone instance after it has been started.

> Full good.xml and bad.xml are both attached.
>
> Manipulating the constraint in the live cib using cibadmin only:
>
> add: cibadmin -C -o constraints -X '<rsc_order id="first-start-then-promote"
>      score="INFINITY" first="ms_drbd_r0" first-action="start" then="ms_drbd_r0"
>      then-action="promote"/>'
> del: cibadmin -D -X '<rsc_order id="first-start-then-promote"/>'
>
> Thanks,
>
> Lars
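If it helps when comparing the two cases, crm_simulate can also dump the transition graph it computes, so the planned demote/stop ordering is visible directly. Something along these lines should work on 1.1.x (option names as listed by crm_simulate --help, so please double-check on your build):

  crm_simulate -x bad.xml  -i virtual_ip_monitor_30000@ava=1 -S -D bad.dot  -G bad-graph.xml
  crm_simulate -x good.xml -i virtual_ip_monitor_30000@ava=1 -S -D good.dot -G good-graph.xml

The dot files can be rendered with graphviz (dot -Tsvg bad.dot > bad.svg), and diffing good-graph.xml against bad-graph.xml should show which ordering edges the explicit first-start-then-promote constraint adds.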