Hi list,

I was explaining how to use crm_simulate to a colleague when he pointed out some unexpected output that looks like a bug.
Here are some simple steps to reproduce:

  $ pcs cluster setup --name usecase srv1 srv2 srv3
  $ pcs cluster start --all
  $ pcs property set stonith-enabled=false
  $ pcs resource create dummy1 ocf:heartbeat:Dummy \
      state=/tmp/dummy1.state \
      op monitor interval=10s \
      meta migration-threshold=3 resource-stickiness=1

Now we inject 2 soft monitor errors, triggering 2 local recoveries (stop/start):

  $ crm_simulate -S -L -i dummy1_monitor_10@srv1=1 -O /tmp/step1.xml
  $ crm_simulate -S -x /tmp/step1.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step2.xml

So far so good. A third soft monitor error pushes dummy1 out of srv1, which is expected with migration-threshold=3. However, the final status of the cluster shows dummy1 as started on both srv1 and srv2!

  $ crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 -O /tmp/step3.xml

  Current cluster status:
  Online: [ srv1 srv2 srv3 ]

   dummy1 (ocf::heartbeat:Dummy): Started srv1

  Performing requested modifications
   + Injecting dummy1_monitor_10@srv1=1 into the configuration
   + Injecting attribute fail-count-dummy1=value++ into /node_state '1'
   + Injecting attribute last-failure-dummy1=1516287891 into /node_state '1'

  Transition Summary:
   * Recover dummy1 ( srv1 -> srv2 )

  Executing cluster transition:
   * Cluster action: clear_failcount for dummy1 on srv1
   * Resource action: dummy1 stop on srv1
   * Resource action: dummy1 cancel=10 on srv1
   * Pseudo action: all_stopped
   * Resource action: dummy1 start on srv2
   * Resource action: dummy1 monitor=10000 on srv2

  Revised cluster status:
  Online: [ srv1 srv2 srv3 ]

   dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]

I suppose this is a bug in crm_simulate? Why does it consider dummy1 started on srv1 when the transition it just executed stopped it there?
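In case it helps anyone reproduce this in a script, here is a minimal sketch of a check for the symptom. The function name multi_active is hypothetical (it is not part of Pacemaker), and it assumes the text layout shown in this message, where a resource active on several nodes is printed as "Started[ nodeA nodeB ]" under the "Revised cluster status:" header.

```shell
# Hypothetical helper, not part of Pacemaker: reads crm_simulate text output
# on stdin and prints every line of the "Revised cluster status" section
# that reports a resource as started on more than one node.
# Assumption: the multi-node form is printed as "Started[ nodeA nodeB ]".
# Exit status is 0 when at least one such line is found, 1 otherwise.
multi_active() {
    awk '/^Revised cluster status:/ { revised = 1; next }
         revised && /Started\[/     { print; found = 1 }
         END                        { exit (found ? 0 : 1) }'
}
```

It would be used by piping the third simulation into it, for example: crm_simulate -S -x /tmp/step2.xml -i dummy1_monitor_10@srv1=1 | multi_active. Lines from the "Current cluster status" section are ignored on purpose, so only the simulated end state is checked.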
Feeding the step3.xml output of this weird result back into crm_simulate forces the cluster to stop dummy1 everywhere and start it on srv2 only:

  $ crm_simulate -S -x /tmp/step3.xml

  Current cluster status:
  Online: [ srv1 srv2 srv3 ]

   dummy1 (ocf::heartbeat:Dummy): Started[ srv1 srv2 ]

  Transition Summary:
   * Move dummy1 ( srv1 -> srv2 )

  Executing cluster transition:
   * Resource action: dummy1 stop on srv2
   * Resource action: dummy1 stop on srv1
   * Pseudo action: all_stopped
   * Resource action: dummy1 start on srv2
   * Resource action: dummy1 monitor=10000 on srv2

  Revised cluster status:
  Online: [ srv1 srv2 srv3 ]

   dummy1 (ocf::heartbeat:Dummy): Started srv2

Thoughts?

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org