On 05/11/2017 03:00 PM, Ludovic Vaugeois-Pepin wrote:
> Hi
>
> I translated a PostgreSQL multi-state RA
> (https://github.com/dalibo/PAF) into Python
> (https://github.com/ulodciv/deploy_cluster), and I have been editing
> it heavily.
>
> In parallel I am writing unit tests and functional tests.
>
> I am having an issue with a functional test that abruptly powers off
> a slave named "test3" (a hot standby PG instance). Later on I start
> the slave back up. Once it is started, I run "pcs cluster start
> test3". And this is where I start having a problem.
>
> I check the output of "pcs status xml" every second until test3 is
> said to be ready as a slave again. In the following I assume that
> test3 is ready as a slave:
>
>     <nodes>
>         <node name="test1" id="1" online="true" standby="false"
>             standby_onfail="false" maintenance="false" pending="false"
>             unclean="false" shutdown="false" expected_up="true"
>             is_dc="false" resources_running="2" type="member"/>
>         <node name="test2" id="2" online="true" standby="false"
>             standby_onfail="false" maintenance="false" pending="false"
>             unclean="false" shutdown="false" expected_up="true"
>             is_dc="true" resources_running="1" type="member"/>
>         <node name="test3" id="3" online="true" standby="false"
>             standby_onfail="false" maintenance="false" pending="false"
>             unclean="false" shutdown="false" expected_up="true"
>             is_dc="false" resources_running="1" type="member"/>
>     </nodes>

The <nodes> section says nothing about the current state of the nodes.
Look at the <node_state> entries for that. in_ccm means the cluster
stack level, and crmd means the pacemaker level -- both need to be up.
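For example, something along these lines could query the CIB's status
section and test both attributes. This is an untested sketch in Python
(since your test harness is already Python); node_is_up() is a
hypothetical helper name, and the attribute values checked are the ones
Pacemaker 1.1 writes, which may differ in other versions:

    import subprocess
    import xml.etree.ElementTree as ET

    def node_is_up(uname):
        """True if the node is up at both the cluster stack level
        (in_ccm="true") and the pacemaker level (crmd="online"),
        according to the <node_state> entry in the CIB status section."""
        # cibadmin exits nonzero (raising CalledProcessError) if no
        # node_state entry matches, e.g. the node was never a member.
        xml_out = subprocess.check_output(
            ["cibadmin", "--query",
             "--xpath", "//node_state[@uname='%s']" % uname])
        state = ET.fromstring(xml_out)
        return (state.get("in_ccm") == "true"
                and state.get("crmd") == "online")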
> <resources>
>     <clone id="pgsql-ha" multi_state="true" unique="false"
>         managed="true" failed="false" failure_ignored="false">
>         <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha"
>             role="Slave" active="true" orphaned="false" managed="true"
>             failed="false" failure_ignored="false" nodes_running_on="1">
>             <node name="test3" id="3" cached="false"/>
>         </resource>
>         <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha"
>             role="Master" active="true" orphaned="false" managed="true"
>             failed="false" failure_ignored="false" nodes_running_on="1">
>             <node name="test1" id="1" cached="false"/>
>         </resource>
>         <resource id="pgsqld" resource_agent="ocf::heartbeat:pgha"
>             role="Slave" active="true" orphaned="false" managed="true"
>             failed="false" failure_ignored="false" nodes_running_on="1">
>             <node name="test2" id="2" cached="false"/>
>         </resource>
>     </clone>
> </resources>
>
> By "ready" I mean that upon running "pcs cluster start test3", the
> following occurs before test3 appears ready in the XML:
>
> pcs cluster start test3
> monitor -> RA returns unknown error (1)
> notify/pre-stop -> RA returns ok (0)
> stop -> RA returns ok (0)
> start -> RA returns ok (0)
>
> The problem I have is that between "pcs cluster start test3" and
> "monitor", it seems that the XML returned by "pcs status xml" says
> test3 is ready (the XML extract above is what I get at that moment).
> Once "monitor" occurs, the returned XML shows test3 to be offline,
> and not until the start is finished do I once again have test3 shown
> as ready.
>
> Am I getting anything wrong? Is there a simpler or better way to
> check if test3 is fully functional again, i.e. that the OCF start was
> successful?
>
> Thanks
>
> Ludovic
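As to a simpler way: one option is to consider test3 ready only once
both membership levels are up *and* the status XML shows the pgsqld
instance active in the Slave role on it. A rough, untested sketch
building on the node_is_up() helper above -- wait_for_slave() is a
hypothetical name, the resource and node names come from your quoted
XML, and I believe pcs gets its status XML from crm_mon anyway:

    import time
    import subprocess
    import xml.etree.ElementTree as ET

    def wait_for_slave(uname, rsc_id="pgsqld", timeout=120):
        """Block until uname hosts an active Slave instance of rsc_id,
        or raise after timeout seconds."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            if node_is_up(uname):
                # Same XML as "pcs status xml": a <resources> section
                # with one <resource> per clone instance, each carrying
                # role/active attributes and a <node> child.
                status = ET.fromstring(
                    subprocess.check_output(["crm_mon", "--as-xml"]))
                for rsc in status.iter("resource"):
                    node = rsc.find("node")
                    if (rsc.get("id") == rsc_id
                            and rsc.get("role") == "Slave"
                            and rsc.get("active") == "true"
                            and node is not None
                            and node.get("name") == uname):
                        return
            time.sleep(1)
        raise RuntimeError("%s not ready as a slave after %ss"
                           % (uname, timeout))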