This is my messages log:

Jul 27 08:02:46 vmx-occ-005 apache(WebSite)[32477]: INFO: apache not running
Jul 27 08:02:46 vmx-occ-005 crmd[31424]: notice: process_lrm_event: Operation WebSite_monitor_60000: not running (node=node1, call=11, rc=7, cib-update=15, confirmed=false)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_cs_dispatch: Update relayed from node2
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-WebSite (1)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_perform_update: Sent update 12: fail-count-WebSite=1
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_cs_dispatch: Update relayed from node2
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-WebSite (1437976962)
Jul 27 08:02:46 vmx-occ-005 attrd[31422]: notice: attrd_perform_update: Sent update 14: last-failure-WebSite=1437976962
Jul 27 08:02:46 vmx-occ-005 apache(WebSite)[32511]: INFO: apache is not running.
Jul 27 08:02:46 vmx-occ-005 crmd[31424]: notice: process_lrm_event: Operation WebSite_stop_0: ok (node=node1, call=14, rc=0, cib-update=16, confirmed=true)

This is my corosync log:

Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/15)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: --- 0.38.65 2
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: +++ 0.38.66 (null)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib: @num_updates=66
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib/status/node_state[@id='node1']/lrm[@id='node1']/lrm_resources/lrm_resource[@id='WebSite']/lrm_rsc_op[@id='WebSite_last_failure_0']: @operation_key=WebSite_monitor_60000, @transition-key=9:119038:0:a5b747ee-4fbc-4f65-a690-29276791fd19, @transition-magic=0:7;9:119038:0:a5b747ee-4fbc-4f65-a690-29276791fd19, @call-id=11, @rc-code=7, @interval=60000, @last-rc-change=1437976966, @exec-time=0, @op-digest=eddc33bef3f1592ad847638ee4
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node1/crmd/15, version=0.38.66)
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_cs_dispatch: Update relayed from node2
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-WebSite (1)
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_perform_update: Sent update 12: fail-count-WebSite=1
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/12)
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_cs_dispatch: Update relayed from node2
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-WebSite (1437976962)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: --- 0.38.66 2
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: +++ 0.38.67 (null)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib: @num_updates=67
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='node1']/transient_attributes[@id='node1']/instance_attributes[@id='status-node1']: <nvpair id="status-node1-fail-count-WebSite" name="fail-count-WebSite" value="1"/>
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node1/attrd/12, version=0.38.67)
Jul 27 08:02:46 [31422] vmx-occ-005 attrd: notice: attrd_perform_update: Sent update 14: last-failure-WebSite=1437976962
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/14)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: --- 0.38.67 2
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: +++ 0.38.68 (null)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib: @num_updates=68
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: ++ /cib/status/node_state[@id='node1']/transient_attributes[@id='node1']/instance_attributes[@id='status-node1']: <nvpair id="status-node1-last-failure-WebSite" name="last-failure-WebSite" value="1437976962"/>
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node1/attrd/14, version=0.38.68)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node2/attrd/404, version=0.38.68)
Jul 27 08:02:46 [31421] vmx-occ-005 lrmd: info: cancel_recurring_action: Cancelling operation WebSite_monitor_60000
Jul 27 08:02:46 [31424] vmx-occ-005 crmd: info: do_lrm_rsc_op: Performing key=3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19 op=WebSite_stop_0
Jul 27 08:02:46 [31421] vmx-occ-005 lrmd: info: log_execute: executing - rsc:WebSite action:stop call_id:14
Jul 27 08:02:46 [31424] vmx-occ-005 crmd: info: process_lrm_event: Operation WebSite_monitor_60000: Cancelled (node=node1, call=11, confirmed=true)
apache(WebSite)[32511]: 2015/07/27_08:02:46 INFO: apache is not running.
Jul 27 08:02:46 [31421] vmx-occ-005 lrmd: info: log_finished: finished - rsc:WebSite action:stop call_id:14 pid:32511 exit-code:0 exec-time:167ms queue-time:0ms
Jul 27 08:02:46 [31424] vmx-occ-005 crmd: notice: process_lrm_event: Operation WebSite_stop_0: ok (node=node1, call=14, rc=0, cib-update=16, confirmed=true)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/16)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: --- 0.38.68 2
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: Diff: +++ 0.38.69 (null)
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib: @num_updates=69
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_perform_op: + /cib/status/node_state[@id='node1']/lrm[@id='node1']/lrm_resources/lrm_resource[@id='WebSite']/lrm_rsc_op[@id='WebSite_last_0']: @operation_key=WebSite_stop_0, @operation=stop, @transition-key=3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19, @transition-magic=0:0;3:119728:0:a5b747ee-4fbc-4f65-a690-29276791fd19, @call-id=14, @last-run=1437976966, @last-rc-change=1437976966, @exec-time=167
Jul 27 08:02:46 [31419] vmx-occ-005 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=node1/crmd/16, version=0.38.69)
Jul 27 08:02:51 [31419] vmx-occ-005 cib: info: cib_process_ping: Reporting our current digest to node2: 608e7e54d63c1f66c39c9b4162a189d3 for 0.38.69 (0x846320 0)

These are the logs from after I triggered the failure. Pacemaker does not restart the service automatically; even if I start the httpd service myself, the status I get for the resource is still "stopped" on node1. If I restart the cluster, it works fine.

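In case it helps anyone reading the logs, here is a rough Python sketch that pulls the crmd process_lrm_event results (operation, status, node, rc) out of log lines like the ones above. The regular expression and the function name lrm_events are my own assumptions based only on the line format visible in these excerpts; they are not part of Pacemaker and may not match other versions' log formats.

```python
import re

# Assumed pattern, derived only from the process_lrm_event lines in this post.
# The rc field is optional because the "Cancelled" line above does not carry one.
LRM_EVENT = re.compile(
    r"process_lrm_event:\s+Operation (?P<op>\S+): (?P<status>[^(]+?) "
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+)(?:, rc=(?P<rc>\d+))?"
)


def lrm_events(lines):
    """Yield (operation, status, node, rc) for each process_lrm_event line."""
    for line in lines:
        match = LRM_EVENT.search(line)
        if match:
            rc = int(match.group("rc")) if match.group("rc") is not None else None
            yield match.group("op"), match.group("status"), match.group("node"), rc


# Two lines copied from the messages log above.
sample = [
    "Jul 27 08:02:46 vmx-occ-005 crmd[31424]: notice: process_lrm_event: "
    "Operation WebSite_monitor_60000: not running "
    "(node=node1, call=11, rc=7, cib-update=15, confirmed=false)",
    "Jul 27 08:02:46 vmx-occ-005 crmd[31424]: notice: process_lrm_event: "
    "Operation WebSite_stop_0: ok "
    "(node=node1, call=14, rc=0, cib-update=16, confirmed=true)",
]

events = list(lrm_events(sample))
for event in events:
    print(event)
```

Run against the excerpt, this shows the monitor on node1 returning rc=7 (reported as "not running") followed by a successful stop with rc=0, and no start operation afterwards in the lines posted.
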
On Mon, Jul 27, 2015 at 11:30 AM, Vijay Partha <vijaysarath...@gmail.com> wrote:

> Could you help me out in configuring stonith properly. I am new to
> pacemaker and I have been working for a few days. What all logs do you
> require?
>
> On Mon, Jul 27, 2015 at 11:22 AM, Digimer <li...@alteeve.ca> wrote:
>
>> On 27/07/15 01:35 AM, Vijay Partha wrote:
>> > HI .
>> >
>> > My configuration file looks like this:
>> >
>> > <cib crm_feature_set="3.0.9" validate-with="pacemaker-2.0" epoch="38"
>> > num_updates="0" admin_epoch="0" cib-last-written="Fri Jul 24 15:57:06
>> > 2015" have-quorum="1" dc-uuid="node2">
>> >   <configuration>
>> >     <crm_config>
>> >       <cluster_property_set id="cib-bootstrap-options">
>> >         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>> >           value="1.1.11-97629de"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
>> >           name="cluster-infrastructure" value="cman"/>
>> >         <nvpair id="cib-bootstrap-options-stonith-enabled"
>> >           name="stonith-enabled" value="false"/>
>> >         <nvpair id="cib-bootstrap-options-no-quorum-policy"
>> >           name="no-quorum-policy" value="ignore"/>
>> >         <nvpair id="cib-bootstrap-options-cluster-recheck-interval"
>> >           name="cluster-recheck-interval" value="2s"/>
>> >       </cluster_property_set>
>> >     </crm_config>
>> >     <nodes>
>> >       <node id="node1" uname="node1"/>
>> >       <node id="node2" uname="node2"/>
>> >     </nodes>
>> >     <resources>
>> >       <primitive class="ocf" id="my_first_svc" provider="heartbeat"
>> >         type="Dummy">
>> >         <instance_attributes id="my_first_svc-instance_attributes"/>
>> >         <operations>
>> >           <op id="my_first_svc-start-timeout-20" interval="0s"
>> >             name="start" timeout="20"/>
>> >           <op id="my_first_svc-stop-timeout-20" interval="0s"
>> >             name="stop" timeout="20"/>
>> >           <op id="my_first_svc-monitor-interval-120s" interval="120s"
>> >             name="monitor"/>
>> >         </operations>
>> >       </primitive>
>> >       <primitive class="ocf" id="WebSite" provider="heartbeat"
>> >         type="apache">
>> >         <instance_attributes id="WebSite-instance_attributes">
>> >           <nvpair id="WebSite-instance_attributes-configfile"
>> >             name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>> >           <nvpair id="WebSite-instance_attributes-statusurl"
>> >             name="statusurl" value="http://localhost/server-status"/>
>>
>> >         </instance_attributes>
>> >         <operations>
>> >           <op id="WebSite-start-timeout-40s" interval="0s" name="start"
>> >             timeout="40s" on-fail="restart"/>
>> >           <op id="WebSite-stop-timeout-60s" interval="0s" name="stop"
>> >             timeout="60s" on-fail="restart"/>
>> >           <op id="WebSite-monitor-interval-1min" interval="1min"
>> >             name="monitor" on-fail="restart"/>
>> >         </operations>
>> >         <meta_attributes id="WebSite-meta_attributes"/>
>> >       </primitive>
>> >     </resources>
>> >     <constraints>
>> >       <rsc_location id="location-WebSite-node2-50" node="node2"
>> >         rsc="WebSite" score="50"/>
>> >     </constraints>
>> >     <rsc_defaults>
>> >       <meta_attributes id="rsc_defaults-options">
>> >         <nvpair id="rsc_defaults-options-migration-threshold"
>> >           name="migration-threshold" value="1"/>
>> >       </meta_attributes>
>> >     </rsc_defaults>
>> >     <op_defaults>
>> >       <meta_attributes id="op_defaults-options">
>> >         <nvpair id="op_defaults-options-timeout" name="timeout"
>> >           value="240s"/>
>> >       </meta_attributes>
>> >     </op_defaults>
>> >   </configuration>
>> > </cib>
>> >
>> > Once i stop the httpd service the pacemaker does not restarts it
>> > automatically.
>>
>> As mentioned, logs help a lot. The logs from all nodes starting before
>> you trigger the failure until after the logs stop printing please.
>>
>> Also, you must use stonith. Please configure and test it. Often problems
>> go away when stonith is configured and working properly.
>>
>> --
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?
>>
>> _______________________________________________
>> Users mailing list: Users@clusterlabs.org
>> http://clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>>
>
> --
> With Regards
> P.Vijay
>

--
With Regards
P.Vijay