Dear Ken,

Thanks for the reply! I lowered migration-threshold to 1 and rearranged the constraints as you suggested:
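(For reference, roughly how I applied the change — the threshold lives in resource defaults in my config, so I lowered it there; pcs syntax may differ slightly by version:)

```
# Lower the default migration-threshold (pcs 0.9.x as shipped with CentOS 7):
pcs resource defaults migration-threshold=1

# Clear the accumulated fail-count so each kill test starts clean:
pcs resource cleanup postfix
```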
Location Constraints:
Ordering Constraints:
  promote mail-clone then start fs-services (kind:Mandatory)
  promote spool-clone then start fs-services (kind:Mandatory)
  start fs-services then start network-services (kind:Mandatory)
  start network-services then start mail-services (kind:Mandatory)
Colocation Constraints:
  fs-services with spool-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
  fs-services with mail-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
  network-services with mail-services (score:INFINITY)
  mail-services with fs-services (score:INFINITY)

Now virtualip-1 and postfix end up Stopped. I guess these two lines are the relevant ones, but I am attaching the full logs as well:

Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere

Interesting. I will keep playing with the ordering and colocation settings; the solution must be in there...
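For my own understanding I sketched the score propagation; this is just my mental model of the INFINITY colocation chain, not Pacemaker's actual algorithm, and the resource names and scores are simplified:

```python
# Rough model of how an INFINITY colocation chain propagates a ban.
# Not Pacemaker source code; it only illustrates why banning postfix
# from mail1 leaves virtualip-1 unable to run anywhere in a 2-node
# cluster while the DRBD master stays on mail1.

NEG_INF = float("-inf")

def merge_colocated(rsc_scores, with_scores):
    """score:INFINITY colocation: rsc can only run where with-rsc can run."""
    return {node: (NEG_INF if with_scores.get(node, NEG_INF) == NEG_INF
                   else rsc_scores.get(node, 0))
            for node in set(rsc_scores) | set(with_scores)}

nodes = ["mail1", "mail2"]

# fs-services must be with the DRBD master, which (with infinite
# stickiness) stays on mail1:
fs_services = {"mail1": 0, "mail2": NEG_INF}

# mail-services (postfix) is colocated INFINITY with fs-services ...
postfix = merge_colocated({"mail1": 0, "mail2": 0}, fs_services)
# ... but migration-threshold=1 and one failure ban it from mail1:
postfix["mail1"] = NEG_INF

# network-services (virtualip-1) is colocated INFINITY with mail-services:
virtualip = merge_colocated({"mail1": 0, "mail2": 0}, postfix)

print([n for n in nodes if postfix[n] != NEG_INF])    # []
print([n for n in nodes if virtualip[n] != NEG_INF])  # []
```

Which would match the two native_color lines: postfix is banned from mail1 but pinned to fs-services (stuck with the master on mail1), so it can run nowhere, and the ban propagates up to virtualip-1 instead of pulling everything over to mail2.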
Best regards,
Lorand

Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.7 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.8 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=8
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']: <lrm_rsc_op id="postfix_last_failure_0" operation_key="postfix_monitor_45000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.10" transition-key="86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a" transition-magic="0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a" on_node="mail1" call-id="1333" rc-code="7"
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by postfix_monitor_45000 'create' on mail1: Inactive graph (magic=0:7;86:2962:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.215.8, source=process_graph_event:598, 1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: update_failcount: Updating failcount for postfix on mail1 after failed monitor: rc=7 (update=value++, time=1458124686)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: process_graph_event: Detected action (2962.86) postfix_monitor_45000.1333=not running: failed
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_client_update: Expanded fail-count-postfix=value++ to 1
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/253, version=0.215.8)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting fail-count-postfix[mail1]: (null) -> 1 from mail2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 406 with 2 changes for fail-count-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting last-failure-postfix[mail1]: 1458124291 -> 1458124686 from mail2
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 407 with 2 changes for last-failure-postfix, id=<n/a>, set=(null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/406)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/407)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.8 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.9 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=9
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: ++ /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']: <nvpair id="status-1-fail-count-postfix" name="fail-count-postfix" value="1"/>
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/406, version=0.215.9)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.9 2
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.10 (null)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=10
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']: @value=1458124686
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for fail-count-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for fail-count-postfix[mail1]=1: OK (0)
Mar 16 11:38:06 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/407, version=0.215.10)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 406 for fail-count-postfix[mail2]=(null): OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix[mail1]=1458124686: OK (0)
Mar 16 11:38:06 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 407 for last-failure-postfix[mail2]=1457610376: OK (0)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-fail-count-postfix, fail-count-postfix=1: Transient attribute change (create cib=0.215.9, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1'], 1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-last-failure-postfix, last-failure-postfix=1458124686: Transient attribute change (modify cib=0.215.10, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'], 1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: RecurringOp: Start recurring monitor (45s) for postfix on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave virtualip-1 (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-spool (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Recover postfix (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 2963: /var/lib/pacemaker/pengine/pe-input-330.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: handle_response: pe_calc calculation pe_calc-dc-1458124686-5541 is obsolete
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: get_failcount_full: postfix has failed 1 times on mail1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: warning: common_apply_stickiness: Forcing postfix away from mail1 after 1 failures (max=1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: fs-mail: Rolling back scores from postfix
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: postfix: Rolling back scores from virtualip-1
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource postfix cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop virtualip-1 (mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-spool (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave fs-mail (Started mail1)
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop postfix (mail1)
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Mar 16 11:38:06 [7419] HWJ-626.domain.local pengine: notice: process_pe_message: Calculated Transition 2964: /var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: info: do_te_invoke: Processing graph 2964 (ref=pe_calc-dc-1458124686-5542) derived from /var/lib/pacemaker/pengine/pe-input-331.bz2
Mar 16 11:38:06 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.10 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.11 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=11
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']: @operation_key=postfix_stop_0, @operation=stop, @transition-key=5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;5:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1335, @last-run=1458124686, @last-rc-change=1458124686, @exec-time=435
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/254, version=0.215.11)
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on mail1
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.215.11 2
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.215.12 (null)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=12
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']: @operation_key=virtualip-1_stop_0, @operation=stop, @transition-key=12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:0;12:2964:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1337, @last-run=1458124687, @last-rc-change=1458124687, @exec-time=56
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1 (rc=0)
Mar 16 11:38:07 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/255, version=0.215.12)
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: run_graph: Transition 2964 (Complete=7, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-331.bz2): Complete
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Mar 16 11:38:07 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar 16 11:38:12 [7415] HWJ-626.domain.local cib: info: cib_process_ping: Reporting our current digest to mail2: ed43bc3ecf0f15853900ba49fc514870 for 0.215.12 (0x152b110 0)

On Mon, Mar 14, 2016 at 6:44 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> On 03/10/2016 09:49 AM, Lorand Kelemen wrote:
> > Dear List,
> >
> > After the creation and testing of a simple 2 node active-passive
> > drbd+postfix cluster nearly everything works flawlessly (standby, failure
> > of a filesystem resource + failover, splitbrain + manual recovery) however
> > when delibarately killing the postfix instance, after reaching the
> > migration threshold failover does not occur and resources revert to the
> > Stopped state (except the master-slave drbd resource, which works as
> > expected).
> > Ordering and colocation is configured, STONITH and quorum disabled, the
> > goal is to always have one node running all the resources and at any sign
> > of error it should fail over to the passive node, nothing fancy.
> >
> > Is my configuration wrong or am I hitting a bug?
> >
> > All software from centos 7 + elrepo repositories.
>
> With these versions, you can set "two_node: 1" in
> /etc/corosync/corosync.conf (which will be done automatically if you
> used "pcs cluster setup" initially), and then you don't need to ignore
> quorum in pacemaker.
>
> > Regarding STONITH: the machines are running on free ESXi instances on
> > separate machines, so the Vmware fencing agents won't work because in the
> > free version the API is read-only.
> > Still trying to figure out a way to go, until then manual recovery + huge
> > arp cache times on the upstream firewall...
> >
> > Please find pe-input*.bz files attached, logs and config below. The
> > situation: on node mail1 postfix was killed 3 times (migration threshold),
> > it should have failed over to mail2.
> > When killing a filesystem resource three times this happens flawlessly.
> >
> > Thanks for your input!
> >
> > Best regards,
> > Lorand
> >
> > Cluster Name: mailcluster
> > Corosync Nodes:
> >  mail1 mail2
> > Pacemaker Nodes:
> >  mail1 mail2
> >
> > Resources:
> >  Group: network-services
> >   Resource: virtualip-1 (class=ocf provider=heartbeat type=IPaddr2)
> >    Attributes: ip=10.20.64.10 cidr_netmask=24 nic=lan0
> >    Operations: start interval=0s timeout=20s (virtualip-1-start-interval-0s)
> >                stop interval=0s timeout=20s (virtualip-1-stop-interval-0s)
> >                monitor interval=30s (virtualip-1-monitor-interval-30s)
> >  Master: spool-clone
> >   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> >   Resource: spool (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=spool
> >    Operations: start interval=0s timeout=240 (spool-start-interval-0s)
> >                promote interval=0s timeout=90 (spool-promote-interval-0s)
> >                demote interval=0s timeout=90 (spool-demote-interval-0s)
> >                stop interval=0s timeout=100 (spool-stop-interval-0s)
> >                monitor interval=10s (spool-monitor-interval-10s)
> >  Master: mail-clone
> >   Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> >   Resource: mail (class=ocf provider=linbit type=drbd)
> >    Attributes: drbd_resource=mail
> >    Operations: start interval=0s timeout=240 (mail-start-interval-0s)
> >                promote interval=0s timeout=90 (mail-promote-interval-0s)
> >                demote interval=0s timeout=90 (mail-demote-interval-0s)
> >                stop interval=0s timeout=100 (mail-stop-interval-0s)
> >                monitor interval=10s (mail-monitor-interval-10s)
> >  Group: fs-services
> >   Resource: fs-spool (class=ocf provider=heartbeat type=Filesystem)
> >    Attributes: device=/dev/drbd0 directory=/var/spool/postfix fstype=ext4 options=nodev,nosuid,noexec
> >    Operations: start interval=0s timeout=60 (fs-spool-start-interval-0s)
> >                stop interval=0s timeout=60 (fs-spool-stop-interval-0s)
> >                monitor interval=20 timeout=40 (fs-spool-monitor-interval-20)
> >   Resource: fs-mail (class=ocf provider=heartbeat type=Filesystem)
> >    Attributes: device=/dev/drbd1 directory=/var/spool/mail fstype=ext4 options=nodev,nosuid,noexec
> >    Operations: start interval=0s timeout=60 (fs-mail-start-interval-0s)
> >                stop interval=0s timeout=60 (fs-mail-stop-interval-0s)
> >                monitor interval=20 timeout=40 (fs-mail-monitor-interval-20)
> >  Group: mail-services
> >   Resource: postfix (class=ocf provider=heartbeat type=postfix)
> >    Operations: start interval=0s timeout=20s (postfix-start-interval-0s)
> >                stop interval=0s timeout=20s (postfix-stop-interval-0s)
> >                monitor interval=45s (postfix-monitor-interval-45s)
> >
> > Stonith Devices:
> > Fencing Levels:
> >
> > Location Constraints:
> > Ordering Constraints:
> >   start network-services then promote mail-clone (kind:Mandatory) (id:order-network-services-mail-clone-mandatory)
> >   promote mail-clone then promote spool-clone (kind:Mandatory) (id:order-mail-clone-spool-clone-mandatory)
> >   promote spool-clone then start fs-services (kind:Mandatory) (id:order-spool-clone-fs-services-mandatory)
> >   start fs-services then start mail-services (kind:Mandatory) (id:order-fs-services-mail-services-mandatory)
> > Colocation Constraints:
> >   network-services with spool-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-network-services-spool-clone-INFINITY)
> >   network-services with mail-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master) (id:colocation-network-services-mail-clone-INFINITY)
> >   network-services with fs-services (score:INFINITY) (id:colocation-network-services-fs-services-INFINITY)
> >   network-services with mail-services (score:INFINITY) (id:colocation-network-services-mail-services-INFINITY)
>
> I'm not sure whether it's causing your issue, but I would make the
> constraints reflect the logical relationships better.
>
> For example, network-services only needs to be colocated with
> mail-services logically; it's mail-services that needs to be with
> fs-services, and fs-services that needs to be with
> spool-clone/mail-clone master. In other words, don't make the
> highest-level resource depend on everything else, make each level depend
> on the level below it.
>
> Also, I would guess that the virtual IP only needs to be ordered before
> mail-services, and mail-clone and spool-clone could both be ordered
> before fs-services, rather than ordering mail-clone before spool-clone.
>
> > Resources Defaults:
> >  migration-threshold: 3
> > Operations Defaults:
> >  on-fail: restart
> >
> > Cluster Properties:
> >  cluster-infrastructure: corosync
> >  cluster-name: mailcluster
> >  cluster-recheck-interval: 5min
> >  dc-version: 1.1.13-10.el7_2.2-44eb2dd
> >  default-resource-stickiness: infinity
> >  have-watchdog: false
> >  last-lrm-refresh: 1457613674
> >  no-quorum-policy: ignore
> >  pe-error-series-max: 1024
> >  pe-input-series-max: 1024
> >  pe-warn-series-max: 1024
> >  stonith-enabled: false
> >
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.15 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.16 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=16
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_failure_0']: @transition-key=4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @transition-magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, @call-id=1274, @last-rc-change=1457613440
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by postfix_monitor_45000 'modify' on mail1: Inactive graph (magic=0:7;4:1234:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, cib=0.197.16, source=process_graph_event:598, 1)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: update_failcount: Updating failcount for postfix on mail1 after failed monitor: rc=7 (update=value++, time=1457613440)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_client_update: Expanded fail-count-postfix=value++ to 3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail1/crmd/196, version=0.197.16)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting fail-count-postfix[mail1]: 2 -> 3 from mail2
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 400 with 2 changes for fail-count-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/400)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: process_graph_event: Detected action (1234.4) postfix_monitor_45000.1274=not running: failed
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_peer_update: Setting last-failure-postfix[mail1]: 1457613347 -> 1457613440 from mail2
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: write_attribute: Sent update 401 with 2 changes for last-failure-postfix, id=<n/a>, set=(null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.16 2
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.17 (null)
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=17
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix']: @value=3
> > Mar 10 13:37:20 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/400, version=0.197.17)
> > Mar 10 13:37:20 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-fail-count-postfix, fail-count-postfix=3: Transient attribute change (modify cib=0.197.17, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-fail-count-postfix'], 1)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix[mail1]=3: OK (0)
> > Mar 10 13:37:20 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 400 for fail-count-postfix[mail2]=(null): OK (0)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/401)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: --- 0.197.17 2
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: Diff: +++ 0.197.18 (null)
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib: @num_updates=18
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_perform_op: + /cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix']: @value=1457613440
> > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=mail2/attrd/401, version=0.197.18)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix[mail1]=1457613440: OK (0)
> > Mar 10 13:37:21 [7418] HWJ-626.domain.local attrd: info: attrd_cib_callback: Update 401 for last-failure-postfix[mail2]=1457610376: OK (0)
> > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: abort_transition_graph: Transition aborted by status-1-last-failure-postfix, last-failure-postfix=1457613440: Transient attribute change (modify cib=0.197.18, source=abort_unless_down:319, path=/cib/status/node_state[@id='1']/transient_attributes[@id='1']/instance_attributes[@id='status-1']/nvpair[@id='status-1-last-failure-postfix'], 1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail1 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_online_status: Node mail2 is online
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:0 active in master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:0 active in master mode on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-spool active on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource fs-mail active on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for postfix on mail1: not running (7)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource spool:1 active in master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: determine_op_status: Operation monitor found resource mail:1 active in master mode on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: network-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: virtualip-1 (ocf::heartbeat:IPaddr2): Started mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: spool-clone [spool]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: clone_print: Master/Slave Set: mail-clone [mail]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Masters: [ mail1 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: short_print: Slaves: [ mail2 ]
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: fs-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-spool (ocf::heartbeat:Filesystem): Started mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: fs-mail (ocf::heartbeat:Filesystem): Started mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: group_print: Resource Group: mail-services
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_print: postfix (ocf::heartbeat:postfix): FAILED mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: get_failcount_full: postfix has failed 3 times on mail1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: common_apply_stickiness: Forcing postfix away from mail1 after 3 failures (max=3)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: mail-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: Promoting spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: master_color: spool-clone: Promoted 1 instances of a possible 1 to master
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: rsc_merge_weights: postfix: Rolling back scores from virtualip-1
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: native_color: Resource virtualip-1 cannot run anywhere
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: RecurringOp: Start recurring monitor (45s) for postfix on mail2
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop virtualip-1 (mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave spool:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:0 (Master mail1)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: LogActions: Leave mail:1 (Slave mail2)
> > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: LogActions: Stop fs-spool (Started mail1)
> > Mar
10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop fs-mail (Started mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop postfix (Started mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > process_pe_message: Calculated Transition 1235: > > /var/lib/pacemaker/pengine/pe-input-302.bz2 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > handle_response: pe_calc calculation pe_calc-dc-1457613441-3756 is > > obsolete > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > unpack_config: On loss of CCM Quorum: Ignore > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_online_status: Node mail1 is online > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_online_status: Node mail2 is online > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource mail:0 active in > > master mode on mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource spool:0 active in > > master mode on mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource fs-spool active on > > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource fs-mail active on > > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: > > unpack_rsc_op_failure: Processing failed op monitor for postfix on > > mail1: not running (7) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource spool:1 active in > > master mode on mail2 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > determine_op_status: Operation monitor found resource mail:1 active in > > master mode on mail2 > > Mar 10 13:37:21 
[7419] HWJ-626.domain.local pengine: info: > > group_print: Resource Group: network-services > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > native_print: virtualip-1 (ocf::heartbeat:IPaddr2): > Started > > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > clone_print: Master/Slave Set: spool-clone [spool] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > short_print: Masters: [ mail1 ] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > short_print: Slaves: [ mail2 ] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > clone_print: Master/Slave Set: mail-clone [mail] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > short_print: Masters: [ mail1 ] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > short_print: Slaves: [ mail2 ] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > group_print: Resource Group: fs-services > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > native_print: fs-spool (ocf::heartbeat:Filesystem): Started > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > native_print: fs-mail (ocf::heartbeat:Filesystem): Started > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > group_print: Resource Group: mail-services > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > native_print: postfix (ocf::heartbeat:postfix): FAILED > mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > get_failcount_full: postfix has failed 3 times on mail1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: warning: > > common_apply_stickiness: Forcing postfix away from mail1 after 3 > > failures (max=3) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > master_color: Promoting mail:0 (Master mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > master_color: mail-clone: Promoted 1 instances of a possible 1 
to master > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > master_color: Promoting spool:0 (Master mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > master_color: spool-clone: Promoted 1 instances of a possible 1 to master > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > rsc_merge_weights: postfix: Rolling back scores from virtualip-1 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > native_color: Resource virtualip-1 cannot run anywhere > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > RecurringOp: Start recurring monitor (45s) for postfix on mail2 > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop virtualip-1 (mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > LogActions: Leave spool:0 (Master mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > LogActions: Leave spool:1 (Slave mail2) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > LogActions: Leave mail:0 (Master mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: info: > > LogActions: Leave mail:1 (Slave mail2) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop fs-spool (Started mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop fs-mail (Started mail1) > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > LogActions: Stop postfix (Started mail1) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > do_state_transition: State transition S_POLICY_ENGINE -> > > S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE > > origin=handle_response ] > > Mar 10 13:37:21 [7419] HWJ-626.domain.local pengine: notice: > > process_pe_message: Calculated Transition 1236: > > /var/lib/pacemaker/pengine/pe-input-303.bz2 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > do_te_invoke: Processing graph 1236 
(ref=pe_calc-dc-1457613441-3757) > > derived from /var/lib/pacemaker/pengine/pe-input-303.bz2 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > te_rsc_command: Initiating action 12: stop virtualip-1_stop_0 on > mail1 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > te_rsc_command: Initiating action 5: stop postfix_stop_0 on mail1 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: --- 0.197.18 2 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: +++ 0.197.19 (null) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + /cib: @num_updates=19 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + > > > /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='virtualip-1']/lrm_rsc_op[@id='virtualip-1_last_0']: > > @operation_key=virtualip-1_stop_0, @operation=stop, > > @transition-key=12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @transition-magic=0:0;12:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @call-id=1276, @last-run=1457613441, @last-rc-change=1457613441, > > @exec-time=66 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_process_request: Completed cib_modify operation for section status: > OK > > (rc=0, origin=mail1/crmd/197, version=0.197.19) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > match_graph_event: Action virtualip-1_stop_0 (12) confirmed on mail1 > > (rc=0) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: --- 0.197.19 2 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: +++ 0.197.20 (null) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + /cib: @num_updates=20 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + > > > 
/cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='postfix']/lrm_rsc_op[@id='postfix_last_0']: > > @operation_key=postfix_stop_0, @operation=stop, > > @transition-key=5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @transition-magic=0:0;5:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @call-id=1278, @last-run=1457613441, @last-rc-change=1457613441, > > @exec-time=476 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > match_graph_event: Action postfix_stop_0 (5) confirmed on mail1 (rc=0) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > te_rsc_command: Initiating action 79: stop fs-mail_stop_0 on mail1 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_process_request: Completed cib_modify operation for section status: > OK > > (rc=0, origin=mail1/crmd/198, version=0.197.20) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: --- 0.197.20 2 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: +++ 0.197.21 (null) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + /cib: @num_updates=21 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + > > > /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-mail']/lrm_rsc_op[@id='fs-mail_last_0']: > > @operation_key=fs-mail_stop_0, @operation=stop, > > @transition-key=79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @transition-magic=0:0;79:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @call-id=1280, @last-run=1457613441, @last-rc-change=1457613441, > > @exec-time=88, @queue-time=1 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_process_request: Completed cib_modify operation for section status: > OK > > (rc=0, origin=mail1/crmd/199, version=0.197.21) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > match_graph_event: Action fs-mail_stop_0 (79) confirmed on mail1 > 
(rc=0) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > te_rsc_command: Initiating action 77: stop fs-spool_stop_0 on mail1 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: --- 0.197.21 2 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: Diff: +++ 0.197.22 (null) > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + /cib: @num_updates=22 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_perform_op: + > > > /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='fs-spool']/lrm_rsc_op[@id='fs-spool_last_0']: > > @operation_key=fs-spool_stop_0, @operation=stop, > > @transition-key=77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @transition-magic=0:0;77:1236:0:ae755a85-c250-498f-9c94-ddd8a7e2788a, > > @call-id=1282, @last-run=1457613441, @last-rc-change=1457613441, > > @exec-time=86 > > Mar 10 13:37:21 [7415] HWJ-626.domain.local cib: info: > > cib_process_request: Completed cib_modify operation for section status: > OK > > (rc=0, origin=mail1/crmd/200, version=0.197.22) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: > > match_graph_event: Action fs-spool_stop_0 (77) confirmed on mail1 > (rc=0) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning: > > run_graph: Transition 1236 (Complete=11, Pending=0, Fired=0, > Skipped=0, > > Incomplete=1, Source=/var/lib/pacemaker/pengine/pe-input-303.bz2): > > Terminated > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: warning: > > te_graph_trigger: Transition failed: terminated > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_graph: Graph 1236 with 12 actions: batch-limit=12 jobs, > > network-delay=0ms > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 16]: Completed pseudo op > > network-services_stopped_0 on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: 
notice: > > print_synapse: [Action 15]: Completed pseudo op > > network-services_stop_0 on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 12]: Completed rsc op virtualip-1_stop_0 > > on mail1 (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 84]: Completed pseudo op > > fs-services_stopped_0 on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 83]: Completed pseudo op > fs-services_stop_0 > > on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 77]: Completed rsc op fs-spool_stop_0 > > on mail1 (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 79]: Completed rsc op fs-mail_stop_0 > > on mail1 (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 90]: Completed pseudo op > > mail-services_stopped_0 on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 89]: Completed pseudo op > > mail-services_stop_0 on N/A (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 86]: Pending rsc op postfix_monitor_45000 > > on mail2 (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: * [Input 85]: Unresolved dependency rsc op > > postfix_start_0 on mail2 > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 5]: Completed rsc op postfix_stop_0 > > on mail1 (priority: 0, waiting: none) > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > print_synapse: [Action 8]: Completed pseudo op all_stopped > > on N/A (priority: 0, waiting: none) > > 
Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: info: do_log: > > FSA: Input I_TE_SUCCESS from notify_crmd() received in state > > S_TRANSITION_ENGINE > > Mar 10 13:37:21 [7420] HWJ-626.domain.local crmd: notice: > > do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ > > input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] > > Mar 10 13:37:26 [7415] HWJ-626.domain.local cib: info: > > cib_process_ping: Reporting our current digest to mail2: > > 3896ee29cdb6ba128330b0ef6e41bd79 for 0.197.22 (0x1544a30 0) > > _______________________________________________ > Users mailing list: Users@clusterlabs.org > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org >
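One more observation on the quoted log above: pengine forces postfix away from mail1 only because its fail count has reached migration-threshold ("postfix has failed 3 times on mail1 ... max=3"), and the INFINITY colocations then drag virtualip-1 and the filesystems down with it. So after fixing the constraints it is worth clearing the old failure history, otherwise the policy engine will keep banning the node. A rough sketch of the commands I would use (assuming pcs is installed; the crm_resource equivalent should work the same way):

```shell
# Show per-resource fail counts in the status output (-f / --failcounts)
crm_mon -1f

# Clear postfix's failure history so mail1 becomes eligible again
pcs resource cleanup postfix

# Low-level equivalent, scoped to one node
crm_resource --cleanup --resource postfix --node mail1
```

Alternatively, setting a failure-timeout on the resource would expire old failures automatically instead of requiring a manual cleanup.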
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org