On 27/09/15 11:02 AM, Alex Crow wrote:
>
>
> On 27/09/15 15:54, Digimer wrote:
>> On 27/09/15 10:40 AM, Alex Crow wrote:
>>> Hi List,
>>>
>>> I'm trying to set up a failover iSCSI storage system for oVirt using a
>>> self-hosted engine. I've set up DRBD in Master-Slave for two iSCSI
>>> targets, one for the self-hosted engine and one for the VMs. I had this
>>> all working perfectly, then after trying to move the engine's LUN to the
>>> opposite host, all hell broke loose. The VMS LUN is still fine, starts
>> I'm guessing no fencing?
>
> Hi Digimer,
>
> No, but I've tried turning off one machine and still no success as a
> single node :-(
You *must* have working fencing, period. So now strikes me as a
fantastic time to add it. Turning a node off by hand doesn't help on
its own; the cluster needs a way to *confirm* that the peer is really
gone.

>>> and migrates as it should. However the engine LUN always seems to try to
>>> launch the target on the host that is *NOT* the master of the DRBD
>>> resource. My constraints look fine, and should be self-explanatory about
>>> which is which:
>>>
>>> [root@granby ~]# pcs constraint --full
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbd-vms-iscsi then start iscsi-vms-ip (kind:Mandatory)
>>> (id:vm_iscsi_ip_after_drbd)
>>>   start iscsi-vms-target then start iscsi-vms-lun (kind:Mandatory)
>>> (id:vms_lun_after_target)
>>>   promote drbd-vms-iscsi then start iscsi-vms-target (kind:Mandatory)
>>> (id:vms_target_after_drbd)
>>>   promote drbd-engine-iscsi then start iscsi-engine-ip (kind:Mandatory)
>>> (id:ip_after_drbd)
>>>   start iscsi-engine-target then start iscsi-engine-lun
>>> (kind:Mandatory)
>>> (id:lun_after_target)
>>>   promote drbd-engine-iscsi then start iscsi-engine-target
>>> (kind:Mandatory) (id:target_after_drbd)
>>> Colocation Constraints:
>>>   iscsi-vms-ip with drbd-vms-iscsi (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master) (id:vms_ip-with-drbd)
>>>   iscsi-vms-lun with drbd-vms-iscsi (score:INFINITY) (rsc-role:Started)
>>> (with-rsc-role:Master) (id:vms_lun-with-drbd)
>>>   iscsi-vms-target with drbd-vms-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:vms_target-with-drbd)
>>>   iscsi-engine-ip with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:ip-with-drbd)
>>>   iscsi-engine-lun with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:lun-with-drbd)
>>>   iscsi-engine-target with drbd-engine-iscsi (score:INFINITY)
>>> (rsc-role:Started) (with-rsc-role:Master) (id:target-with-drbd)
>>>
>>> But see this from pcs status, the iSCSI target has FAILED on glenrock,
>>> but the DRBD master is on granby!:
>>>
>>> [root@granby ~]# pcs status
>>> Cluster name: storage
>>> Last updated: Sun Sep 27 15:30:08 2015
>>> Last change: Sun Sep 27 15:20:58 2015
>>> Stack: cman
>>> Current DC: glenrock - partition with quorum
>>> Version: 1.1.11-97629de
>>> 2 Nodes configured
>>> 10 Resources configured
>>>
>>>
>>> Online: [ glenrock granby ]
>>>
>>> Full list of resources:
>>>
>>>  Master/Slave Set: drbd-vms-iscsi [drbd-vms]
>>>      Masters: [ glenrock ]
>>>      Slaves: [ granby ]
>>>  iscsi-vms-target     (ocf::heartbeat:iSCSITarget):      Started glenrock
>>>  iscsi-vms-lun        (ocf::heartbeat:iSCSILogicalUnit): Started glenrock
>>>  iscsi-vms-ip         (ocf::heartbeat:IPaddr2):          Started glenrock
>>>  Master/Slave Set: drbd-engine-iscsi [drbd-engine]
>>>      Masters: [ granby ]
>>>      Slaves: [ glenrock ]
>>>  iscsi-engine-target  (ocf::heartbeat:iSCSITarget):      FAILED glenrock
>>> (unmanaged)
>>>  iscsi-engine-ip      (ocf::heartbeat:IPaddr2):          Stopped
>>>  iscsi-engine-lun     (ocf::heartbeat:iSCSILogicalUnit): Stopped
>>>
>>> Failed actions:
>>>     iscsi-engine-target_stop_0 on glenrock 'unknown error' (1):
>>> call=177, status=Timed Out, last-rc-change='Sun Sep 27 15:20:59 2015',
>>> queued=0ms, exec=10003ms
>>>     iscsi-engine-target_stop_0 on glenrock 'unknown error' (1):
>>> call=177, status=Timed Out, last-rc-change='Sun Sep 27 15:20:59 2015',
>>> queued=0ms, exec=10003ms
>>>
>>> I have tried various combinations of pcs resource clear and cleanup, but
>>> they all result in the same outcome - apart from on some occasions when
>>> one or other of the two hosts suddenly reboots!
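
Those surprise reboots are yet another argument for proper fencing; at
least then any power-cycling is deliberate and logged. On the
constraints themselves: that web of six orders and six colocations is
more complex than it needs to be. If you put each stack's
target/LUN/IP into a group, the group gives you the internal ordering
and colocation for free, and you only need two constraints per stack.
A sketch against your resource names (untested here, so sanity-check
it before applying):

  pcs resource group add g-engine iscsi-engine-target \
      iscsi-engine-lun iscsi-engine-ip
  pcs constraint colocation add g-engine with master drbd-engine-iscsi \
      INFINITY
  pcs constraint order promote drbd-engine-iscsi then start g-engine

Same pattern for the vms stack. Fewer moving parts makes it much
easier to see why a target keeps landing on the wrong node.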
>>>
>>> Here is a log right after a "pcs resource cleanup" - first on the master
>>> for the DRBD m/s resource:
>>>
>>> [root@granby ~]# pcs resource cleanup; tail -f /var/log/messages
>>> All resources/stonith devices successfully cleaned up
>>> Sep 27 15:33:42 granby crmd[3358]: notice: process_lrm_event:
>>> granby-drbd-engine_monitor_0:117 [ \n ]
>>> Sep 27 15:33:42 granby attrd[3356]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Sep 27 15:33:42 granby attrd[3356]: notice: attrd_perform_update: Sent
>>> update 54: probe_complete=true
>>> Sep 27 15:33:42 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-engine_monitor_10000: master (node=granby, call=131,
>>> rc=8, cib-update=83, confirmed=false)
>>> Sep 27 15:33:42 granby crmd[3358]: notice: process_lrm_event:
>>> granby-drbd-engine_monitor_10000:131 [ \n ]
>>> Sep 27 15:33:42 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-vms_monitor_20000: ok (node=granby, call=130, rc=0,
>>> cib-update=84, confirmed=false)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: do_lrm_invoke: Forcing the
>>> status of all resources to be redetected
>>> Sep 27 15:34:46 granby attrd[3356]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (<null>)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> granby-drbd-engine_monitor_10000:131 [ \n ]
>>> Sep 27 15:34:46 granby attrd[3356]: notice: attrd_perform_update: Sent
>>> delete 61: node=granby, attr=probe_complete, id=<n/a>, set=(null),
>>> section=status
>>> Sep 27 15:34:46 granby attrd[3356]: notice: attrd_perform_update: Sent
>>> delete 63: node=granby, attr=probe_complete, id=<n/a>, set=(null),
>>> section=status
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-vms-target_monitor_0: not running (node=granby,
>>> call=150, rc=7, cib-update=94, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-vms-lun_monitor_0: not running (node=granby, call=154,
>>> rc=7, cib-update=95, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-engine-target_monitor_0: not running (node=granby,
>>> call=167, rc=7, cib-update=96, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-engine-lun_monitor_0: not running (node=granby,
>>> call=175, rc=7, cib-update=97, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-vms-ip_monitor_0: not running (node=granby, call=158,
>>> rc=7, cib-update=98, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation iscsi-engine-ip_monitor_0: not running (node=granby, call=171,
>>> rc=7, cib-update=99, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-vms_monitor_0: ok (node=granby, call=146, rc=0,
>>> cib-update=100, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-engine_monitor_0: master (node=granby, call=163, rc=8,
>>> cib-update=101, confirmed=true)
>>> Sep 27 15:34:46 granby crmd[3358]: notice: process_lrm_event:
>>> granby-drbd-engine_monitor_0:163 [ \n ]
>>> Sep 27 15:34:46 granby attrd[3356]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Sep 27 15:34:46 granby attrd[3356]: notice: attrd_perform_update: Sent
>>> update 67: probe_complete=true
>>> Sep 27 15:34:47 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-vms_monitor_20000: ok (node=granby, call=176, rc=0,
>>> cib-update=102, confirmed=false)
>>> Sep 27 15:34:47 granby crmd[3358]: notice: process_lrm_event:
>>> Operation drbd-engine_monitor_10000: master (node=granby, call=177,
>>> rc=8, cib-update=103, confirmed=false)
>>> Sep 27 15:34:47 granby crmd[3358]: notice: process_lrm_event:
>>> granby-drbd-engine_monitor_10000:177 [ \n ]
>>>
>>> Now on the slave:
>>>
>>>
>>> [root@glenrock ~]# pcs resource cleanup; tail -f /var/log/messages
>>> All resources/stonith devices successfully cleaned up
>>> Sep 27 15:34:57 glenrock pengine[3365]: warning: unpack_rsc_op_failure:
>>> Processing failed op stop for iscsi-engine-target on glenrock: unknown
>>> error (1)
>>> Sep 27 15:34:57 glenrock pengine[3365]: warning:
>>> common_apply_stickiness: Forcing iscsi-engine-target away from glenrock
>>> after 1000000 failures (max=1000000)
>>> Sep 27 15:34:57 glenrock pengine[3365]: notice: process_pe_message:
>>> Calculated Transition 50: /var/lib/pacemaker/pengine/pe-input-533.bz2
>>> Sep 27 15:34:57 glenrock crmd[3366]: notice: run_graph: Transition 50
>>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-533.bz2): Complete
>>> Sep 27 15:34:57 glenrock crmd[3366]: notice: do_state_transition:
>>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: do_lrm_invoke: Forcing
>>> the status of all resources to be redetected
>>> Sep 27 15:35:49 glenrock attrd[3364]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (<null>)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> glenrock-drbd-vms_monitor_10000:278 [ \n ]
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: qb_ipcs_event_sendv:
>>> new_event_notification (3366-853-16): Broken pipe (32)
>> Something is starting to go wrong here

> Yes, does look like it. But why does the other iSCSI resource work fine
> and this one has seemingly gone mad?

Not sure. Without fencing, once things go sideways, you enter an
undefined state. One of the major benefits of *working* fencing is that
it makes debugging issues easier.

>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: do_state_transition:
>>> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
>>> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>> Sep 27 15:35:49 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent delete 195: node=glenrock, attr=probe_complete, id=<n/a>,
>>> set=(null), section=status
>>> Sep 27 15:35:49 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent delete 197: node=glenrock, attr=probe_complete, id=<n/a>,
>>> set=(null), section=status
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: unpack_config: On loss
>>> of CCM Quorum: Ignore
>>> Sep 27 15:35:49 glenrock pengine[3365]: warning:
>>> common_apply_stickiness: Forcing iscsi-engine-target away from glenrock
>>> after 1000000 failures (max=1000000)
>> The node should probably have been fenced at this point.

> Agreed.

Then add it.
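
If the nodes have IPMI BMCs (iLO, DRAC, etc.), it can be as simple as
the sketch below. Treat it as illustrative only; the addresses and
credentials are placeholders, and you'd swap fence_ipmilan for
whatever agent matches your hardware:

  pcs stonith create fence-granby fence_ipmilan \
      pcmk_host_list="granby" ipaddr="192.0.2.10" \
      login="admin" passwd="secret" op monitor interval=60s
  pcs stonith create fence-glenrock fence_ipmilan \
      pcmk_host_list="glenrock" ipaddr="192.0.2.11" \
      login="admin" passwd="secret" op monitor interval=60s
  pcs property set stonith-enabled=true

Since you're on the cman stack, cluster.conf should also route cman's
fencing through pacemaker via the fence_pcmk passthrough agent. Test
it deliberately ("pcs stonith fence glenrock") and confirm the node
really power-cycles. Then tie DRBD into it so replication can't
split-brain behind pacemaker's back; with DRBD 8.4, in the resource
or common section:

  disk {
      fencing resource-and-stonith;
  }
  handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }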
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Start
>>> drbd-vms:0#011(glenrock)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Start
>>> drbd-vms:1#011(granby)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Start
>>> drbd-engine:0#011(glenrock)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Start
>>> drbd-engine:1#011(granby)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: process_pe_message:
>>> Calculated Transition 51: /var/lib/pacemaker/pengine/pe-input-534.bz2
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 4: monitor drbd-vms:0_monitor_0 on glenrock (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 13: monitor drbd-vms:1_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 14: monitor iscsi-vms-target_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 5: monitor iscsi-vms-target_monitor_0 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 15: monitor iscsi-vms-lun_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 6: monitor iscsi-vms-lun_monitor_0 on glenrock (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 16: monitor iscsi-vms-ip_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 7: monitor iscsi-vms-ip_monitor_0 on glenrock (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 8: monitor drbd-engine:0_monitor_0 on glenrock (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 17: monitor drbd-engine:1_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 18: monitor iscsi-engine-target_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 9: monitor iscsi-engine-target_monitor_0 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 19: monitor iscsi-engine-ip_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 10: monitor iscsi-engine-ip_monitor_0 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 20: monitor iscsi-engine-lun_monitor_0 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 11: monitor iscsi-engine-lun_monitor_0 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-target_monitor_0: ok (node=glenrock, call=305, rc=0,
>>> cib-update=377, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 5
>>> (iscsi-vms-target_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: abort_transition_graph:
>>> Transition aborted by iscsi-vms-target_monitor_0 'create' on (null):
>>> Event failed (magic=0:0;5:51:7:4ac61a75-532d-45c4-b07c-eee753699adc,
>>> cib=0.374.121, source=match_graph_event:344, 0)
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 5
>>> (iscsi-vms-target_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-lun_monitor_0: ok (node=glenrock, call=309, rc=0,
>>> cib-update=378, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-engine-target_monitor_0: ok (node=glenrock, call=322,
>>> rc=0, cib-update=379, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 6
>>> (iscsi-vms-lun_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 6
>>> (iscsi-vms-lun_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 9
>>> (iscsi-engine-target_monitor_0) on glenrock failed (target: 7 vs. rc:
>>> 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 9
>>> (iscsi-engine-target_monitor_0) on glenrock failed (target: 7 vs. rc:
>>> 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-engine-lun_monitor_0: not running (node=glenrock,
>>> call=330, rc=7, cib-update=380, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-engine-ip_monitor_0: not running (node=glenrock,
>>> call=326, rc=7, cib-update=381, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-ip_monitor_0: ok (node=glenrock, call=313, rc=0,
>>> cib-update=382, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 13
>>> (drbd-vms:1_monitor_0) on granby failed (target: 7 vs. rc: 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 13
>>> (drbd-vms:1_monitor_0) on granby failed (target: 7 vs. rc: 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 17
>>> (drbd-engine:1_monitor_0) on granby failed (target: 7 vs. rc: 8): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 17
>>> (drbd-engine:1_monitor_0) on granby failed (target: 7 vs. rc: 8): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 12: probe_complete probe_complete-granby on granby -
>>> no waiting
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 7
>>> (iscsi-vms-ip_monitor_0) on glenrock failed (target: 7 vs. rc: 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 7
>>> (iscsi-vms-ip_monitor_0) on glenrock failed (target: 7 vs. rc: 0): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: abort_transition_graph:
>>> Transition aborted by status-granby-probe_complete, probe_complete=true:
>>> Transient attribute change (create cib=0.374.135,
>>> source=te_update_diff:391,
>>> path=/cib/status/node_state[@id='granby']/transient_attributes[@id='granby']/instance_attributes[@id='status-granby'],
>>> 0)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation drbd-engine_monitor_0: ok (node=glenrock, call=318, rc=0,
>>> cib-update=383, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation drbd-vms_monitor_0: master (node=glenrock, call=301, rc=8,
>>> cib-update=384, confirmed=true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> glenrock-drbd-vms_monitor_0:301 [ \n ]
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 8
>>> (drbd-engine:0_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 8
>>> (drbd-engine:0_monitor_0) on glenrock failed (target: 7 vs. rc: 0):
>>> Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 4
>>> (drbd-vms:0_monitor_0) on glenrock failed (target: 7 vs. rc: 8): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: warning: status_from_rc: Action 4
>>> (drbd-vms:0_monitor_0) on glenrock failed (target: 7 vs. rc: 8): Error
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 3: probe_complete probe_complete-glenrock on glenrock
>>> (local) - no waiting
>>> Sep 27 15:35:49 glenrock attrd[3364]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: probe_complete (true)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: run_graph: Transition 51
>>> (Complete=24, Pending=0, Fired=0, Skipped=9, Incomplete=10,
>>> Source=/var/lib/pacemaker/pengine/pe-input-534.bz2): Stopped
>>> Sep 27 15:35:49 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent update 201: probe_complete=true
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: unpack_config: On loss
>>> of CCM Quorum: Ignore
>>> Sep 27 15:35:49 glenrock pengine[3365]: warning:
>>> common_apply_stickiness: Forcing iscsi-engine-target away from glenrock
>>> after 1000000 failures (max=1000000)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Move
>>> iscsi-engine-target#011(Started glenrock -> granby)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: process_pe_message:
>>> Calculated Transition 52: /var/lib/pacemaker/pengine/pe-input-535.bz2
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: unpack_config: On loss
>>> of CCM Quorum: Ignore
>>> Sep 27 15:35:49 glenrock pengine[3365]: warning:
>>> common_apply_stickiness: Forcing iscsi-engine-target away from glenrock
>>> after 1000000 failures (max=1000000)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: LogActions: Move
>>> iscsi-engine-target#011(Started glenrock -> granby)
>>> Sep 27 15:35:49 glenrock pengine[3365]: notice: process_pe_message:
>>> Calculated Transition 53: /var/lib/pacemaker/pengine/pe-input-536.bz2
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 7: monitor drbd-vms_monitor_20000 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 12: monitor drbd-vms_monitor_10000 on glenrock (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 39: monitor iscsi-vms-target_monitor_10000 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 42: monitor iscsi-vms-lun_monitor_10000 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 45: monitor iscsi-vms-ip_monitor_10000 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 50: monitor drbd-engine_monitor_10000 on granby
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 53: monitor drbd-engine_monitor_20000 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: te_rsc_command:
>>> Initiating action 78: stop iscsi-engine-target_stop_0 on glenrock
>>> (local)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-target_monitor_10000: ok (node=glenrock, call=332,
>>> rc=0, cib-update=387, confirmed=false)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-lun_monitor_10000: ok (node=glenrock, call=333,
>>> rc=0, cib-update=388, confirmed=false)
>>> Sep 27 15:35:49 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation iscsi-vms-ip_monitor_10000: ok (node=glenrock, call=334, rc=0,
>>> cib-update=389, confirmed=false)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation drbd-vms_monitor_10000: master (node=glenrock, call=331, rc=8,
>>> cib-update=390, confirmed=false)
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> glenrock-drbd-vms_monitor_10000:331 [ \n ]
>>> Sep 27 15:35:49 glenrock crmd[3366]: notice: process_lrm_event:
>>> Operation drbd-engine_monitor_20000: ok (node=glenrock, call=335, rc=0,
>>> cib-update=391, confirmed=false)
>>> Sep 27 15:35:50 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:51 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:52 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:53 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:54 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:55 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:56 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:57 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:58 glenrock iSCSITarget(iscsi-engine-target)[1079]:
>>> WARNING: Failed to remove target iqn.2015-09.integralife.net, retrying.
>>> Sep 27 15:35:59 glenrock lrmd[3363]: warning: child_timeout_callback:
>>> iscsi-engine-target_stop_0 process (PID 1079) timed out
>>> Sep 27 15:35:59 glenrock lrmd[3363]: warning: operation_finished:
>>> iscsi-engine-target_stop_0:1079 - timed out after 10000ms
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock lrmd[3363]: notice: operation_finished:
>>> iscsi-engine-target_stop_0:1079:stderr [ tgtadm: can't find the target ]
>>> Sep 27 15:35:59 glenrock crmd[3366]: error: process_lrm_event:
>>> Operation iscsi-engine-target_stop_0: Timed Out (node=glenrock,
>>> call=336, timeout=10000ms)
>>> Sep 27 15:35:59 glenrock crmd[3366]: notice: process_lrm_event:
>>> glenrock-iscsi-engine-target_stop_0:336 [ tgtadm: can't find the
>>> target\ntgtadm: can't find the target\ntgtadm: can't find the
>>> target\ntgtadm: can't find the target\ntgtadm: can't find the
>>> target\ntgtadm: can't find the target\ntgtadm: can't find the
>>> target\ntgtadm: can't find the target\ntgtadm: can't find the
>>> target\ntgtadm: can't find the target\n ]
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: status_from_rc: Action 78
>>> (iscsi-engine-target_stop_0) on glenrock failed (target: 0 vs. rc: 1):
>>> Error
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: update_failcount:
>>> Updating failcount for iscsi-engine-target on glenrock after failed
>>> stop: rc=1 (update=INFINITY, time=1443364559)
>>> Sep 27 15:35:59 glenrock crmd[3366]: notice: abort_transition_graph:
>>> Transition aborted by iscsi-engine-target_stop_0 'modify' on (null):
>>> Event failed (magic=2:1;78:53:0:4ac61a75-532d-45c4-b07c-eee753699adc,
>>> cib=0.374.146, source=match_graph_event:344, 0)
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: last-failure-iscsi-engine-target
>>> (1443364559)
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: update_failcount:
>>> Updating failcount for iscsi-engine-target on glenrock after failed
>>> stop: rc=1 (update=INFINITY, time=1443364559)
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: status_from_rc: Action 78
>>> (iscsi-engine-target_stop_0) on glenrock failed (target: 0 vs. rc: 1):
>>> Error
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: update_failcount:
>>> Updating failcount for iscsi-engine-target on glenrock after failed
>>> stop: rc=1 (update=INFINITY, time=1443364559)
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent update 203: last-failure-iscsi-engine-target=1443364559
>>> Sep 27 15:35:59 glenrock crmd[3366]: warning: update_failcount:
>>> Updating failcount for iscsi-engine-target on glenrock after failed
>>> stop: rc=1 (update=INFINITY, time=1443364559)
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: last-failure-iscsi-engine-target
>>> (1443364559)
>>> Sep 27 15:35:59 glenrock crmd[3366]: notice: run_graph: Transition 53
>>> (Complete=8, Pending=0, Fired=0, Skipped=3, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-536.bz2): Stopped
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent update 205: last-failure-iscsi-engine-target=1443364559
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_trigger_update:
>>> Sending flush op to all hosts for: last-failure-iscsi-engine-target
>>> (1443364559)
>>> Sep 27 15:35:59 glenrock attrd[3364]: notice: attrd_perform_update:
>>> Sent update 207: last-failure-iscsi-engine-target=1443364559
>>> Sep 27 15:35:59 glenrock pengine[3365]: notice: unpack_config: On loss
>>> of CCM Quorum: Ignore
>>> Sep 27 15:35:59 glenrock pengine[3365]: warning: unpack_rsc_op_failure:
>>> Processing failed op stop for iscsi-engine-target on glenrock: unknown
>>> error (1)
>>> Sep 27 15:35:59 glenrock pengine[3365]: warning: unpack_rsc_op_failure:
>>> Processing failed op stop for iscsi-engine-target on glenrock: unknown
>>> error (1)
>>> Sep 27 15:35:59 glenrock pengine[3365]: warning:
>>> common_apply_stickiness: Forcing iscsi-engine-target away from glenrock
>>> after 1000000 failures (max=1000000)
>>> Sep 27 15:35:59 glenrock pengine[3365]: notice: process_pe_message:
>>> Calculated Transition 54: /var/lib/pacemaker/pengine/pe-input-537.bz2
>>> Sep 27 15:35:59 glenrock crmd[3366]: notice: run_graph: Transition 54
>>> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
>>> Source=/var/lib/pacemaker/pengine/pe-input-537.bz2): Complete
>>> Sep 27 15:35:59 glenrock crmd[3366]: notice: do_state_transition:
>>> State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
>>> cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>
>>> I've even deleted all resources for the engine stuff and started again,
>>> but the same still happens! My oVirt engine is down all this time even
>>> though the big "vms" lun has been absolutely fine.
>> Something is going wrong, obviously. You will want to fix that, but I am
>> not familiar with this RA so I can't be the one to debug it. I will say
>> though that having configured, working stonith will help in cases like
>> this because it would fence off the node that failed (assuming you've
>> configured it to fence on a resource failure like this).
>>
>> At the core, fencing is to put a node that is in an unknown state into a
>> known state. This concept can be extended to the HA services... If the
>> service goes into an unknown state that it can't recover from, it's
>> valid to fence the node.

> Yes, I'm familiar with fencing as we already run RHEV. This is a POC of
> Ovirt 3.5 for a small satellite office.

The software always behaves as if it is critical. Besides, what good is
a PoC that doesn't replicate the function of the final product?
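
To bring it back to the actual failure: the stop of
iscsi-engine-target is timing out at the 10s default while the RA
loops on "Failed to remove target ... retrying", even though tgtadm is
answering "can't find the target". That looks a lot like the RA not
treating an already-removed target as a successful stop, which, given
where that RA came from (see below), wouldn't surprise me. Once
fencing works, two things are worth trying; a sketch only, with the
timeout value a guess you should tune:

  # Give the stop more room, and escalate a failed stop to a fence
  # instead of leaving the resource FAILED (unmanaged):
  pcs resource update iscsi-engine-target op stop timeout=60s on-fail=fence

  # Compare what tgtd actually has configured before and after:
  tgtadm --lld iscsi --mode target --op show

With stonith enabled, fencing on a failed stop is pacemaker's default
behaviour anyway, and that is exactly the self-recovery you are
missing here.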
>>> Also it's really odd that the iSCSITarget resource is not available in
>>> CentOS 6.x; I had to copy the resource script from an FTP site, ugh!
>> So there is no guarantee of compatibility. Have you considered CentOS 7?
>> With corosync v2 and no more cman plugin, it might help.

> That would be my next step. But ideally I'd like to get this working
> first if I can, as we have some conservative elements here that think 7
> is "too new".

I would agree with that assessment. I still use RHEL6 myself, though I
also know I am one of the more conservative/paranoid cluster people. :)

> Thanks
>
> Alex

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
