Hi All,

We are using Pacemaker 1.1.6 on Ubuntu Precise. We have two nodes running KVM as the hypervisor, with one VM on each node (vm-0/vm-1); the two VMs are paired in an HA cluster using heartbeat/pacemaker. We have redundant network links between the VMs, and we have also set up our own STONITH plugin that interacts with the underlying node's hypervisor to reboot the VMs.
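For context, the plugin implements the usual external STONITH command set that you can see being invoked in the logs below (getinfo-*, getconfignames, status, reset <host>). A stripped-down sketch of the idea, with virsh standing in for our actual hypervisor call and "hypervisor_uri" being a made-up parameter name for the sketch:

    #!/usr/bin/env python
    # Illustrative sketch of an external/* STONITH plugin; our real
    # vmstonith talks to the node's hypervisor management interface,
    # virsh merely stands in for that here.
    import os, subprocess, sys

    # Resource parameters reach external plugins as environment variables.
    HYPERVISOR_URI = os.environ.get("hypervisor_uri", "qemu:///system")

    VIRSH_OP = {"reset": "reset", "off": "destroy", "on": "start"}

    def virsh(*args):
        return subprocess.call(["virsh", "-c", HYPERVISOR_URI] + list(args))

    def main():
        op = sys.argv[1] if len(sys.argv) > 1 else ""
        if op in VIRSH_OP:
            victim = sys.argv[2]                  # e.g. "vm-0"
            sys.exit(virsh(VIRSH_OP[op], victim))
        elif op == "status":
            sys.exit(virsh("list"))               # hypervisor reachable?
        elif op == "gethosts":
            print("vm-0\nvm-1")                   # hosts this device can fence
            sys.exit(0)
        elif op == "getconfignames":
            print("hypervisor_uri")
            sys.exit(0)
        elif op.startswith("getinfo-"):
            print("vmstonith: reboots a peer VM via the underlying hypervisor")
            sys.exit(0)
        sys.exit(1)

    if __name__ == "__main__":
        main()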
We ran into an issue where the underlying node crashed badly, but its networking and the VM on it kept running for some time longer, albeit slowly. Heartbeat on the healthy VM initially detected that the peer VM had died and started recovery with a STONITH, but when late heartbeats from the peer suddenly began to arrive, it aborted the STONITH recovery and also unset itself (the current VM) as DC. The STONITH operation, however, was progressing independently and managed to reboot the peer VM, which was later recognized via further Link Down events. But this VM never took over DC after that, and the cluster sat with a single VM and no DC for a long period, until we identified and rebooted the crashed node. The full logs are below. Any pointers or insights on what fix should be made within heartbeat/pacemaker would be greatly appreciated. Thanks.

(from the logs) vm-1 recognizes that vm-0 went away via Link Down events and starts recovery:

Nov 9 16:58:40 vm-1 heartbeat: [2378]: WARN: node vm-0: is dead
Nov 9 16:58:40 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 dead.
Nov 9 16:58:40 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 dead.
Nov 9 16:58:40 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [dead]
Nov 9 16:58:40 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
Nov 9 16:58:40 vm-1 cib: [2474]: info: mem_handle_event: instance=54, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
Nov 9 16:58:40 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NOT PRIMARY (id=54)
Nov 9 16:58:40 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now offline
Nov 9 16:58:40 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
Nov 9 16:58:40 vm-1 crmd: [2478]: info: mem_handle_event: instance=54, nodes=2, new=2, lost=0, n_idx=0, new_idx=0, old_idx=4
Nov 9 16:58:40 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum lost after event=NOT PRIMARY (id=54)
Nov 9 16:58:41 vm-1 ccm: [2473]: info: ccm_state_sent_memlistreq: directly callccm_compute_and_send_final_memlist()
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: majority
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: twonodes
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:41 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: majority
Nov 9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:41 vm-1 crmd: [2478]: info: mem_handle_event: instance=55, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:41 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=55)
Nov 9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=55, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:41 vm-1 cib: [2474]: info: mem_handle_event: instance=55, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=55]
Nov 9 16:58:41 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=55)
Nov 9 16:58:41 vm-1 crmd: [2478]: info: ccm_event_detail: #011LOST: vm-0 [nodeid=0, born=52]
Nov 9 16:58:41 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=52 seen=54 proc=00000000000000000000000000000302
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: quorum plugin: twonodes
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:41 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
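The quorum debug lines above are worth decoding: with one member left out of two nodes, the plain majority rule alone would deny quorum, and it is the twonodes tie-break ("Break tie for 2 nodes cluster") that lets the lone survivor keep quorum and proceed with recovery. Roughly (a sketch of the decision, not the actual ccm code):

    # Rough model of the quorum decision in the ccm debug lines (a sketch,
    # not the actual ccm code): every node carries 100 votes; "majority"
    # wants strictly more than half of the total, and "twonodes" breaks
    # the tie so the lone survivor of a 2-node cluster keeps quorum.
    def have_quorum(member_count, total_node_count, votes_per_node=100):
        member_votes = member_count * votes_per_node     # e.g. 1 * 100
        total_votes = total_node_count * votes_per_node  # e.g. 2 * 100
        if 2 * member_votes > total_votes:               # strict majority
            return True
        # "Break tie for 2 nodes cluster":
        return total_node_count == 2 and member_count == 1

    assert have_quorum(member_count=1, total_node_count=2)  # vm-1 alone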
As part of the recovery, vm-1 unsets the peer VM (vm-0) as DC, runs an election as the only remaining member, takes over as DC itself, and starts pengine:

Nov 9 16:58:41 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster
Nov 9 16:58:41 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now lost (was member)
Nov 9 16:58:41 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=52 seen=54 proc=00000000000000000000000000000200
Nov 9 16:58:41 vm-1 crmd: [2478]: WARN: check_dead_member: Our DC node (vm-0) left the cluster
Nov 9 16:58:41 vm-1 crmd: [2478]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
Nov 9 16:58:41 vm-1 crmd: [2478]: info: update_dc: Unset DC vm-0
Nov 9 16:58:41 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Nov 9 16:58:41 vm-1 crmd: [2478]: info: do_te_control: Registering TE UUID: 4717ec7c-684e-4718-998f-fb1938efed3b
Nov 9 16:58:41 vm-1 crmd: [2478]: info: set_graph_functions: Setting custom graph functions
Nov 9 16:58:41 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Nov 9 16:58:41 vm-1 crmd: [2478]: info: start_subsystem: Starting sub-system "pengine"
Nov 9 16:58:41 vm-1 pengine: [25018]: info: Invoked: /usr/lib/heartbeat/pengine
Nov 9 16:58:45 vm-1 crmd: [2478]: info: do_dc_takeover: Taking over DC status for this partition
Nov 9 16:58:45 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/W mode
Nov 9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/28, version=1.30.149): ok (rc=0)
Nov 9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/29, version=1.30.150): ok (rc=0)
Nov 9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/31, version=1.30.151): ok (rc=0)
Nov 9 16:58:45 vm-1 crmd: [2478]: info: join_make_offer: Making join offers based on membership 55
Nov 9 16:58:45 vm-1 crmd: [2478]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
Nov 9 16:58:45 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/33, version=1.30.152): ok (rc=0)
Nov 9 16:58:45 vm-1 crmd: [2478]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
Nov 9 16:58:45 vm-1 crmd: [2478]: info: config_query_callback: Checking for expired actions every 900000ms
Nov 9 16:58:45 vm-1 crmd: [2478]: info: update_dc: Set DC to vm-1 (3.0.5)
Nov 9 16:58:46 vm-1 crmd: [2478]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Nov 9 16:58:46 vm-1 crmd: [2478]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
Nov 9 16:58:46 vm-1 crmd: [2478]: info: do_dc_join_finalize: join-1: Syncing the CIB from vm-1 to the rest of the cluster
Nov 9 16:58:46 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/36, version=1.30.152): ok (rc=0)
Nov 9 16:58:46 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/37, version=1.30.153): ok (rc=0)
Nov 9 16:58:46 vm-1 lrmd: [2475]: info: stonith_api_device_metadata: looking up external/vmstonith/heartbeat metadata
Nov 9 16:58:46 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-devdescr
Nov 9 16:58:46 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-devid
Nov 9 16:58:46 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-xml

While this is happening, a late heartbeat arrives from the peer VM, which triggers an abort of the recovery that is under way:

Nov 9 16:58:47 vm-1 heartbeat: [2378]: CRIT: Cluster node vm-0 returning after partition.
Nov 9 16:58:47 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [active]
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: For information on cluster partitions, See URL: http://linux-ha.org/wiki/Split_Brain
Nov 9 16:58:47 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now online
Nov 9 16:58:47 vm-1 heartbeat: [2378]: WARN: Deadtime value may be too small.
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: See FAQ for information on tuning deadtime.
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: URL: http://linux-ha.org/wiki/FAQ#Heavy_Load
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 up.
Nov 9 16:58:47 vm-1 heartbeat: [2378]: WARN: Late heartbeat: Node vm-0: interval 14000 ms
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status active
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_dc_join_ack: join-1: Updating node state to member for vm-1
Nov 9 16:58:47 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 up.
Nov 9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-1']/lrm (origin=local/crmd/39, version=1.30.155): ok (rc=0)
Nov 9 16:58:47 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-1']/lrm": ok (rc=0)
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Nov 9 16:58:47 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Nov 9 16:58:47 vm-1 crmd: [2478]: info: crm_update_quorum: Updating quorum status to true (call=43)
Nov 9 16:58:47 vm-1 attrd: [2477]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Nov 9 16:58:47 vm-1 crmd: [2478]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke: Query 44: Requesting the current CIB: S_POLICY_ENGINE
Nov 9 16:58:47 vm-1 crmd: [2478]: WARN: crmd_ha_msg_callback: Ignoring HA message (op=vote) from vm-0: not in our membership list (size=1)
Nov 9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/41, version=1.30.157): ok (rc=0)
Nov 9 16:58:47 vm-1 crmd: [2478]: WARN: match_down_event: No match for shutdown action on 00000281-0281-0000-2b91-000000000000
Nov 9 16:58:47 vm-1 crmd: [2478]: info: te_update_diff: Stonith/shutdown of 00000281-0281-0000-2b91-000000000000 not matched
Nov 9 16:58:47 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=00000281-0281-0000-2b91-000000000000, magic=NA, cib=1.30.158) : Node failure
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke: Query 45: Requesting the current CIB: S_POLICY_ENGINE
Nov 9 16:58:47 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/43, version=1.30.159): ok (rc=0)
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_pe_invoke_callback: Invoking the PE: query=45, ref=pe_calc-dc-1447106327-10, seq=55, quorate=1
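"Deadtime value may be too small" together with "Late heartbeat: Node vm-0: interval 14000 ms" shows the sick node was still emitting heartbeats, just extremely late. Independent of the DC problem, we probably need to retune ha.cf so a slow-but-alive peer is not declared dead prematurely; something along these lines (values illustrative, with deadtime comfortably above the worst late interval seen under load, per the FAQ linked in the logs):

    keepalive 2     # seconds between heartbeats
    warntime 10     # warn about late heartbeats after 10s
    deadtime 30     # declare a peer dead only after 30s (> the 14s we saw)
    initdead 60     # extra allowance right after startup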
But pengine is disconnected from these late-arrival events at this point and goes ahead scheduling a reboot of the peer VM:

Nov 9 16:58:47 vm-1 pengine: [25018]: notice: unpack_config: On loss of CCM Quorum: Ignore
Nov 9 16:58:47 vm-1 attrd: [2477]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Nov 9 16:58:47 vm-1 pengine: [25018]: WARN: pe_fence_node: Node vm-0 will be fenced because it is un-expectedly down
Nov 9 16:58:47 vm-1 pengine: [25018]: WARN: determine_online_status: Node vm-0 is unclean
Nov 9 16:58:47 vm-1 pengine: [25018]: WARN: custom_action: Action VM-FENCE:0_stop_0 on vm-0 is unrunnable (offline)
Nov 9 16:58:47 vm-1 pengine: [25018]: WARN: custom_action: Marking node vm-0 unclean
Nov 9 16:58:47 vm-1 pengine: [25018]: WARN: stage6: Scheduling Node vm-0 for STONITH
Nov 9 16:58:47 vm-1 pengine: [25018]: notice: LogActions: Stop VM-FENCE:0#011(vm-0)
Nov 9 16:58:47 vm-1 pengine: [25018]: notice: LogActions: Leave VM-FENCE:1#011(Started vm-1)
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Nov 9 16:58:47 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition 0: 33 actions in 33 synapses
Nov 9 16:58:47 vm-1 crmd: [2478]: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1447106327-10) derived from /var/lib/pengine/pe-warn-1.bz2
Nov 9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 14 fired and confirmed
Nov 9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
Nov 9 16:58:47 vm-1 crmd: [2478]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
Nov 9 16:58:47 vm-1 crmd: [2478]: info: te_fence_node: Executing reboot fencing operation (51) on vm-0 (timeout=90000)
Nov 9 16:58:47 vm-1 stonith-ng: [2476]: info: initiate_remote_stonith_op: Initiating remote operation reboot for vm-0: c3bda7ff-b6ec-433e-86a1-e570dbaacc8b
Nov 9 16:58:47 vm-1 stonith-ng: [2476]: info: can_fence_host_with_device: VM-FENCE:1 can fence vm-0: none
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: quorum plugin: majority
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: process_remote_stonith_query: Ignoring reply from vm-0, hosts are not permitted to commit suicide
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: quorum plugin: twonodes
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: call_remote_stonith: Requesting that vm-1 perform op reboot vm-0
Nov 9 16:58:48 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:48 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster
Nov 9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:48 vm-1 crmd: [2478]: info: mem_handle_event: instance=56, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:48 vm-1 cib: [2474]: info: mem_handle_event: instance=56, nodes=1, new=0, lost=0, n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:48 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=56)
Nov 9 16:58:48 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=56)
Nov 9 16:58:48 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=56, nodes=1, new=0, lost=0 n_idx=0, new_idx=1, old_idx=3
Nov 9 16:58:48 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=56]
Nov 9 16:58:48 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_fence: Exec <stonith_command t="stonith-ng" st_async_id="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_op="st_fence" st_callid="0" st_callopt="0" st_remote_op="c3bda7ff-b6ec-433e-86a1-e570dbaacc8b" st_target="vm-0" st_device_action="reboot" st_timeout="81000" dest="vm-1" oseq="2" from_id="stonith-ng" to_id="stonith-ng" client_gen="2" src="vm-1" seq="1015" hg="5487db2a" ts="56411718" ld="0.02 0.04 0.05 4/369 25102" ttl="3" auth="1 06b58c47479e1fb80d0979b4b8f367b60190789f" />
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: can_fence_host_with_device: VM-FENCE:1 can fence vm-0: none
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_fence: Found 1 matching devices for 'vm-0'
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: stonith_command: Processed st_fence from vm-1: rc=-1
Nov 9 16:58:48 vm-1 stonith-ng: [2476]: info: make_args: reboot-ing node 'vm-0' as 'port=vm-0'
Nov 9 16:58:48 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getconfignames
Nov 9 16:58:48 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith reset vm-0
Nov 9 16:58:48 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/47, version=1.30.164): ok (rc=0)
A membership event then arrives:

Nov 9 16:58:49 vm-1 ccm: [2473]: debug: quorum plugin: majority
Nov 9 16:58:49 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
Nov 9 16:58:49 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info
Nov 9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 16:58:49 vm-1 cib: [2474]: info: mem_handle_event: instance=57, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Nov 9 16:58:49 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=57)
Nov 9 16:58:49 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000302
Nov 9 16:58:49 vm-1 crmd: [2478]: info: mem_handle_event: instance=57, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Nov 9 16:58:49 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=57)
Nov 9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=57, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4
Nov 9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=1]
Nov 9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-0 [nodeid=0, born=57]
Nov 9 16:58:49 vm-1 crmd: [2478]: info: ccm_event_detail: #011NEW: vm-0 [nodeid=0, born=57]
Nov 9 16:58:49 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now member (was lost)
Nov 9 16:58:49 vm-1 crmd: [2478]: WARN: match_down_event: No match for shutdown action on vm-0
Nov 9 16:58:49 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000202
Nov 9 16:58:49 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Nov 9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/lrm (origin=local/crmd/49, version=1.30.166): ok (rc=0)
Nov 9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/transient_attributes (origin=local/crmd/50, version=1.30.167): ok (rc=0)
Nov 9 16:58:49 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/lrm": ok (rc=0)
Nov 9 16:58:49 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:178 - Triggered transition abort (complete=0, tag=transient_attributes, id=00000281-0281-0000-2b91-000000000000, magic=NA, cib=1.30.167) : Transient attribute: removal
Nov 9 16:58:49 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/transient_attributes": ok (rc=0)
Nov 9 16:58:49 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/51, version=1.30.168): ok (rc=0)
As part of this, vm-1 unsets itself as DC and defers to the peer VM as the DC; but it never calls update_dc() to record the peer (vm-0) as DC at this point:

Nov 9 16:58:50 vm-1 crmd: [2478]: WARN: crmd_ha_msg_filter: Another DC detected: vm-0 (op=noop)
Nov 9 16:58:50 vm-1 crmd: [2478]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Nov 9 16:58:50 vm-1 crmd: [2478]: info: update_dc: Unset DC vm-1
Nov 9 16:58:51 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_RELEASE_DC [ input=I_RELEASE_DC cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Nov 9 16:58:51 vm-1 crmd: [2478]: info: do_dc_release: DC role released
Nov 9 16:58:51 vm-1 crmd: [2478]: info: stop_subsystem: Sent -TERM to pengine: [25018]
Nov 9 16:58:51 vm-1 crmd: [2478]: WARN: destroy_action: Cancelling timer for action 3 (src=57)
Nov 9 16:58:51 vm-1 crmd: [2478]: info: do_te_control: Transitioner is now inactive
Nov 9 16:58:51 vm-1 crmd: [2478]: info: do_state_transition: State transition S_RELEASE_DC -> S_PENDING [ input=I_RELEASE_SUCCESS cause=C_FSA_INTERNAL origin=do_dc_release ]
Nov 9 16:58:51 vm-1 crmd: [2478]: info: crmdManagedChildDied: Process pengine:[25018] exited (signal=0, exitcode=0)
Nov 9 16:58:51 vm-1 crmd: [2478]: info: pe_msg_dispatch: Received HUP from pengine:[25018]
Nov 9 16:58:51 vm-1 crmd: [2478]: info: pe_connection_destroy: Connection to the Policy Engine released
Nov 9 16:58:51 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/O mode
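Our reading of the above: the DC role is released because vm-0 claimed it, but update_dc() is never called to record vm-0 as the new DC, so when our own still-in-flight STONITH kills vm-0 moments later, the "our DC left the cluster" path (check_dead_member) has nothing to match and no new election fires. A toy model of that bookkeeping, purely illustrative and not the real crmd code:

    # Toy model of the crmd DC bookkeeping around this race (illustrative
    # only, not the real crmd FSA).
    class ToyCrmd(object):
        def __init__(self):
            self.dc = None            # which peer we currently believe is DC
            self.state = "S_NOT_DC"

        def peer_claims_dc(self, peer):
            # crmd_ha_msg_filter: "Another DC detected" -> release our DC
            # role... but note that nothing records 'peer' as the DC here.
            self.state = "S_PENDING"

        def peer_lost(self, peer):
            # check_dead_member: only a *recorded* DC dying forces an election
            if self.dc == peer:
                self.state = "S_ELECTION"

    crmd = ToyCrmd()
    crmd.peer_claims_dc("vm-0")       # 16:58:50, DC role released
    crmd.peer_lost("vm-0")            # 16:58:59+, our STONITH reboots vm-0
    assert crmd.state == "S_PENDING"  # stuck: no DC, and no election fires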
seq="102d" hg="5487db2a" ts="56411723" ld="0.02 0.04 0.05 1/370 25223" ttl="3" auth="1 e388a45fa82902e4d2cc13372bd016c90878a052" state="2" st_target="vm-0" /> Nov 9 16:58:59 vm-1 crmd: [2478]: info: tengine_stonith_callback: Stonith operation 2/51:0:0:4717ec7c-684e-4718-998f-fb1938efed3b: OK (0) Nov 9 16:58:59 vm-1 crmd: [2478]: info: tengine_stonith_notify: Peer vm-0 was terminated (reboot) by vm-1 for vm-1 (ref=c3bda7ff-b6ec-433e-86a1-e570dbaacc8b): OK Nov 9 16:59:01 vm-1 heartbeat: [2378]: WARN: node vm-0: is dead Nov 9 16:59:01 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 dead. Nov 9 16:59:01 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 dead. Nov 9 16:59:01 vm-1 ccm: [2473]: debug: quorum plugin: majority Nov 9 16:59:01 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100 Nov 9 16:59:01 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200 Nov 9 16:59:01 vm-1 ccm: [2473]: debug: quorum plugin: twonodes Nov 9 16:59:01 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=1, member_quorum_votes=100 Nov 9 16:59:01 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200 Nov 9 16:59:01 vm-1 ccm: [2473]: info: Break tie for 2 nodes cluster Nov 9 16:59:01 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [dead] Nov 9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Nov 9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info Nov 9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm Nov 9 16:59:01 vm-1 cib: [2474]: info: mem_handle_event: instance=58, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3 Nov 9 16:59:01 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=58) Nov 9 16:59:01 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000302 Nov 9 16:59:01 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now offline Nov 9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm Nov 9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info Nov 9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm Nov 9 16:59:01 vm-1 crmd: [2478]: info: mem_handle_event: instance=58, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3 Nov 9 16:59:01 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=58) Nov 9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=58, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3 Nov 9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=58] Nov 9 16:59:01 vm-1 crmd: [2478]: info: ccm_event_detail: #011LOST: vm-0 [nodeid=0, born=57] Nov 9 16:59:01 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now lost (was member) Nov 9 16:59:01 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=lost (new) addr=(null) votes=-1 born=57 seen=57 proc=00000000000000000000000000000200 Nov 9 17:00:08 vm-1 stonith-ng: [2476]: info: stonith_command: Processed st_execute from lrmd: rc=-1 Nov 9 17:00:08 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getconfignames Nov 9 17:00:08 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith status Nov 9 17:00:09 vm-1 stonith: [26002]: info: external/vmstonith 
But by this time vm-0 was never recorded as the DC, so no DC takeover happens and vm-1 simply remains a member, taking no action.
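Until the root cause is fixed in heartbeat/pacemaker, we are considering a crude external watchdog so that a DC-less cluster cannot go unnoticed for half an hour again. A sketch, assuming crmadmin -D (as on our 1.1.6 boxes) prints the DC when there is one and fails, or prints nothing useful, when there is none:

    #!/usr/bin/env python
    # Crude watchdog sketch: complain if the cluster reports no DC for
    # too long. Illustrative only; adjust the crmadmin invocation and
    # output parsing to taste.
    import subprocess, time

    CHECK_INTERVAL = 60      # seconds between polls
    MAX_DC_LESS = 300        # complain after 5 minutes without a DC

    def current_dc():
        try:
            out = subprocess.check_output(["crmadmin", "-D"],
                                          stderr=subprocess.STDOUT)
        except subprocess.CalledProcessError:
            return None
        return out.strip() or None

    dc_less_since = None
    while True:
        if current_dc() is not None:
            dc_less_since = None
        elif dc_less_since is None:
            dc_less_since = time.time()
        elif time.time() - dc_less_since > MAX_DC_LESS:
            print("WARNING: no DC for %d seconds"
                  % int(time.time() - dc_less_since))
        time.sleep(CHECK_INTERVAL)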
Once we managed to reboot the crashed underlying node, heartbeat on the peer VM was restarted:

Nov 9 17:29:02 vm-1 heartbeat: [2378]: info: Heartbeat restart on node vm-0
Nov 9 17:29:02 vm-1 heartbeat: [2378]: info: Link vm-0:eth0 up.
Nov 9 17:29:02 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status init
Nov 9 17:29:02 vm-1 heartbeat: [2378]: info: Link vm-0:eth2 up.
Nov 9 17:29:02 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status up
Nov 9 17:29:02 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [init]
Nov 9 17:29:02 vm-1 crmd: [2478]: info: crm_update_peer_proc: vm-0.ais is now online
Nov 9 17:29:02 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [up]
Nov 9 17:29:03 vm-1 heartbeat: [2378]: debug: get_delnodelist: delnodelist=
Nov 9 17:29:03 vm-1 heartbeat: [2378]: info: all clients are now paused
Nov 9 17:29:03 vm-1 heartbeat: [2378]: info: Status update for node vm-0: status active
Nov 9 17:29:03 vm-1 crmd: [2478]: notice: crmd_ha_status_callback: Status update: Node vm-0 now has status [active]
Nov 9 17:29:04 vm-1 cib: [2474]: info: cib_client_status_callback: Status update: Client vm-0/cib now has status [join]
Nov 9 17:29:05 vm-1 heartbeat: [2378]: info: all clients are now resumed
Nov 9 17:29:11 vm-1 heartbeat: [2378]: WARN: 1 lost packet(s) for [vm-0] [19:21]
Nov 9 17:29:11 vm-1 heartbeat: [2378]: info: No pkts missing from vm-0!
Nov 9 17:29:11 vm-1 ccm: [2473]: debug: quorum plugin: majority
Nov 9 17:29:11 vm-1 crmd: [2478]: notice: crmd_client_status_callback: Status update: Client vm-0/crmd now has status [online] (DC=false)
Nov 9 17:29:11 vm-1 ccm: [2473]: debug: cluster:linux-ha, member_count=2, member_quorum_votes=200
Nov 9 17:29:11 vm-1 ccm: [2473]: debug: total_node_count=2, total_quorum_votes=200
Nov 9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: no mbr_track info
Nov 9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 17:29:11 vm-1 cib: [2474]: info: mem_handle_event: instance=59, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Nov 9 17:29:11 vm-1 cib: [2474]: info: cib_ccm_msg_callback: Processing CCM event=NEW MEMBERSHIP (id=59)
Nov 9 17:29:11 vm-1 cib: [2474]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=59 seen=59 proc=00000000000000000000000000000302
Nov 9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
Nov 9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: no mbr_track info
Nov 9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
Nov 9 17:29:11 vm-1 crmd: [2478]: info: mem_handle_event: instance=59, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
Nov 9 17:29:11 vm-1 crmd: [2478]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=59)
Nov 9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: NEW MEMBERSHIP: trans=59, nodes=2, new=1, lost=0 n_idx=0, new_idx=2, old_idx=4
Nov 9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-1 [nodeid=1, born=1]
Nov 9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011CURRENT: vm-0 [nodeid=0, born=59]
Nov 9 17:29:11 vm-1 crmd: [2478]: info: ccm_event_detail: #011NEW: vm-0 [nodeid=0, born=59]
Nov 9 17:29:11 vm-1 crmd: [2478]: info: ais_status_callback: status: vm-0 is now member (was lost)
Nov 9 17:29:11 vm-1 crmd: [2478]: info: crm_update_peer: Node vm-0: id=0 state=member (new) addr=(null) votes=-1 born=59 seen=59 proc=00000000000000000000000000000202
Nov 9 17:29:12 vm-1 heartbeat: [2378]: WARN: 1 lost packet(s) for [vm-0] [25:27]
Nov 9 17:29:12 vm-1 heartbeat: [2378]: info: No pkts missing from vm-0!
Only after this point was an election held; this VM (vm-1) took over as DC and started resources:

Nov 9 17:29:25 vm-1 crmd: [2478]: info: do_election_count_vote: Election 2 (owner: 00000281-0281-0000-2b91-000000000000) pass: vote from vm-0 (Uptime)
Nov 9 17:29:25 vm-1 crmd: [2478]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Nov 9 17:29:26 vm-1 crmd: [2478]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Nov 9 17:29:26 vm-1 crmd: [2478]: info: do_te_control: Registering TE UUID: cae16cf4-bcc7-4027-8a84-33537c7696a3
Nov 9 17:29:26 vm-1 crmd: [2478]: info: set_graph_functions: Setting custom graph functions
Nov 9 17:29:26 vm-1 crmd: [2478]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
Nov 9 17:29:26 vm-1 crmd: [2478]: info: start_subsystem: Starting sub-system "pengine"
Nov 9 17:29:26 vm-1 pengine: [12152]: info: Invoked: /usr/lib/heartbeat/pengine
Nov 9 17:29:30 vm-1 crmd: [2478]: info: do_dc_takeover: Taking over DC status for this partition
Nov 9 17:29:30 vm-1 cib: [2474]: info: cib_process_readwrite: We are now in R/W mode
Nov 9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/55, version=1.30.172): ok (rc=0)
Nov 9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/56, version=1.30.173): ok (rc=0)
Nov 9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/58, version=1.30.174): ok (rc=0)
Nov 9 17:29:30 vm-1 crmd: [2478]: info: join_make_offer: Making join offers based on membership 59
Nov 9 17:29:30 vm-1 crmd: [2478]: info: do_dc_join_offer_all: join-2: Waiting on 2 outstanding join acks
Nov 9 17:29:30 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=1.30.175): ok (rc=0)
Nov 9 17:29:30 vm-1 crmd: [2478]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
Nov 9 17:29:30 vm-1 crmd: [2478]: info: config_query_callback: Checking for expired actions every 900000ms
Nov 9 17:29:30 vm-1 crmd: [2478]: info: update_dc: Set DC to vm-1 (3.0.5)
Nov 9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync_one for section 'all' (origin=vm-0/vm-0/(null), version=1.30.175): ok (rc=0)
Nov 9 17:29:31 vm-1 crmd: [2478]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Nov 9 17:29:31 vm-1 crmd: [2478]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Nov 9 17:29:31 vm-1 crmd: [2478]: info: do_dc_join_finalize: join-2: Syncing the CIB from vm-1 to the rest of the cluster
Nov 9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/63, version=1.30.175): ok (rc=0)
Nov 9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/64, version=1.30.176): ok (rc=0)
Nov 9 17:29:31 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/65, version=1.30.177): ok (rc=0)
Nov 9 17:29:32 vm-1 lrmd: [2475]: info: stonith_api_device_metadata: looking up external/vmstonith/heartbeat metadata
Nov 9 17:29:32 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-devdescr
Nov 9 17:29:32 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-devid
Nov 9 17:29:32 vm-1 vmstonith: Invoked as: /usr/lib/stonith/plugins/external/vmstonith getinfo-xml
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_ack: join-2: Updating node state to member for vm-0
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_ack: join-2: Updating node state to member for vm-1
Nov 9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/transient_attributes (origin=vm-0/crmd/8, version=1.30.178): ok (rc=0)
Nov 9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-0']/lrm (origin=local/crmd/66, version=1.30.179): ok (rc=0)
Nov 9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='vm-1']/lrm (origin=local/crmd/68, version=1.30.181): ok (rc=0)
Nov 9 17:29:32 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-0']/lrm": ok (rc=0)
Nov 9 17:29:32 vm-1 crmd: [2478]: info: erase_xpath_callback: Deletion of "//node_state[@uname='vm-1']/lrm": ok (rc=0)
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
Nov 9 17:29:32 vm-1 crmd: [2478]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
Nov 9 17:29:32 vm-1 crmd: [2478]: info: crm_update_quorum: Updating quorum status to true (call=72)
Nov 9 17:29:32 vm-1 attrd: [2477]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke: Query 73: Requesting the current CIB: S_POLICY_ENGINE
Nov 9 17:29:32 vm-1 crmd: [2478]: info: te_update_diff: Detected LRM refresh - 11 resources updated: Skipping all resource events
Nov 9 17:29:32 vm-1 crmd: [2478]: info: abort_transition_graph: te_update_diff:251 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=1.30.182) : LRM Refresh
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke: Query 74: Requesting the current CIB: S_POLICY_ENGINE
Nov 9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=1.30.183): ok (rc=0)
Nov 9 17:29:32 vm-1 cib: [2474]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/72, version=1.30.185): ok (rc=0)
Nov 9 17:29:32 vm-1 crmd: [2478]: info: do_pe_invoke_callback: Invoking the PE: query=74, ref=pe_calc-dc-1447108172-26, seq=59, quorate=1
Nov 9 17:29:32 vm-1 attrd: [2477]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)

Thanks.
--Shyam