Shared storage is not what triggers the need for fencing; the need to coordinate actions is. Specifically: if you can safely run a resource on both/all nodes at the same time, you don't need HA at all. If you can't, you need fencing.
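Even without shared storage, a cluster that moves resources between nodes needs a fence device before recovery is safe. As a minimal sketch with pcs, assuming IPMI-capable nodes (the device name, address, and credentials below are placeholders, and fence_ipmilan parameter names vary between fence-agents versions):

    # Hypothetical IPMI fence device for node 001store01a
    pcs stonith create fence-001store01a fence_ipmilan \
        pcmk_host_list="001store01a" ipaddr="192.0.2.10" \
        login="admin" passwd="secret" lanplus="1"
    # Repeat for 001store01b, then enforce fencing cluster-wide
    pcs property set stonith-enabled=true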
digimer

On 2021-05-28 1:19 p.m., Eric Robinson wrote:
> There is no fencing agent on this cluster and no shared storage.
>
> -Eric
>
> *From:* Strahil Nikolov <hunter86...@yahoo.com>
> *Sent:* Friday, May 28, 2021 10:08 AM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>; Eric Robinson <eric.robin...@psmnv.com>
> *Subject:* Re: [ClusterLabs] Cluster Stopped, No Messages?
>
> what is your fencing agent ?
>
> Best Regards,
>
> Strahil Nikolov
>
> On Thu, May 27, 2021 at 20:52, Eric Robinson
> <eric.robin...@psmnv.com <mailto:eric.robin...@psmnv.com>> wrote:
>
> We found one of our cluster nodes down this morning. The server was
> up but cluster services were not running. Upon examination of the
> logs, we found that the cluster just stopped around 9:40:31 and then
> I started it up manually (pcs cluster start) at 11:49:48. I can’t
> imagine that Pacemaker just randomly terminates. Any thoughts why it
> would behave this way?
>
> May 27 09:25:31 [92170] 001store01a pengine: notice: process_pe_message: Calculated transition 91482, saving inputs in /var/lib/pacemaker/pengine/pe-input-756.bz2
> May 27 09:25:31 [92171] 001store01a crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> May 27 09:25:31 [92171] 001store01a crmd: info: do_te_invoke: Processing graph 91482 (ref=pe_calc-dc-1622121931-124396) derived from /var/lib/pacemaker/pengine/pe-input-756.bz2
> May 27 09:25:31 [92171] 001store01a crmd: notice: run_graph: Transition 91482 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
> May 27 09:25:31 [92171] 001store01a crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> May 27 09:25:31 [92171] 001store01a crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> May 27 09:40:31 [92171] 001store01a crmd: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (900000ms)
> May 27 09:40:31 [92171] 001store01a crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> May 27 09:40:31 [92171] 001store01a crmd: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> May 27 09:40:31 [92170] 001store01a pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
> May 27 09:40:31 [92170] 001store01a pengine: info: determine_online_status: Node 001store01a is online
> May 27 09:40:31 [92170] 001store01a pengine: info: determine_op_status: Operation monitor found resource p_pure-ftpd-itls active on 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: warning: unpack_rsc_op_failure: Processing failed op monitor for p_vip_ftpclust01 on 001store01a: unknown error (1)
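The 900000 ms recheck timer above is just Pacemaker's default 15-minute cluster-recheck-interval firing, so the 09:40:31 entries are routine housekeeping rather than a crash. The failed monitor on p_vip_ftpclust01 is a separate, older issue worth clearing once its cause is understood; a sketch with pcs (the resource name is taken from the log above):

    # Show the accumulated failure count for the VIP resource
    pcs resource failcount show p_vip_ftpclust01
    # Clear the stale failure so the policy engine stops replaying it
    pcs resource cleanup p_vip_ftpclust01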
> May 27 09:40:31 [92170] 001store01a pengine: info: determine_op_status: Operation monitor found resource p_pure-ftpd-etls active on 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: info: unpack_node_loop: Node 1 is already processed
> May 27 09:40:31 [92170] 001store01a pengine: info: unpack_node_loop: Node 1 is already processed
> May 27 09:40:31 [92170] 001store01a pengine: info: common_print: p_vip_ftpclust01 (ocf::heartbeat:IPaddr2): Started 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: info: common_print: p_replicator (systemd:pure-replicator): Started 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: info: common_print: p_pure-ftpd-etls (systemd:pure-ftpd-etls): Started 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: info: common_print: p_pure-ftpd-itls (systemd:pure-ftpd-itls): Started 001store01a
> May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave p_vip_ftpclust01 (Started 001store01a)
> May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave p_replicator (Started 001store01a)
> May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave p_pure-ftpd-etls (Started 001store01a)
> May 27 09:40:31 [92170] 001store01a pengine: info: LogActions: Leave p_pure-ftpd-itls (Started 001store01a)
> May 27 09:40:31 [92170] 001store01a pengine: notice: process_pe_message: Calculated transition 91483, saving inputs in /var/lib/pacemaker/pengine/pe-input-756.bz2
> May 27 09:40:31 [92171] 001store01a crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> May 27 09:40:31 [92171] 001store01a crmd: info: do_te_invoke: Processing graph 91483 (ref=pe_calc-dc-1622122831-124397) derived from /var/lib/pacemaker/pengine/pe-input-756.bz2
> May 27 09:40:31 [92171] 001store01a crmd: notice: run_graph: Transition 91483 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-756.bz2): Complete
> May 27 09:40:31 [92171] 001store01a crmd: info: do_log: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
> May 27 09:40:31 [92171] 001store01a crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
> [10667] 001store01a.ccnva.local corosync notice [MAIN ] Corosync Cluster Engine ('2.4.3'): started and ready to provide service.
> [10667] 001store01a.ccnva.local corosync info [MAIN ] Corosync built-in features: dbus systemd xmlconf qdevices qnetd snmp libcgroup pie relro bindnow
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] Initializing transport (UDP/IP Unicast).
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] The network interface [10.51.14.40] is now up.
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync configuration map access [0]
> [10667] 001store01a.ccnva.local corosync info [QB ] server name: cmap
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync configuration service [1]
> [10667] 001store01a.ccnva.local corosync info [QB ] server name: cfg
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> [10667] 001store01a.ccnva.local corosync info [QB ] server name: cpg
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync profile loading service [4]
> [10667] 001store01a.ccnva.local corosync notice [QUORUM] Using quorum provider corosync_votequorum
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
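Those votequorum lines show a two-node cluster (expected_votes: 2) waiting for its peer. For a cluster this size, corosync's two_node mode is the usual way to keep quorum workable; a sketch of the relevant corosync.conf block, assuming defaults everywhere else:

    quorum {
        provider: corosync_votequorum
        # two_node relaxes quorum for two-node operation and
        # implicitly enables wait_for_all (see votequorum(5))
        two_node: 1
    }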
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
> [10667] 001store01a.ccnva.local corosync info [QB ] server name: votequorum
> [10667] 001store01a.ccnva.local corosync notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> [10667] 001store01a.ccnva.local corosync info [QB ] server name: quorum
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new UDPU member {10.51.14.40}
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] adding new UDPU member {10.51.14.41}
> [10667] 001store01a.ccnva.local corosync notice [TOTEM ] A new membership (10.51.14.40:6412) was formed. Members joined: 1
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> [10667] 001store01a.ccnva.local corosync notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> [10667] 001store01a.ccnva.local corosync notice [QUORUM] Members[1]: 1
> [10667] 001store01a.ccnva.local corosync notice [MAIN ] Completed service synchronization, ready to provide service.
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: notice: main: Starting Pacemaker 1.1.18-11.el7_5.3 | build=2b07d5c5a9 features: generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios corosync-native atomic-attrd acls
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: main: Maximum core file size is: 18446744073709551615
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: qb_ipcs_us_publish: server name: pacemakerd
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Created entry 05ad8b08-25a3-4a2d-84cb-1fc355fb697c/0x55d844a446b0 for node 001store01a/1 (1 total)
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Node 1 is now known as 001store01a
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Node 1 has uuid 1
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_update_peer_proc: cluster_connect_cpg: Node 001store01a[1] - corosync-cpg is now online
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: warning: cluster_connect_quorum: Quorum lost
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Created entry 2f1f038e-9cc1-4a43-bab9-e7c91ca0bf3f/0x55d844a45ee0 for node 001store01b/2 (2 total)
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Node 2 is now known as 001store01b
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: crm_get_peer: Node 2 has uuid 2
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: start_child: Using uid=189 and group=189 for process cib
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: start_child: Forked child 10682 for process cib
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: start_child: Forked child 10683 for process stonith-ng
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: start_child: Forked child 10684 for process lrmd
> May 27 11:49:48 [10681] 001store01a.ccnva.local pacemakerd: info: start_child: Using uid=189 and group=189 for process attrd
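Nothing in the pasted log shows Pacemaker being asked to stop, so the explanation is probably in the system logs between 09:40:31 and the 11:49:48 restart. A few hedged starting points, assuming a systemd-based EL7 host like the one in the logs:

    # Did systemd stop the units, or did something kill the daemons?
    journalctl -u pacemaker -u corosync --since "2021-05-27 09:30" --until "2021-05-27 12:00"
    # Check for the OOM killer or other kernel-level terminations
    grep -iE 'oom|killed process' /var/log/messages
    # Pacemaker's detail log on EL7 often has more than syslog does
    less /var/log/cluster/corosync.log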
-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/