On 2021-05-07 6:36 a.m., Kyle O'Donnell wrote:
> Hi Everyone.
>
> We've set up fencing with our ilo/idrac interfaces and things generally
> work well but during some of our failover scenario testing we ran into
> issues when we "failed" the switches in which those ilo/idrac interfaces
> were connected. The issue was that resources were migrated away from
> any node with an offline fencing device. I can see how that is
> desirable, but in our case this is essentially a single point of
> failure. How are others managing this?
>
> In one of our sites we have "smart" APC power strips so we can set up
> multiple fencing devices, but in another site we do not. I tried
> increasing the timeout= value on the fencing devices but that did not
> seem to work.
>
> Thanks,
> Kyle

We use a pair of switched PDUs connected to a second switch. (We also use active/passive bonds for all links, with each link in a bond going to a different switch.) This allows either switch to be lost without interrupting network traffic, while still leaving one fence method available.

Here's how we configure it to use IPMI (iDRAC, iRMC, iLO, etc.) first, and to use a pair of PDUs as backup:

====
pcs stonith create ipmilan_node1 fence_ipmilan pcmk_host_list="an-a02n01" ipaddr="10.201.13.1" password="another secret p" username="admin" delay="15" op monitor interval="60"
pcs stonith level add 1 an-a02n01 ipmilan_node1

pcs stonith create ipmilan_node2 fence_ipmilan pcmk_host_list="an-a02n02" ipaddr="10.201.13.2" password="another secret p" username="admin" op monitor interval="60"
pcs stonith level add 1 an-a02n02 ipmilan_node2

pcs stonith create apc_snmp_node1_psu1 fence_apc_snmp pcmk_host_list="an-a02n01" pcmk_off_action="reboot" ip="10.201.2.3" port="3" power_wait="5" op monitor interval="60"
pcs stonith create apc_snmp_node1_psu2 fence_apc_snmp pcmk_host_list="an-a02n01" pcmk_off_action="reboot" ip="10.201.2.4" port="3" power_wait="5" op monitor interval="60"
pcs stonith level add 2 an-a02n01 apc_snmp_node1_psu1,apc_snmp_node1_psu2

pcs stonith create apc_snmp_node2_psu1 fence_apc_snmp pcmk_host_list="an-a02n02" pcmk_off_action="reboot" ip="10.201.2.3" port="4" power_wait="5" op monitor interval="60"
pcs stonith create apc_snmp_node2_psu2 fence_apc_snmp pcmk_host_list="an-a02n02" pcmk_off_action="reboot" ip="10.201.2.4" port="4" power_wait="5" op monitor interval="60"
pcs stonith level add 2 an-a02n02 apc_snmp_node2_psu1,apc_snmp_node2_psu2

pcs property set stonith-max-attempts=INFINITY
pcs property set stonith-enabled=true
====

In the above example, node 1 is plugged into outlet 3 on both PDUs, and node 2 is on outlet 4, with PDU 1 at IP 10.201.2.3 and PDU 2 at IP 10.201.2.4.
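
Since each node follows the same pattern (one IPMI device at level 1, then the same outlet on both PDUs at level 2), the per-node commands can be generated rather than typed by hand. What follows is only a rough sketch of that idea: `gen_fence_cmds` is a made-up helper, not part of pcs, and the `IPMI_PASS` variable with its `CHANGE_ME` default is a placeholder I've assumed for illustration. It only prints the commands so they can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Dry-run sketch: prints the pcs commands for one node; it does not run them.
# gen_fence_cmds is a hypothetical helper, not part of pcs.

PDU1_IP="10.201.2.3"                 # PDU 1, as in the example above
PDU2_IP="10.201.2.4"                 # PDU 2, as in the example above
IPMI_PASS="${IPMI_PASS:-CHANGE_ME}"  # assumed placeholder; supply the real secret

# Usage: gen_fence_cmds <node-number> <hostname> <ipmi-ip> <pdu-outlet> [delay]
gen_fence_cmds() {
    n="$1"; host="$2"; ipmi_ip="$3"; outlet="$4"; delay="${5:-}"

    # delay is optional; the example above sets it on node 1 only.
    delay_opt=""
    if [ -n "$delay" ]; then
        delay_opt=" delay=\"${delay}\""
    fi

    # Level 1: IPMI (iDRAC, iRMC, iLO, etc.)
    printf '%s\n' "pcs stonith create ipmilan_node${n} fence_ipmilan pcmk_host_list=\"${host}\" ipaddr=\"${ipmi_ip}\" password=\"${IPMI_PASS}\" username=\"admin\"${delay_opt} op monitor interval=\"60\""
    printf '%s\n' "pcs stonith level add 1 ${host} ipmilan_node${n}"

    # Level 2: the same outlet number on both PDUs, used together.
    psu=1
    for pdu_ip in "$PDU1_IP" "$PDU2_IP"; do
        printf '%s\n' "pcs stonith create apc_snmp_node${n}_psu${psu} fence_apc_snmp pcmk_host_list=\"${host}\" pcmk_off_action=\"reboot\" ip=\"${pdu_ip}\" port=\"${outlet}\" power_wait=\"5\" op monitor interval=\"60\""
        psu=$((psu + 1))
    done
    printf '%s\n' "pcs stonith level add 2 ${host} apc_snmp_node${n}_psu1,apc_snmp_node${n}_psu2"
}

# The two nodes from the example above.
gen_fence_cmds 1 an-a02n01 10.201.13.1 3 15
gen_fence_cmds 2 an-a02n02 10.201.13.2 4
```

The output can be redirected to a file, checked by eye, and then run on one cluster node. The two `pcs property set` lines are cluster-wide, so they are left out of the per-node helper.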

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops." - Stephen Jay Gould