On 19/04/17 02:38 AM, Ulrich Windl wrote:
>>>> Digimer <li...@alteeve.ca> wrote on 18.04.2017 at 19:08 in message
> <26e49390-b384-b46e-4965-eba5bfe59...@alteeve.ca>:
>> On 18/04/17 11:07 AM, Lentes, Bernd wrote:
>>> Hi,
>>>
>>> I'm currently setting up a two-node cluster. Each node is an HP server
>>> with an iLO card.
>>> I can fence both of them, and it's working fine.
>>> But what happens if the iLO does not work correctly? Then fencing is
>>> not possible.
>>
>> Correct. If you only have iLO fencing, then the cluster would hang
>> (failed fencing is *not* an indication of node death).
>>
>>> I also have a switched PDU from APC. Each server has two power supplies.
>>> Currently one is connected to the normal power equipment, the other to
>>> the UPS, as a sort of redundancy in case the UPS does not work properly.
>>
>> That's a fine setup.
>>
>>> If I use the switched PDU as a fencing device, I will lose the
>>> redundancy of two independent power sources, because then I have to
>>> connect both power supplies together to the UPS.
>>> I wouldn't like to do that.
>>
>> Not if you have two switched PDUs. This is what we do in our Anvil!
>> systems... One PDU feeds the first PSU in each node and the second PDU
>> feeds the second PSUs. Ideally both PDUs are fed by UPSes, but that's
>> not as important. One PDU on a UPS and one PDU directly from mains will
>> work.
>>
>>> How important do you consider it to have two independent fencing
>>> devices for each node? I can't buy another PDU; currently we are very
>>> poor.
>>
>> Depends entirely on your tolerance for interruption. *I* answer that
>> with "extremely important". However, most clusters out there have only
>> IPMI-based fencing, so they would obviously say "not so important".
>>
>>> Is there another way to create a second fencing device, independent
>>> of the iLO card?
>>>
>>> Thanks.
>>
>> Sure, SBD would work. I've never seen IPMI not have a watchdog timer
>> (and iLO is IPMI++), as one example. It's slow, and needs shared
>> storage, but a small box somewhere running a small tgtd or iscsid should
>> do the trick (note that I have never used SBD myself...).
>
> Slow is relative: If it takes 3 seconds from issuing the reset command
> until the node is dead, it's fast enough for most cases. Even a switched
> PDU has some delays: The command has to be processed, the relay may
> "stick" a short moment, the power supply's capacitors have to discharge
> (if you have two power supplies, both need to)... And iLOs don't really
> like to be powered off.
>
> Ulrich

The way I understand SBD, and correct me if I am wrong, recovery won't
begin until some time after the watchdog timer kicks. If the watchdog
timer is 60 seconds, then your cluster will hang for >60 seconds (plus
fence delays, etc). IPMI and PDUs can confirm that the peer is fenced in
~5 seconds (plus fence delays).

-- 
Digimer
Papers and Projects: https://alteeve.com/w/
"I am, somehow, less interested in the weight and convolutions of
Einstein’s brain than in the near certainty that people of equal talent
have lived and died in cotton fields and sweatshops." - Stephen Jay Gould
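
To make the timing comparison above concrete, here is a minimal sketch of the arithmetic. It only encodes the understanding stated in this thread (SBD recovery waits at least the watchdog timeout, while IPMI or a PDU confirms the fence in roughly 5 seconds); it is not verified against SBD's actual behaviour. The function names and the 2-second fence delay are hypothetical, chosen for illustration only.

```python
def sbd_recovery_wait(watchdog_timeout_s, fence_delay_s=0.0):
    """Lower bound on the cluster hang with SBD, per the thread's assumption:
    recovery cannot begin before the watchdog timeout has elapsed."""
    return watchdog_timeout_s + fence_delay_s


def confirmed_fence_wait(confirm_s, fence_delay_s=0.0):
    """Hang time when the fence device (IPMI, PDU) confirms the peer is off."""
    return confirm_s + fence_delay_s


# Figures from the thread: 60 s watchdog timer, ~5 s IPMI/PDU confirmation.
# The fence delay value is an assumption, not something stated in the thread.
fence_delay = 2.0
sbd_wait = sbd_recovery_wait(60.0, fence_delay)
ipmi_wait = confirmed_fence_wait(5.0, fence_delay)

print(f"SBD:      at least {sbd_wait:.0f} s before recovery starts")
print(f"IPMI/PDU: about {ipmi_wait:.0f} s before recovery starts")
print(f"Extra hang with SBD: at least {sbd_wait - ipmi_wait:.0f} s")
```

The fence delay cancels out in the difference, so under these assumptions the gap is driven by the watchdog timeout alone. A shorter timeout would narrow it.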