>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 26.03.2021 at 14:26 in
message
<caa91j0vskq9snuukl5mkvq0a7z_b9udvags-t9zukvzgdrr...@mail.gmail.com>:
> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
> <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>
>> >>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 26.03.2021 at 06:19 in
>> message <534274b3-a6de-5fac-0ae4-d02c305f1...@gmail.com>:
>> > On 25.03.2021 21:45, Reid Wahl wrote:
>> >> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>> >> customer):
>> >> - How do I configure SAP HANA Scale-Up System Replication in a Pacemaker
>> >> cluster when the HANA filesystems are on NFS shares?
>> >> (https://access.redhat.com/solutions/5156571)
>> >>
>> >
>> > "How do I make the cluster resources recover when one node loses access
>> > to the NFS server?"
>> >
>> > If node loses access to NFS server then monitor operations for resources
>> > that depend on NFS availability will fail or timeout and pacemaker will
>> > recover (likely by rebooting this node). That's how similar
>> > configurations have been handled for the past 20 years in other HA
>> > managers. I am genuinely interested, have you encountered the case where
>> > it was not enough?
>>
>> That's a big problem with the SAP design (basically it's just too complex).
>> In the past I had written a kind of resource agent that worked without that
>> overly complex overhead, but since those days SAP has added much more
>> complexity.
>> If the NFS server is external, pacemaker could fence your nodes when the NFS
>> server is down as first the monitor operation will fail (hanging on NFS),
>> then the recovery (stop/start) will fail (also hanging on NFS).
>
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Maybe actively: check reachability of the NFS server (local or remote); if
it's not reachable, block all RA operations that would hang while NFS is down.
(Basically a "freeze" instead of a "recover" when NFS is down.)

>
>> Even when fencing the
>> node it would not help (resources cannot start) if the NFS server is still
>> down.
>
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

See above.

>
>> So you may end up with all your nodes being fenced and the fail counts
>> disabling any automatic resource restart.
>>
>
> And how exactly placing NFS resource under pacemaker control is going
> to change it?

Andrei, is there another sentence you can say, or is that your favorite
clipboard message? ;-)

Regards,
Ulrich

> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
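
To make the "freeze instead of recover" idea above a bit more concrete, here
is a minimal Python sketch. It is only an illustration under assumptions, not
how any existing resource agent works: the helper names (nfs_reachable,
guarded_monitor) are hypothetical, and it assumes that a TCP connect to port
2049 is an acceptable stand-in for "the NFS server is reachable" (a successful
connect does not prove that the exports are healthy). The only fixed values
are the standard OCF return codes.

```python
import socket

# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def nfs_reachable(host, port=2049, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: TCP reachability of the NFS port is a good enough probe.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def guarded_monitor(host, real_monitor, last_known_status, probe=nfs_reachable):
    """Hypothetical wrapper around a monitor operation.

    If the NFS server cannot be reached, do not run the real monitor (which
    would hang on NFS); report the last known status instead, i.e. "freeze".
    Otherwise run the real monitor and return its result.
    """
    if not probe(host):
        return last_known_status
    return real_monitor()
```

The probe is passed in as a parameter so the freeze logic can be tried out
without a real NFS server. Whether reporting the last known status is
acceptable while NFS is down is a policy decision; the sketch only shows where
such a check would sit.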