>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 27.03.2021 at 06:37 in
message <7c294034-56c3-baab-73c6-7909ab554...@gmail.com>:
> On 26.03.2021 22:18, Reid Wahl wrote:
>> On Fri, Mar 26, 2021 at 6:27 AM Andrei Borzenkov <arvidj...@gmail.com>
>> wrote:
>>
>>> On Fri, Mar 26, 2021 at 10:17 AM Ulrich Windl
>>> <ulrich.wi...@rz.uni-regensburg.de> wrote:
>>>>
>>>>>>> Andrei Borzenkov <arvidj...@gmail.com> wrote on 26.03.2021 at
>>>> 06:19 in message <534274b3-a6de-5fac-0ae4-d02c305f1...@gmail.com>:
>>>>> On 25.03.2021 21:45, Reid Wahl wrote:
>>>>>> FWIW we have this KB article (I seem to remember Strahil is a Red Hat
>>>>>> customer):
>>>>>> - How do I configure SAP HANA Scale-Up System Replication in a
>>>>>> Pacemaker cluster when the HANA filesystems are on NFS shares?
>>>>>> (https://access.redhat.com/solutions/5156571)
>>>>>>
>>>>>
>>>>> "How do I make the cluster resources recover when one node loses access
>>>>> to the NFS server?"
>>>>>
>>>>> If a node loses access to the NFS server, then monitor operations for
>>>>> resources that depend on NFS availability will fail or time out, and
>>>>> pacemaker will recover (likely by rebooting this node). That's how
>>>>> similar configurations have been handled for the past 20 years in other
>>>>> HA managers. I am genuinely interested: have you encountered a case
>>>>> where it was not enough?
>>>>
>>>> That's a big problem with the SAP design (basically it's just too
>>>> complex). In the past I had written a kind of resource agent that worked
>>>> without that overly complex overhead, but since those days SAP has added
>>>> much more complexity.
>>>> If the NFS server is external, pacemaker could fence your nodes when the
>>>> NFS server is down, as first the monitor operation will fail (hanging on
>>>> NFS), then the recovery (stop/start) will fail (also hanging on NFS).
>>>
>>> And how exactly is placing the NFS resource under pacemaker control going
>>> to change that?
>>>
>>
>> I noted earlier, based on the old case notes:
>>
>> "Apparently there were situations in which the SAPHana resource wasn't
>> failing over when connectivity was lost with the NFS share that contained
>> the hdb* binaries and the HANA data. I don't remember the exact details
>> (whether demotion was failing, or whether it wasn't even trying to demote
>> on the primary and promote on the secondary, or what). Either way, I was
>> surprised that this procedure was necessary, but it seemed to be."
>>
>> Strahil may be dealing with a similar situation, not sure. I get where
>> you're coming from -- I too would expect the application that depends on
>> NFS to simply fail when NFS connectivity is lost, which in turn leads to
>> failover and recovery. For whatever reason, due to some weirdness of the
>> SAPHana resource agent, that didn't happen.
>>
>
> Yes. The only reason to use this workaround would be if the resource
> agent's monitor still believes that the application is up while the
> required NFS is down. That is a bug in the resource agent, or possibly in
> the application itself.
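To make the discussion concrete: "placing the NFS resource under pacemaker
control" usually means modelling the mount as a Filesystem resource and
hanging the dependent resources off it with ordering/colocation constraints,
so a hanging mount fails the Filesystem monitor directly. A rough sketch
with pcs (all resource names, the export path and the operation settings
below are made up for illustration; this is not the recipe from the KB
article):

    # Cloned Filesystem resource for the NFS share; a failed or timed-out
    # monitor on it drives recovery (here: fencing) on its own:
    pcs resource create hana_nfs ocf:heartbeat:Filesystem \
        device="nfsserver:/export/hana" directory="/hana/shared" \
        fstype="nfs" op monitor interval=20s timeout=40s on-fail=fence \
        clone interleave=true

    # Make the (hypothetical) SAPHana clone start after and run with the mount:
    pcs constraint order hana_nfs-clone then SAPHana_HDB00-clone
    pcs constraint colocation add SAPHana_HDB00-clone with hana_nfs-clone

The open question in this thread is whether that detects anything the
application's own monitor would not.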
I think it's getting philosophical now. Take, for example, a web server
serving documents from an NFS server: is the web server down when access to
NFS hangs? Would restarting ("recovering") the web server help in that
situation? Maybe OCF_CHECK_LEVEL could be used: higher levels could check
not just whether the resource is "running", but also whether it is actually
responding, etc. (see the sketch at the end of this message).

>
> While using this workaround in this case is perfectly reasonable, none
> of the reasons listed in the message I was replying to are applicable.
>
> So far the only reason the OP wanted to do it was some obscure race
> condition on startup outside of pacemaker. In that case this workaround
> simply delays the NFS mount, sidestepping the race.
>
> I also remember something about racing with dnsmasq, at which point I'd
> say that making the cluster depend on the availability of DNS is
> e-h-h-h unwise.
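To illustrate the OCF_CHECK_LEVEL idea from above: a resource agent's
monitor action can branch on the requested check level, so that a deep
monitor actually touches the NFS-backed data instead of only checking for
the process. A minimal sketch in OCF shell style (the daemon name, URL and
probe file are invented; the OCF_* return codes normally come from
ocf-shellfuncs):

    # Fallback definitions; normally sourced from ocf-shellfuncs:
    : "${OCF_SUCCESS=0}" "${OCF_ERR_GENERIC=1}" "${OCF_NOT_RUNNING=7}"

    webserver_monitor() {
        # Level 0: is the process running at all?
        pgrep -f mywebserverd >/dev/null || return "$OCF_NOT_RUNNING"

        if [ "${OCF_CHECK_LEVEL:-0}" -ge 10 ]; then
            # Level 10: is the server responding to requests?
            curl -sf -m 5 http://localhost/ >/dev/null \
                || return "$OCF_ERR_GENERIC"
        fi
        if [ "${OCF_CHECK_LEVEL:-0}" -ge 20 ]; then
            # Level 20: can we really read from the NFS-backed docroot?
            # Wrap the read in timeout(1) so a hanging mount fails the
            # monitor instead of hanging it until the operation timeout.
            timeout 5 dd if=/srv/www/.ocf_probe of=/dev/null bs=512 count=1 \
                2>/dev/null || return "$OCF_ERR_GENERIC"
        fi
        return "$OCF_SUCCESS"
    }

The deeper level is then requested per operation, e.g. a second monitor
operation with OCF_CHECK_LEVEL=20 and a longer interval, alongside the
cheap default one.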