Hi Reid,
I will check it out on Monday, but I'm pretty sure I created an order set that first stops the topology and only then stops the nfs-active resource.
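Something along these lines (the resource names are placeholders matching the KB article, and I'd need to double-check the exact syntax on the cluster):

    # pcs constraint order set SAPHanaTopology_<SID>_<instance_num>-clone hana_nfs1_active-clone action=stop setoptions symmetrical=false
    # pcs constraint order set SAPHanaTopology_<SID>_<instance_num>-clone hana_nfs2_active-clone action=stop setoptions symmetrical=false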
Yet I made the stupid decision to prevent ocf:heartbeat:Filesystem from killing those two SAP processes (while also setting a huge timeout for the stop operation), which led to an 'I can't umount, giving up'-style notification and, of course, fenced the entire cluster :D
Note taken: the stonith devices now have different delays, and Filesystem is allowed to kill the processes.
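Concretely, something like this (the resource/device names are placeholders, and force_unmount=safe assumes a resource-agents version that supports the 'safe' value):

    # pcs resource update <hana_nfs_filesystem> force_unmount=safe
    # pcs stonith update <fence_device_node1> pcmk_delay_base=5s
    # pcs stonith update <fence_device_node2> pcmk_delay_base=10s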
As per the SAP note from Andrei, these could really be 'fast restart' mechanisms in HANA 2.0, and it looks safe to kill them (I will check with SAP about that).

P.S.: Is there a way to remove a whole set in pcs? It's really irritating when the command wipes the resource from multiple order constraints.
Best Regards,
Strahil Nikolov

 
 
On Fri, Apr 2, 2021 at 23:44, Reid Wahl <nw...@redhat.com> wrote:

Hi, Strahil.
Based on the constraints documented in the article you're following (RH KB 
solution 5423971), I think I see what's happening.
The SAPHanaTopology resource requires the appropriate nfs-active attribute in 
order to run. That means that if the nfs-active attribute is set to false, the 
SAPHanaTopology resource must stop.
However, there's no rule saying SAPHanaTopology must finish stopping before the 
nfs-active attribute resource stops. In fact, it's quite the opposite: the 
SAPHanaTopology resource stops only after the nfs-active resource stops.
At the same time, the NFS resources are allowed to stop after the nfs-active 
attribute resource has stopped. So the NFS resources are stopping while the 
SAPHana* resources are likely still active.
Try something like this:

    # pcs constraint order hana_nfs1_active-clone then SAPHanaTopology_<SID>_<instance_num>-clone kind=Optional
    # pcs constraint order hana_nfs2_active-clone then SAPHanaTopology_<SID>_<instance_num>-clone kind=Optional

This says "if both hana_nfs1_active and SAPHanaTopology are scheduled to start, 
then make hana_nfs1_active start first. If both are scheduled to stop, then 
make SAPHanaTopology stop first."
"kind=Optional" means there's no order dependency unless both resources are 
already going to be scheduled for the action. I'm using kind=Optional here even 
though kind=Mandatory (the default) would make sense, because IIRC there were 
some unexpected interactions with ordering constraints for clones, where events 
on one node had unwanted effects on other nodes.
I'm not able to test right now since setting up an environment for this even 
with dummy resources is non-trivial -- but you're welcome to try this both with 
and without kind=Optional if you'd like.
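If you do want to flip between the two, one way (just a sketch; use the real constraint ID that pcs reports) is to remove the ordering constraint by its ID and re-add it with the other kind:

    # pcs constraint --full
    # pcs constraint remove <constraint-id>
    # pcs constraint order hana_nfs1_active-clone then SAPHanaTopology_<SID>_<instance_num>-clone kind=Mandatory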
Please let us know how this goes.

On Fri, Apr 2, 2021 at 2:20 AM Strahil Nikolov <hunter86...@yahoo.com> wrote:

Hello All,
I am testing the newly built HANA (Scale-out) cluster, and it seems that neither SAPHanaController nor SAPHanaTopology stops HANA when I put the nodes (same DC = same HANA) in standby. This of course leads to a situation where the NFS cannot be unmounted and, despite the stop timeout, ends in fencing (on-fail=fence).
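For reference, I'm putting the nodes in standby with something like the following (node names are placeholders):

    # pcs node standby <node1> <node2>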
I thought that the Controller resource agent stops HANA, and that the slave role should not be 'stopped' before that.
Maybe my expectations are wrong?
Best Regards,
Strahil Nikolov



-- 
Regards,

Reid Wahl, RHCA
Senior Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA  
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
