Your e-mail ended up in my spam mailbox.

Yes, you can do the following:
1. Increase the Corosync token and consensus timeouts (check the 
corosync.conf man page for the proper ratio).
2. Always put the node/cluster into maintenance mode and stop the cluster 
stack before patching.
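
As a rough sketch of the first point, the timeouts live in the totem section 
of corosync.conf. The values below are assumptions for illustration only, not 
tested recommendations; the man page states that consensus must be at least 
1.2 * token. Only the two timeout lines matter here, and the rest of your 
existing totem section stays as it is.

```
totem {
    version: 2

    # Assumed example value: time in milliseconds to wait for the token
    # before declaring it lost. Raise it to ride out short stalls.
    token: 10000

    # Assumed example value: must be at least 1.2 * token
    # (10000 * 1.2 = 12000).
    consensus: 12000
}
```

The same file has to be in place on every node, and Corosync has to reload or 
restart before the new values take effect.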

Best Regards,
Strahil Nikolov

On Monday, 28 October 2019, 19:26:39 GMT+2, 
Casey Allen Shobe <casey.allen.sh...@icloud.com> wrote:  
 
 I'm seeing a couple of different situations where Pacemaker (using the 
PostgreSQL Automated Failover resource) ends up thinking that the master node 
is not responding and fences it, when in fact the node was up and running 
fine.  We are using a VMware ESXi infrastructure, which is fairly 
overcommitted, especially in our lower environments, and many times this 
correlates exactly with a VMware vMotion, which seems to delay the response 
to one of Pacemaker's health checks.  In other cases, I have seen logind get 
restarted by an apt update, and that seems to trigger a failover even though 
PostgreSQL never went down.

I'm looking for potential solutions to these. Is there a way to increase the 
tolerance for the number of failures, or the timeout length, to avoid 
unnecessary failovers?

Thank you for any advice!
-- 
Casey
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  