On 01/09/17 09:48 +0300, Klechomir wrote:
> I have cases, when for an unknown reason a single monitoring request
> never returns result.
> So having bigger timeouts doesn't resolve this problem.

If I get you right, the pain point here is a command called by the
resource agent during the monitor operation, where this command under
some circumstances _never_ terminates (for dead waiting, infinite loop,
or whatever other reason) or possibly terminates only in response to
external/asynchronous triggers (e.g. the network connection gets
reestablished).

Stating the obvious, the solution should be:

- work towards fixing that particular command if blocking is
  unexpected behaviour (clarify this with upstream if needed)
- find a more reliable way for the agent to monitor the resource

For the planned soft-recovery options Ken talked about, I am not sure
whether it would be trivially possible to differentiate an exceeded
monitor timeout from a plain monitor failure.

--
Jan (Poki)
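
To make the distinction between "monitor timed out" and "monitor failed"
concrete, here is a minimal sketch of how an agent-side check could bound
a possibly-hanging command itself. This is only an illustration, not how
any existing resource agent or Pacemaker does it: the function name
bounded_check, the status strings, and the demo commands are all
hypothetical, and the inner limit is assumed to be chosen shorter than
the monitor timeout configured in the cluster.

```python
import subprocess
import sys


def bounded_check(cmd, limit_s):
    """Run cmd for at most limit_s seconds and classify the outcome.

    Hypothetical helper. Returns "ok" for exit status 0, "failed" for any
    other exit status, and "timed_out" if the command did not finish in time.
    """
    try:
        # Output is discarded so that no pipe can keep us waiting.
        # On timeout, subprocess.run() kills the direct child and reaps it
        # (processes the child spawned itself are not covered).
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=limit_s,
        )
    except subprocess.TimeoutExpired:
        return "timed_out"
    return "ok" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in commands; a real agent would run its own status command here.
    hanging = [sys.executable, "-c", "import time; time.sleep(60)"]
    broken = [sys.executable, "-c", "raise SystemExit(3)"]
    print(bounded_check(hanging, 1.0))
    print(bounded_check(broken, 30.0))
```

With the two outcomes separated like this, the agent can at least log
which case it hit before reporting back, even if the cluster itself
treats both the same way.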
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org