But what if sbd fails to reset the timer multiple times (e.g. because of excessive load, a swap storm, etc.)?
If I remember correctly, sbd locks its memory with mlock and runs with the SCHED_RR scheduling policy, so when the server is swapping, sbd doesn't stop.

2016-12-09 8:11 GMT+01:00 Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>:
>>>> emmanuel segura <emi2f...@gmail.com> wrote on 08.12.2016 at 14:37 in
> message
> <CAE7pJ3CSQyxQvBqLFvsFU=nlp95jwqdzvap_6clybo5rsdc...@mail.gmail.com>:
>> the only thing that I can say is: sbd is a realtime process
>
> Hi!
>
> You are saying it's scheduled with policy SCHED_RR and priority 0? A
> realtime process is more than its scheduling policy IMHO.
> What are you really trying to say?
>
> Regards,
> Ulrich
>
>>
>> 2016-12-08 11:47 GMT+01:00 Jehan-Guillaume de Rorthais <j...@dalibo.com>:
>>> Hello,
>>>
>>> While setting these various parameters, I couldn't find documentation or
>>> details about them. Below are some questions.
>>>
>>> Considering the watchdog module used on a server is set up with a 30s timer
>>> (let's call it the wdt, the "watchdog timer"), how should
>>> "SBD_WATCHDOG_TIMEOUT", "stonith-timeout" and "stonith-watchdog-timeout" be
>>> set?
>>>
>>> Here is my thinking so far:
>>>
>>> "SBD_WATCHDOG_TIMEOUT < wdt". The sbd daemon should reset the timer before
>>> the wdt expires so the server stays alive. Online resources and default
>>> values are usually "SBD_WATCHDOG_TIMEOUT=5s" and "wdt=30s". But what if sbd
>>> fails to reset the timer multiple times (e.g. because of excessive load, a
>>> swap storm, etc.)? The server will not reset before
>>> random*SBD_WATCHDOG_TIMEOUT or wdt, right?
>>>
>>> "stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT". I'm not quite sure what
>>> stonith-watchdog-timeout is. Is it the maximum time stonithd waits, after it
>>> has asked for a node to be fenced, before it considers that the watchdog was
>>> actually triggered and the node reset, even with no confirmation? I suppose
>>> "stonith-watchdog-timeout" is mostly useful to stonithd, right?
>>>
>>> "stonith-watchdog-timeout < stonith-timeout".
>>> I understand the stonith action timeout should at least be greater than the
>>> wdt, so stonithd will not raise a timeout before the wdt has had a chance to
>>> expire and reset the node. Is that right?
>>>
>>> Any other comments?
>>>
>>> Regards,
>>> --
>>> Jehan-Guillaume de Rorthais
>>> Dalibo
>>>
>>> _______________________________________________
>>> Users mailing list: Users@clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>> --
>> .~.
>> /V\
>> // \\
>> /( )\
>> ^`~'^
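To make the "mlock plus SCHED_RR" point and Ulrich's priority question concrete, here is a minimal Linux-only sketch, in Python, of what a process does to keep itself in RAM and on the real-time scheduler. It is not sbd's source code: the helper names make_realtime and rr_priority_error are hypothetical, the MCL_* values are assumed to be the x86/ARM ones, and the default priority of 1 is an arbitrary choice. It also shows that on Linux a SCHED_RR priority of 0 is rejected, since real-time priorities run from 1 to 99.

```python
import ctypes
import os

# Assumed Linux <sys/mman.h> values on x86 and ARM; powerpc, sparc and
# alpha use different constants, so check your platform before reusing.
MCL_CURRENT = 1
MCL_FUTURE = 2


def rr_priority_error(prio):
    """Return None if prio is a valid SCHED_RR priority, else a message.

    On Linux the valid range is 1..99; 0 belongs to the non-real-time
    policies such as SCHED_OTHER.
    """
    lo = os.sched_get_priority_min(os.SCHED_RR)
    hi = os.sched_get_priority_max(os.SCHED_RR)
    if lo <= prio <= hi:
        return None
    return f"priority {prio} outside SCHED_RR range {lo}..{hi}"


def make_realtime(priority=None):
    """Lock all pages in RAM and switch this process to SCHED_RR.

    Returns {step: None on success, or an error string}. Without root
    (CAP_IPC_LOCK / CAP_SYS_NICE) both steps usually fail.
    """
    results = {}

    # dlopen(NULL): libc symbols such as mlockall are in the global scope.
    libc = ctypes.CDLL(None, use_errno=True)
    # Lock current and future pages so the process cannot be swapped out.
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        results["mlockall"] = os.strerror(ctypes.get_errno())
    else:
        results["mlockall"] = None

    # Hypothetical default: the lowest real-time priority.
    prio = os.sched_get_priority_min(os.SCHED_RR) if priority is None else priority
    err = rr_priority_error(prio)
    if err is not None:
        results["sched_setscheduler"] = err
        return results
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(prio))
        results["sched_setscheduler"] = None
    except OSError as exc:
        results["sched_setscheduler"] = exc.strerror
    return results
```

Calling make_realtime(priority=0) reports a range error for the scheduler step without touching the scheduling policy, which is one way to see why "SCHED_RR with priority 0" cannot be what a real-time daemon actually runs with.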
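Jehan-Guillaume's proposed orderings can be written down as a small check, so a candidate configuration can be tested against them. This sketch encodes only his hypotheses as stated in the question (including "stonith-timeout > wdt" from his last paragraph); the thread does not confirm them, so treat the rules as assumptions rather than guidance. The Timeouts class, its field names and violated_constraints are hypothetical, and all values are assumed to be in seconds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Timeouts:
    """All values in seconds; field names are illustrative, not an sbd/pacemaker API."""
    wdt: float                       # hardware watchdog timer, as the poster describes it
    sbd_watchdog_timeout: float      # SBD_WATCHDOG_TIMEOUT
    stonith_watchdog_timeout: float  # stonith-watchdog-timeout
    stonith_timeout: float           # stonith-timeout


def violated_constraints(t: Timeouts) -> list[str]:
    """Return the poster's proposed orderings that t does not satisfy."""
    checks = [
        ("SBD_WATCHDOG_TIMEOUT < wdt",
         t.sbd_watchdog_timeout < t.wdt),
        ("stonith-watchdog-timeout > SBD_WATCHDOG_TIMEOUT",
         t.stonith_watchdog_timeout > t.sbd_watchdog_timeout),
        ("stonith-watchdog-timeout < stonith-timeout",
         t.stonith_watchdog_timeout < t.stonith_timeout),
        ("stonith-timeout > wdt",
         t.stonith_timeout > t.wdt),
    ]
    return [name for name, ok in checks if not ok]
```

With the values quoted in the question (wdt=30, SBD_WATCHDOG_TIMEOUT=5) plus illustrative choices of stonith-watchdog-timeout=10 and stonith-timeout=60, no rule is violated; lowering stonith-timeout to 20 reports "stonith-timeout > wdt".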