Hi Jan,

On Tue, 3 Oct 2017, Jan Friesse wrote:
> > I hope this makes sense! :)
>
> I would still have some questions :) but that is really not related to
> the problem you have.

Questions are welcome! I am new to this stack, so there is certainly room
for learning and for improvement.

> My personal favorite is consensus timeout. Because you've set (and I
> must say according to doc correctly) consensus timeout to 3600 (= 1.2 *
> token). Problem is, that result token timeout is not 3000, but with 5
> nodes it is actually 3000 (base token) + (no_nodes - 2) * 650 ms = 4950
> (as you can check by observing runtime.config.totem.token key). So it
> may make sense to set consensus timeout to ~6000.

Could you clarify the formula for me? I don't see how "- 2" and "650" map
to this configuration.

And I suppose that on our bigger system (20+5 servers) we need to greatly
increase the consensus timeout.

Overall, tuning the timeouts seems to be related to Black Magic. ;) I liked
the idea suggested in an old thread that there would be a spreadsheet (or
even just plain formulas) exposing the relation between the various knobs.

One thing I wonder is: would it make sense to annotate the state machine
diagram in the Totem paper (page 15 of
http://www.cs.jhu.edu/~yairamir/tocs.ps.gz) with those tunables? Assuming
the paper still reflects the behavior of the current code.

> This doesn't change the fact that "bug" is reproducible even with
> "correct" consensus, so I will continue working on this issue.

Great! Thanks for taking the time to investigate.

Cheers,
JM

-- 
saff...@gmail.com
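
For reference, a minimal sketch of the arithmetic quoted above, so the numbers can be rechecked for other cluster sizes. It only reproduces the formula as Jan wrote it. The function and parameter names are made up for illustration. The 650 ms appears to correspond to the token_coefficient option described in corosync.conf(5), and the assumption that the adjustment applies only from 3 nodes upwards is mine; neither is confirmed in this thread.

```python
def effective_token_ms(base_token_ms, node_count, token_coefficient_ms=650):
    """Token timeout after the per-node adjustment quoted in the thread.

    Implements: base token + (no_nodes - 2) * 650 ms.
    The default of 650 is taken from the quoted message (assumption: it is
    corosync's token_coefficient default).
    """
    # Assumption: clusters of 2 nodes or fewer get no adjustment.
    extra_nodes = max(node_count - 2, 0)
    return base_token_ms + extra_nodes * token_coefficient_ms


def consensus_ms(token_ms):
    """Consensus timeout as 1.2 * token, in integer milliseconds."""
    # 1.2 == 6/5; integer arithmetic avoids floating-point rounding noise.
    return token_ms * 6 // 5


if __name__ == "__main__":
    configured_token = 3000  # value from the quoted configuration
    nodes = 5                # node count from the quoted message

    token = effective_token_ms(configured_token, nodes)
    print("consensus from configured token:", consensus_ms(configured_token))
    print("effective token:", token)
    print("consensus from effective token:", consensus_ms(token))
```

With the values from the thread this gives an effective token of 4950 ms and a consensus of 5940 ms, which matches the "~6000" suggestion. The value actually in use can still be checked via the runtime.config.totem.token key mentioned above.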