On Fri, 27 Apr 2018, at 19:50, David Alan Hjelle wrote:
> Does anyone have recommendations for setting `net_ticktime` to a lower
> value, such as 10, instead of the default 60? In particular, this would
> be for a 3-node cluster on bare metal connected to a single switch.
>
> Background & why I’m asking:
>
> As far as I can tell, if a node fails in certain ways, the rest of the
> cluster can take up to 60 seconds (the default `net_ticktime`) to be
> aware of it, which can pause certain operations (such as a database
> creation) depending on the size of the cluster and quorum settings, etc.
> (For instance, in a 3-node cluster with a single failure, reads and
> writes continue to work, but creating a database waits until the
> _membership is up-to-date.)
>
> It looks like I can lower the Erlang `net_ticktime` setting to make this
> happen more quickly. The Erlang docs indicate that one should be
> cautious in changing this parameter, as it could lead to Couch thinking
> there were partitions when there were none, so I’m curious if anyone has
> any practical experience?
>
> Thanks!
Hi David,

I would avoid changing net_ticktime at all, with the same level of
concern as a human would have on hearing that a faster pacemaker might
improve the reliability of their heart...

Are you seeing node down failures? Are you creating/removing DBs with
such frequency that this is more than a theoretical constraint?

When you change the tick time you also need to consider that if a 1/4
tick mark is missed, the runtime will start queueing inter-node traffic
until it decides whether the node is down or not.

I'm interested to know if anybody else has ever tweaked this for
CouchDB. I know that it's been fiddled with for Riak, also around
managing scheduler collapse, but broadly I'd only fiddle with this if I
were seeing real-world production problems.

BTW https://www.rabbitmq.com/nettick.html has the nicest explanation,
and you'd also want to consider
http://erlang.org/doc/man/erl.html#+zdbbl as well, see
http://erlang.org/doc/man/erlang.html#system_info_dist_buf_busy_limit

A+
Dave
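
For reference, here is a minimal sketch of the arithmetic behind the tick timing discussed above. It assumes the formula given in the Erlang kernel documentation: with `net_ticktime` set to T, ticks go out every T/4 seconds and an unresponsive node is dropped somewhere between T - T/4 and T + T/4 seconds. The function name `detection_window` is made up for illustration and is not part of Erlang or CouchDB; Python is used only for convenience.

```python
def detection_window(net_ticktime: float) -> tuple[float, float]:
    """Return (min, max) seconds before an unresponsive node is declared down.

    Assumes the documented bounds: net_ticktime -/+ net_ticktime / 4.
    """
    if net_ticktime <= 0:
        raise ValueError("net_ticktime must be positive")
    quarter = net_ticktime / 4  # ticks are sent every net_ticktime / 4 seconds
    return (net_ticktime - quarter, net_ticktime + quarter)


# Compare the default (60) with the proposed value (10).
for t in (60, 10):
    shortest, longest = detection_window(t)
    print(f"net_ticktime={t}: node dropped after {shortest:g} to {longest:g} seconds")
```

Under that assumption the default of 60 gives a window of 45 to 75 seconds, while 10 would give 7.5 to 12.5 seconds, which leaves much less slack for a node that is merely slow or a link that is briefly busy.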
