Mayuresh,

Thanks for the details. I'll need to run some more tests before I can get back to you with specific numbers on the delay, and to check for timeouts.
For now (before KIP-291 is implemented), will the only parameters that tune leader election be the ZooKeeper session timeout and the number of network threads (increased to try to work through the queued requests faster)?

Thanks,
Mark

On Thu, 6 Dec 2018 at 23:43 Mayuresh Gharat <[email protected]> wrote:

> Hi Mark,
>
> The election of a new leader for a topic partition happens once the
> controller detects that the leader has crashed.
> This happens asynchronously via a ZooKeeper listener. Once a ZooKeeper
> listener is fired, the corresponding object indicating that the event
> happened is put into a controller queue.
> The controller has a single thread that pulls data out of this queue and
> handles each event one after another.
> I can't recall a config to tune this off the top of my head.
> How much delay are you seeing in leadership change? Are there any
> controller socket timeouts in the log?
> You might also want to take a look at KIP-291 (KAFKA-4453), which is meant
> to shorten the time taken to handle controller events.
>
> Thanks,
>
> Mayuresh
>
> On Thu, Dec 6, 2018 at 9:50 AM Harper Henn <[email protected]> wrote:
>
> > Hi Mark,
> >
> > If a broker fails and you want to elect a new leader as quickly as
> > possible, you could tweak zookeeper.session.timeout.ms in the Kafka broker
> > configuration. According to the documentation: "If the consumer fails to
> > heartbeat to ZooKeeper for this period of time it is considered dead and a
> > rebalance will occur."
> >
> > https://kafka.apache.org/0101/documentation.html
> >
> > I think making zookeeper.session.timeout.ms smaller will result in faster
> > detection of a dead node, but the downside is that a leader election might
> > get triggered by network blips or other cases where your broker is not
> > actually dead.
> >
> > Harper
> >
> > On Thu, Dec 6, 2018 at 9:11 AM Mark Anderson <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm currently testing how Kafka reacts in cases of broker failure due to
> > > process failure or network timeout.
> > >
> > > I'd like to have the election of a new leader for a topic partition happen
> > > as quickly as possible but it is unclear from the documentation or broker
> > > configuration what the key parameters are to tune to make this possible.
> > >
> > > Does anyone have any pointers? Or are there any guides online?
> > >
> > > Thanks,
> > > Mark
> > >
> >
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>
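
To make the single-threaded controller queue described in the thread a bit more concrete, here is a toy Python model. It is not Kafka code: the function name `drain`, the event names, and the per-event costs are all made up for illustration. It only shows that, with one thread handling events strictly in order, a broker-failure event finishes after everything queued ahead of it.

```python
from collections import deque


def drain(events):
    """Handle (name, cost_ms) events one at a time, in FIFO order.

    Returns a dict mapping each event name to the time in milliseconds
    at which its handling finished, counted from the start of draining.
    """
    queue = deque(events)
    clock_ms = 0
    finished_at = {}
    while queue:
        name, cost_ms = queue.popleft()  # single consumer: no parallelism
        clock_ms += cost_ms              # the next event waits for this one
        finished_at[name] = clock_ms
    return finished_at


# Hypothetical backlog: two unrelated events sit ahead of the failure event.
backlog = [
    ("topic_change", 40),
    ("isr_change", 25),
    ("broker_failure", 10),
]

finished = drain(backlog)
print(finished["broker_failure"])  # 75: 40 + 25 of waiting, then 10 of work
```

The model leaves out the time it takes to detect the failure in the first place (the ZooKeeper session timeout), which comes before the event is ever enqueued.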
