Re: Replica fetcher continually disconnecting after broker replacement

2016-09-07 Thread Tommy Becker
Thanks for the response; we found the issue. We run in AWS, and inexplicably, the new instance we launched to replace the dead one had exceedingly low network bandwidth to exactly one of the remaining brokers, resulting in timeouts. After rolling the dice again, things are replicating normally.

Re: Replica fetcher continually disconnecting after broker replacement

2016-09-07 Thread Ryan Pridgeon
One possibility is that broker 0 has exhausted its available file descriptors. If this is the case, it will be able to maintain its existing connections, giving the appearance that it is operating normally while refusing new ones. I don't recall the exact exception message but something along
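
A rough way to check the file descriptor theory on a Linux host is to compare the broker process's open descriptor count against its soft limit. The sketch below is only an illustration: it assumes a Linux /proc filesystem and permission to read the target process, and the helper names (count_open_fds, soft_fd_limit, near_fd_exhaustion) and the 90% threshold are made up for this example, not part of Kafka.

```python
import os


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd (Linux only; needs read access to the process)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid="self"):
    """Read the soft 'Max open files' limit from /proc/<pid>/limits."""
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Fields: 'Max', 'open', 'files', soft limit, hard limit, units
                return int(line.split()[3])
    raise RuntimeError("'Max open files' not found in limits file")


def near_fd_exhaustion(open_fds, limit, threshold=0.9):
    """Return True when usage is at or above the (assumed) threshold fraction of the limit."""
    return open_fds >= threshold * limit


if __name__ == "__main__":
    # Replace "self" with the broker's PID to inspect the broker instead of this script.
    used = count_open_fds("self")
    limit = soft_fd_limit("self")
    print(f"open fds: {used}, soft limit: {limit}, near exhaustion: {near_fd_exhaustion(used, limit)}")
```

If the count sits at or near the limit, new inbound connections (such as a replacement broker's replica fetchers) would be refused while established ones keep working, which matches the symptom described above.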

Replica fetcher continually disconnecting after broker replacement

2016-09-06 Thread Tommy Becker
We had a hardware failure on broker 1 of a 3-broker cluster over the weekend. The broker was replaced, and when the replacement broker came up, it started to replicate partitions from the other 2 brokers as you'd expect. But while broker 1 (the replacement) was able to fetch properly from broker