I've confirmed that the same thing happens even if the node that's killed hard is not the controller. Also, across several trials, it took between 10 and 30 seconds to recover.
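For what it's worth, my working assumption is that the window before the surviving node takes over is bounded by the dead broker's ZooKeeper session expiring, plus the time to elect a new controller and move leadership. These are the broker settings I believe are in play; the values shown are what I understand the 0.8.2 defaults to be, not something I've confirmed against our deployment:

    # server.properties (assumed 0.8.2 defaults, shown for illustration)
    zookeeper.session.timeout.ms=6000      # how long before ZK expires a dead broker's session (and its ephemeral nodes)
    zookeeper.connection.timeout.ms=6000   # how long a broker waits when (re)connecting to ZK
    replica.lag.time.max.ms=10000          # how long a follower can go without fetching before it's dropped from ISR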
Jason

On Wed, Apr 8, 2015 at 1:31 PM, Jason Rosenberg <j...@squareup.com> wrote:
> Hello,
>
> I'm still trying to get to the bottom of an issue we had previously, with an unclean shutdown during an upgrade to 0.8.2.1 (from 0.8.1.1). In that case, the controlled shutdown was interrupted, and the node was shut down abruptly. This resulted in about 5 minutes of unavailability for most partitions. (I think that issue is related to the one reported by Thunder in the thread titled: "Problem with node after restart no partitions?").
>
> Anyway, while investigating that, I've gotten side-tracked, trying to understand what the expected behavior should be if the controller node dies abruptly.
>
> To test this, I have a small test cluster (2 nodes, 100 partitions, each with replication factor 2, using 0.8.2.1). There are also a few test producer clients, some of them high volume....
>
> I intentionally killed the controller node hard. I noticed that for about 10 seconds, the second node spammed the logs trying to fetch data for the partitions it was following on the node that was killed. Finally, after about 10 seconds, the second node elected itself the new controller, and things slowly recovered.
>
> Clients could not successfully produce to the affected partitions until the new controller was elected (and got failed metadata requests while trying to discover the new partition leader).
>
> I would have expected the cluster to recover more quickly when a node fails, if there are available replicas that can become leader and start receiving data. With just 100 partitions, I would have expected this recovery to happen very quickly. (Whereas in our previous issue, where it seemed to take 5 minutes, the longer duration was probably related to a much larger number of partitions.)
>
> Anyway, before I start filing Jiras and attaching log snippets, I'd like to understand what the expected behavior should be.
>
> If a controller (or really any node in the cluster) undergoes unclean shutdown, how should the cluster respond in keeping replicas available (assuming all replicas were in ISR before the shutdown)? How fast should controller and partition leader election happen in this case?
>
> Thanks,
>
> Jason
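For anyone trying to reproduce this, below is a rough sketch of the kind of test producer I've been running against the 0.8.2.1 cluster, using the new Java producer. The broker hostnames, topic name, message count, and retry settings are illustrative placeholders, not our actual client config; the point is just that the send callback is where the failures show up while the controller and new partition leaders are being elected.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;

    public class ControllerKillTestProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder hosts
            props.put("acks", "all");              // wait for the full ISR, so leader loss is visible
            props.put("retries", "10");            // retry sends that fail during the election window
            props.put("retry.backoff.ms", "500");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            Producer<String, String> producer = new KafkaProducer<String, String>(props);
            for (int i = 0; i < 100000; i++) {
                ProducerRecord<String, String> record =
                    new ProducerRecord<String, String>("test-topic", Integer.toString(i), "payload-" + i);
                // send() is async; the callback reports per-record failures, e.g. while
                // metadata requests can't find a leader for the affected partitions.
                producer.send(record, new Callback() {
                    public void onCompletion(RecordMetadata metadata, Exception e) {
                        if (e != null) {
                            System.err.println("send failed: " + e);
                        }
                    }
                });
            }
            producer.close();
        }
    }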