Hi, I'm trying to replicate a broker shutdown in unit tests. I've got a simple cluster running with 2 brokers (and one ZK). I'm successfully able to create a topic with a single partition and replication factor of 2.
I'd like to test shutting down the current leader for the partition and make sure my code handles the exceptions thrown, such as NotLeaderForPartitionException. I can't seem to shut down a broker and have the remaining one report that it is now the leader for the partition. It looks as though the controller successfully changes leadership, but the broker itself is unaware of the change.

Here's a gist of the (convoluted) logs[1]. The sequence is as follows:

1- start 1 ZK and 2 brokers
2- create a topic (test-bogus) with 1 partition and a replication factor of 2
3- wait for leadership
4- ask the controller who is the leader
5- ask all brokers who is the leader
6- shut down the leader
7- wait for leadership
8- ask the controller who is the leader
9- ask the remaining broker who is the leader

Steps 4-6 appear here in the logs[2].
Steps 8-9 appear here[3].

As you can see, the controller is aware of the leadership change, but the broker is not. I've activated controlled shutdown and this is still happening. Any idea what may be causing this?

I'm using Kafka 0.8.1.1 and ZK 3.4.5-cdh4.6.

To ask the brokers, I'm using a TopicMetadataRequest; to fetch leadership from the controller, I'm inspecting ControllerContext.partitionLeadershipInfo.

Thanks
Philippe

[1] https://gist.github.com/plaflamme/60805bfe15ae0106304a
[2] https://gist.github.com/plaflamme/60805bfe15ae0106304a#file-gistfile1-txt-L153-L158
[3] https://gist.github.com/plaflamme/60805bfe15ae0106304a#file-gistfile1-txt-L227-L228
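
For reference, steps 7-9 can be written as a polling check rather than a single query, on the assumption (not confirmed by the logs above) that the remaining broker's view may simply lag the controller's. Below is a minimal Java sketch of that check. LeaderWait, awaitNewLeader and the leaderLookup supplier are hypothetical names, not part of Kafka; the supplier stands in for whatever code issues the TopicMetadataRequest and returns the reported leader id, or a negative value when no leader is reported.

```java
import java.util.function.IntSupplier;

public class LeaderWait {

    /**
     * Polls leaderLookup until it reports a leader other than oldLeaderId,
     * or throws once timeoutMs has elapsed.
     *
     * leaderLookup is a hypothetical stand-in for the test's own code that
     * asks a broker for the partition leader; a negative return value is
     * assumed to mean "no leader reported".
     */
    public static int awaitNewLeader(IntSupplier leaderLookup, int oldLeaderId,
                                     long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            int leader = leaderLookup.getAsInt();
            if (leader >= 0 && leader != oldLeaderId) {
                return leader; // the queried node now reports a new leader
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException(
                    "still reporting leader " + leader + " after " + timeoutMs + " ms");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("interrupted while waiting for leader", e);
            }
        }
    }
}
```

If the lookup still reports the old leader when the timeout expires, the stale view is persistent rather than a timing artifact of the test.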