Gal Barak created KAFKA-6385:
--------------------------------

             Summary: Rack awareness ignored by kafka-reassign-partitions
                 Key: KAFKA-6385
                 URL: https://issues.apache.org/jira/browse/KAFKA-6385
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 1.0.0
         Environment: Ubuntu 16.04
            Reporter: Gal Barak
            Priority: Minor
         Attachments: actual.txt, topic-to-move.json
Hi,
It seems that the kafka-reassign-partitions script ignores rack awareness when suggesting a new partition layout. I came across it while doing some initial testing with Kafka.

+To reproduce:+
# Create a Kafka cluster with 3 brokers (1, 2, 3). Use 3 different racks (via the {{broker.rack}} setting; for example "A", "B" and "C").
#* I used a non-root directory in ZooKeeper (i.e. {{<zookeeper 1>:2181,<zookeeper 2>:2181,<zookeeper 3>:2182/<directory name for cluster>}}).
#* The tested topic was created automatically, according to a default configuration of 12 partitions and 3 replicas per topic.
# Install a 4th broker and assign it to the same rack as the 1st broker ("A").
# Create a topics-to-move.json file for a single topic. The file I used was uploaded as topic-to-move.json (see the illustrative snippets at the end of this description).
# Run the kafka-reassign-partitions script: {{kafka-reassign-partitions --zookeeper <zookeeper 1>:2181,<zookeeper 2>:2181,<zookeeper 3>:2182/<directory name for cluster> --topics-to-move-json-file <topics-to-move.json file> --broker-list "1,2,3,4" --generate}}

+Expected result:+
A suggested reassignment that ensures no partition uses both broker 1 and broker 4 as replicas.

+Actual result of the command:+
The full result is attached as a file (actual.txt). It includes partitions with replicas on both brokers 1 and 4, which are two servers on the same rack. Example:
{"topic":"<REDACTED>","partition":6,"replicas":[1,2,4]}

+Additional notes:+
* I did not test starting the cluster from scratch. The same behavior might be present when topic partitions are created automatically (in which case, the priority might be higher).
* I'm not sure it's related, but the original assignment seems problematic as well: if a single server (of the 3) failed, a different single server became the leader for all of its partitions. For example, if broker 1 failed, broker 2 became the leader for all of the partitions for which broker 1 was previously the leader, instead of the load being distributed evenly between brokers 2 and 3.
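+Illustrative snippets:+
The snippets below are illustrative only; broker IDs, rack names and the topic name are placeholders, not the exact files from my test cluster. The rack assignment was done per broker in {{server.properties}}, roughly like this:
{code}
# broker 1 server.properties (rack "A")
broker.id=1
broker.rack=A

# broker 4 server.properties (also rack "A", i.e. the same rack as broker 1)
broker.id=4
broker.rack=A

# brokers 2 and 3 use broker.rack=B and broker.rack=C respectively
{code}
The attached topic-to-move.json follows the usual format accepted by {{--topics-to-move-json-file}}, with the real topic name redacted:
{code}
{
  "version": 1,
  "topics": [
    { "topic": "<topic name>" }
  ]
}
{code}
With this input, any entry in the --generate output whose "replicas" list contains both 1 and 4 indicates that the rack constraint was not honored.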