Hi folks,
We have a 10-node cluster with several topics. Each topic has about 256
partitions with a replication factor of 3. We've run into an issue where, in
some of our topics, a few partitions (< 10) have a leader of -1, and each of
them has only one replica in the ISR.

From Kafka Manager, here are snapshots of the affected topic:

[Kafka Manager screenshots were attached inline here]

Here's the relevant entry from the state change log:
[2015-11-23 21:57:58,598] ERROR Controller 1 epoch 435499 initiated state
change for partition [userlogs,84] from OnlinePartition to OnlinePartition
failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [userlogs,84] due to: Preferred replica 0 for
partition [userlogs,84] is either not alive or not in the isr. Current
leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}].
Caused by: kafka.common.StateChangeFailedException: Preferred replica 0 for
partition [userlogs,84] is either not alive or not in the isr. Current
leader and ISR: [{"leader":-1,"leader_epoch":203,"isr":[1]}]
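
In case it helps with diagnosis, the same leader/ISR information can also be read
directly from the partition state znode in ZooKeeper. Here is a minimal sketch
(the ZooKeeper connect string and class name are placeholders, not our real setup):

import java.nio.charset.StandardCharsets;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooKeeper;

public class PartitionStateDump {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // "zk1:2181" is a placeholder; substitute the real ZooKeeper quorum.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> {
            if (event.getState() == KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        // Kafka stores the current leader, leader epoch and ISR for each partition in this znode.
        byte[] data = zk.getData("/brokers/topics/userlogs/partitions/84/state", false, null);
        // Prints the partition state JSON (leader, leader_epoch, isr, ...).
        System.out.println(new String(data, StandardCharsets.UTF_8));
        zk.close();
    }
}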

My questions are:
1) How could this happen, and how can I fix it or work around it?
2) Are 256 partitions per topic too many? We have about 200+ cores for our
Spark Streaming job.

Thanks,
Qi
