Hi team, I have a 12-node cluster with 800 topics, each of which has only 1 partition. I observed that one of the nodes keeps generating NotLeaderForPartitionException, which causes that node to become unresponsive to all requests. Below is the exception:

[2015-05-07 04:16:01,014] ERROR [ReplicaFetcherThread-1-12], Error for partition [topic1,0] to broker 12:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)

All other nodes in the cluster also generate lots of replication errors, as shown below, due to the unresponsiveness of the above node.

[2015-05-07 04:17:34,917] WARN [Replica Manager on Broker 1]: Fetch request with correlation id 3630911 from client ReplicaFetcherThread-0-1 on partition [topic1,0] failed due to Leader not local for partition [cg22_user.item_attr_info.lcr,0] on broker 1 (kafka.server.ReplicaManager)

Any suggestions as to why the node gets into this unstable state, and is there any configuration I can set to prevent it?

I use Kafka 0.8.2.1, and here is the server.properties:

broker.id=5
port=9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/mnt/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
log.retention.hours=1
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.cleaner.enable=false
zookeeper.connect=ip:2181
zookeeper.connection.timeout.ms=6000
unclean.leader.election.enable=false
delete.topic.enable=true
default.replication.factor=3
num.replica.fetchers=3
delete.topic.enable=true
kafka.metrics.reporters=report.KafkaMetricsCollector
straas.hubble.conf.file=/etc/kafka/report.conf

--
Regards,
Tao