Hi, any updates on this? I keep seeing this issue happen over and over again.
On Thu, May 7, 2015 at 7:28 PM, tao xiao <xiaotao...@gmail.com> wrote:

> Hi team,
>
> I have a 12-node cluster that has 800 topics, each of which has only 1
> partition. I observed that one of the nodes keeps generating
> NotLeaderForPartitionException, which causes the node to be unresponsive
> to all requests. Below is the exception:
>
> [2015-05-07 04:16:01,014] ERROR [ReplicaFetcherThread-1-12], Error for
> partition [topic1,0] to broker 12:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
>
> All the other nodes in the cluster generate lots of replication errors
> too, as shown below, due to the unresponsiveness of the node above.
>
> [2015-05-07 04:17:34,917] WARN [Replica Manager on Broker 1]: Fetch
> request with correlation id 3630911 from client ReplicaFetcherThread-0-1 on
> partition [topic1,0] failed due to Leader not local for partition
> [cg22_user.item_attr_info.lcr,0] on broker 1 (kafka.server.ReplicaManager)
>
> Any suggestions as to why the node runs into this unstable state, and is
> there any configuration I can set to prevent it?
>
> I use Kafka 0.8.2.1.
>
> And here is the server.properties:
>
> broker.id=5
> port=9092
> num.network.threads=3
> num.io.threads=8
> socket.send.buffer.bytes=1048576
> socket.receive.buffer.bytes=1048576
> socket.request.max.bytes=104857600
> log.dirs=/mnt/kafka
> num.partitions=1
> num.recovery.threads.per.data.dir=1
> log.retention.hours=1
> log.segment.bytes=1073741824
> log.retention.check.interval.ms=300000
> log.cleaner.enable=false
> zookeeper.connect=ip:2181
> zookeeper.connection.timeout.ms=6000
> unclean.leader.election.enable=false
> delete.topic.enable=true
> default.replication.factor=3
> num.replica.fetchers=3
> kafka.metrics.reporters=report.KafkaMetricsCollector
> straas.hubble.conf.file=/etc/kafka/report.conf
>
> --
> Regards,
> Tao

--
Regards,
Tao
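
For reference, a NotLeaderForPartitionException from a replica fetcher generally means the fetcher's cached metadata still points at a broker that no longer leads the partition, which typically happens after leadership moved (for example following a ZooKeeper session expiry on the old leader). One way to see which broker currently holds leadership for the affected partitions is to issue a topic metadata request to any live broker. Below is a minimal sketch against the 0.8.2 kafka.javaapi SimpleConsumer; the broker host, port, and topic name are placeholders, not values from this thread:

    import java.util.Collections;
    import java.util.List;

    import kafka.cluster.Broker;
    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    public class LeaderCheck {
        public static void main(String[] args) {
            // Any live broker can answer a metadata request; host and port
            // here are placeholders.
            SimpleConsumer consumer =
                    new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "leader-check");
            try {
                List<String> topics = Collections.singletonList("topic1");
                TopicMetadataResponse resp =
                        consumer.send(new TopicMetadataRequest(topics));
                for (TopicMetadata topic : resp.topicsMetadata()) {
                    for (PartitionMetadata part : topic.partitionsMetadata()) {
                        // leader() can be null while an election is in flight
                        Broker leader = part.leader();
                        System.out.printf("%s-%d leader: %s%n",
                                topic.topic(), part.partitionId(),
                                leader == null ? "none"
                                               : leader.host() + ":" + leader.port());
                    }
                }
            } finally {
                consumer.close();
            }
        }
    }

If the leader reported by this metadata request disagrees with the broker the fetcher threads keep contacting, the broker's view of the cluster is stale, which usually points at repeated leader elections rather than a problem on the fetching side.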