Neha/Jun,

The same problem has started happening again, even though our zookeeper cluster is now configured correctly. Produce requests always fail with LeaderNotAvailableException, and listing the topics shows the topic was created with leader "none".
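(For reference, the leader/replicas/isr listings quoted in this thread come from the 0.8 topic tools; depending on the exact 0.8.x build the command is roughly one of the following, with the zookeeper address below being a placeholder:

    bin/kafka-list-topic.sh --zookeeper zk-host:2181 --topic test-sjl2           # older 0.8 builds
    bin/kafka-topics.sh --describe --zookeeper zk-host:2181 --topic test-sjl2    # 0.8.1 and later

A healthy partition shows a broker id as leader and a non-empty isr; "leader: none" with an empty isr means no replica can currently serve the partition, which is exactly what surfaces to the producer as LeaderNotAvailableException.)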
In the controller and state-change logs I am seeing a lot of failures like these:

[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,6] failed due to Partition [FunnelProto,6] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [internal_metrics,3] failed due to Partition [internal_metrics,3] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,0] failed due to Partition [FunnelProto,0] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [jmx,3] failed due to Partition [jmx,3] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,5] failed due to Partition [FunnelProto,5] doesn't exist on 2 (kafka.server.KafkaApis)

When I ran the list-topic command for one of the above topics, all partitions were under-replicated (we have the replication factor set to 3). Any clues on what the issue could be and how we can get the cluster back to a working state?

Thanks,
Raja.

On Fri, Sep 13, 2013 at 6:26 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

> Ah ok. Thanks for sharing that.
>
> On Fri, Sep 13, 2013 at 2:50 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
>
> > We have 3 zookeeper nodes in the cluster behind a hardware load balancer.
> > On one of the zookeepers we did not configure the ensemble correctly (the
> > server.n properties in zoo.cfg), so it ended up as two nodes in one cluster
> > and one node in another cluster, with the load balancer randomly hitting
> > one of the two zookeepers in the two different clusters.
> >
> > Thanks,
> > Raja.
> >
> > On Fri, Sep 13, 2013 at 1:04 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:
> >
> > > Just curious to know, what was the misconfiguration?
> > >
> > > On Fri, Sep 13, 2013 at 10:02 AM, Rajasekar Elango <rela...@salesforce.com> wrote:
> > >
> > > > Thanks Neha and Jun. It turned out to be a misconfiguration in our
> > > > zookeeper cluster. After correcting it, everything looks good.
> > > >
> > > > Thanks,
> > > > Raja.
> > > >
> > > > On Fri, Sep 13, 2013 at 10:13 AM, Jun Rao <jun...@gmail.com> wrote:
> > > >
> > > > > Any error in the controller and the state-change log? Are brokers
> > > > > 2, 3, 4 alive?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, Sep 12, 2013 at 4:56 PM, Rajasekar Elango <rela...@salesforce.com> wrote:
> > > > >
> > > > > > We are seeing a problem where sending messages to a new topic fails
> > > > > > with kafka.common.LeaderNotAvailableException. Usually the problem is
> > > > > > transient, and re-sending messages to the same topic works. This time,
> > > > > > however, we have re-sent messages to the same topic several times and
> > > > > > it still fails with the same error.
> > > > > >
> > > > > > In the server log I see "Auto creation of topic test-sjl2 with 8
> > > > > > partitions and replication factor 3 is successful!".
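(For context, that auto-creation line is driven by broker-side settings. Assuming a 0.8-style server.properties, a configuration matching the logged values would look like the sketch below; the values simply mirror the log message and are illustrative:

    auto.create.topics.enable=true
    num.partitions=8
    default.replication.factor=3

With auto-creation enabled, the first metadata or produce request for an unknown topic creates it, but leaders for the new partitions still have to be elected before produce requests succeed, which is why a brief LeaderNotAvailableException on a brand-new topic is normal; it is its persistence here that signals a real problem.)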
> > > > > > But the listTopics command shows leader "none", like below:
> > > > > >
> > > > > > topic: test-sjl2    partition: 0    leader: none    replicas: 2,4,3    isr:
> > > > > > topic: test-sjl2    partition: 1    leader: none    replicas: 3,2,4    isr:
> > > > > > topic: test-sjl2    partition: 2    leader: none    replicas: 4,3,2    isr:
> > > > > > topic: test-sjl2    partition: 3    leader: none    replicas: 2,3,4    isr:
> > > > > > topic: test-sjl2    partition: 4    leader: none    replicas: 3,4,2    isr:
> > > > > > topic: test-sjl2    partition: 5    leader: none    replicas: 4,2,3    isr:
> > > > > > topic: test-sjl2    partition: 6    leader: none    replicas: 2,4,3    isr:
> > > > > > topic: test-sjl2    partition: 7    leader: none    replicas: 3,2,4    isr:
> > > > > >
> > > > > > I also see the following NotLeaderForPartitionException and ZooKeeperException
> > > > > > in the logs:
> > > > > >
> > > > > > kafka.common.NotLeaderForPartitionException
> > > > > >         at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
> > > > > >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> > > > > >         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> > > > > >         at java.lang.Class.newInstance0(Class.java:355)
> > > > > >         at java.lang.Class.newInstance(Class.java:308)
> > > > > >         at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:70)
> > > > > >         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:158)
> > > > > >         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:158)
> > > > > >         at kafka.utils.Logging$class.warn(Logging.scala:88)
> > > > > >         at kafka.utils.ShutdownableThread.warn(ShutdownableThread.scala:23)
> > > > > >         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:157)
> > > > > >         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:113)
> > > > > >         at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:178)
> > > > > >         at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
> > > > > >         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:113)
> > > > > >         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:89)
> > > > > >         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
> > > > > >
> > > > > > 2013-09-12 23:54:10,838 [kafka-request-handler-2] ERROR (kafka.utils.ZkUtils$) - Conditional update of path /brokers/topics/FunnelProto/partitions/4/state with data { "controller_epoch":3, "isr":[ 2, 5 ], "leader":2, "leader_epoch":2, "version":1 } and expected version 14 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/FunnelProto/partitions/4/state
> > > > > > 2013-09-12 23:54:10,838 [kafka-request-handler-2] ERROR (kafka.utils.ZkUtils$) - Conditional update of path /brokers/topics/FunnelProto/partitions/4/state with data { "controller_epoch":3, "isr":[ 2, 5 ], "leader":2, "leader_epoch":2, "version":1 } and expected version 14 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/FunnelProto/partitions/4/state
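(For context, the BadVersion errors above come from ZooKeeper's conditional write: the partition-state znode is written with setData(path, data, expectedVersion), and ZooKeeper rejects the write if the znode has changed since that version was read. A minimal sketch of the mechanism against the plain ZooKeeper Java API, with the connect string and payload as placeholders:

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ConditionalUpdateSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connect string; not the actual cluster address.
            ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });
            String path = "/brokers/topics/FunnelProto/partitions/4/state";

            Stat stat = new Stat();
            byte[] current = zk.getData(path, false, stat);  // read current state and its version

            byte[] newState = current;  // the new leader/isr JSON would go here
            try {
                // Conditional write: succeeds only if the znode is still at stat.getVersion().
                zk.setData(path, newState, stat.getVersion());
            } catch (KeeperException.BadVersionException e) {
                // Someone else (typically the controller, or a broker with a newer view)
                // updated the znode after we read it, so our cached version is stale.
            }
            zk.close();
        }
    }

In Kafka's case, a persistent stream of BadVersion failures on the partition-state paths usually means a broker is working from a stale cached zkVersion while another party keeps rewriting the same znode, which fits a cluster whose leadership state has been flapping.)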
> > > > > > Any clues on what the problem could be?
> > > > > >
> > > > > > Thanks for your help.
> > > > > >
> > > > > > --
> > > > > > Thanks,
> > > > > > Raja.
> > > >
> > > > --
> > > > Thanks,
> > > > Raja.
> >
> > --
> > Thanks,
> > Raja.

--
Thanks,
Raja.
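(One more note on the root cause identified earlier in this thread: the split-brain came from inconsistent server.n entries in zoo.cfg. In a three-node ensemble, every node must carry the same complete server list. A sketch of what each node's zoo.cfg should contain, with hostnames, ports and the data directory as placeholders:

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    # These three lines must be identical on all three nodes; each node also
    # stores its own id (1, 2 or 3) in the file <dataDir>/myid.
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

If one node lists only itself, or a different set of servers, it forms its own separate quorum, and a load balancer in front of the client port will then hand brokers and clients inconsistent views of the cluster, which matches the symptoms described in the quoted messages.)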