When new topics are created, it takes some time for the controller to communicate the partition assignment for the new topic to all brokers. Until that happens, attempts to send or receive data on those brokers fail with errors like the following:

[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [jmx,3] failed due to Partition [jmx,3] doesn't exist on 2 (kafka.server.KafkaApis)

Eventually the broker receives the updated metadata for the new topic from the controller, and these errors go away. The same thing can also happen on a newly restarted broker if it was not shut down using controlled shutdown.

Thanks,
Neha
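For anyone hitting the same transient LeaderNotAvailableException on a freshly created topic, here is a minimal sketch of riding out that propagation window with the Kafka 0.8 Java producer; the broker list, topic name, and retry values are placeholders, not settings taken from this thread:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class RetryingProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Retry and back off while the controller propagates metadata
            // for the newly auto-created topic to all brokers.
            props.put("message.send.max.retries", "5");
            props.put("retry.backoff.ms", "500");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // Sending to a topic that does not exist yet triggers auto-creation;
            // the first attempts may fail until leaders are elected, then succeed on retry.
            producer.send(new KeyedMessage<String, String>("example-topic", "hello"));
            producer.close();
        }
    }

If the retries are exhausted the producer still throws kafka.common.FailedToSendMessageException, so the retry count and backoff need to cover however long the controller takes to propagate the new topic's metadata.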
On Wed, Sep 18, 2013 at 10:30 AM, Rajasekar Elango <rela...@salesforce.com> wrote:

From the output of the StateChangeLogMerger tool, I see only this error repeated:

[2013-09-18 14:16:48,358] ERROR [KafkaApi-1] Error while fetching metadata for partition [FunnelProto,0] (kafka.server.KafkaApis)

In the state-change.log itself, I see this error:

[2013-09-18 14:22:48,954] ERROR Conditional update of path /brokers/topics/test-1379439240191/partitions/2/state with data { "controller_epoch":10, "isr":[ 1, 5, 4 ], "leader":1, "leader_epoch":4, "version":1 } and expected version 8 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/test-1379439240191/partitions/2/state (kafka.utils.ZkUtils$)

Do you know the reason for the above error? Also, the problem seems to be intermittent; it started working now without any changes. I will continue to monitor.

Thanks,
Raja.

On Tue, Sep 17, 2013 at 7:59 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

Raja,

Could you run the StateChangeLogMerger tool and give it one topic-partition that has the above-mentioned problem? This tool is documented here:
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool

Let me know if you run into any issues while using it.

Thanks,
Neha

On Tue, Sep 17, 2013 at 12:27 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

Neha/Jun,

The same problem started happening again, although our zookeeper cluster is now configured correctly. Producing always fails with LeaderNotAvailableException, and list topics shows the topic is created with leader "none". In the controller and state-change log, I am seeing a lot of these failures:
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,6] failed due to Partition [FunnelProto,6] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [internal_metrics,3] failed due to Partition [internal_metrics,3] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,0] failed due to Partition [FunnelProto,0] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [jmx,3] failed due to Partition [jmx,3] doesn't exist on 2 (kafka.server.KafkaApis)
[2013-09-17 19:21:36,531] WARN [KafkaApi-2] Produce request with correlation id 622369865 from client on partition [FunnelProto,5] failed due to Partition [FunnelProto,5] doesn't exist on 2 (kafka.server.KafkaApis)

When I ran the listTopics command for one of the above topics, all partitions were under-replicated (we have the replication factor set to 3). Any clues on what the issue could be and how we can get it back to working?

Thanks,
Raja.

On Fri, Sep 13, 2013 at 6:26 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

Ah ok. Thanks for sharing that.

On Fri, Sep 13, 2013 at 2:50 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

We have 3 zookeeper nodes in the cluster behind a hardware load balancer. On one of the zookeepers, we did not configure the ensemble correctly (the server.n properties in zoo.cfg), so it ended up as 2 nodes in one cluster and one node in another cluster, with the load balancer randomly hitting one of the 2 zookeepers in the two different clusters.

Thanks,
Raja.
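For reference, a correctly formed 3-node ensemble lists the same server.n entries in every node's zoo.cfg (plus a matching myid file on each node); the hostnames, ports, and paths below are placeholders, not the actual hosts from this cluster:

    # zoo.cfg - must be identical on all three nodes,
    # with each node's myid file matching its server.n entry
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

If one node is missing or disagrees on these entries, it can form its own single-node "cluster", which is exactly the 2-plus-1 split described above.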
On Fri, Sep 13, 2013 at 1:04 PM, Neha Narkhede <neha.narkh...@gmail.com> wrote:

Just curious to know, what was the misconfiguration?

On Fri, Sep 13, 2013 at 10:02 AM, Rajasekar Elango <rela...@salesforce.com> wrote:

Thanks Neha and Jun. It turned out to be a misconfiguration in our zookeeper cluster. After correcting it, everything looks good.

Thanks,
Raja.

On Fri, Sep 13, 2013 at 10:13 AM, Jun Rao <jun...@gmail.com> wrote:

Any error in the controller and the state-change log? Are brokers 2, 3, 4 alive?

Thanks,
Jun

On Thu, Sep 12, 2013 at 4:56 PM, Rajasekar Elango <rela...@salesforce.com> wrote:

We are seeing a problem where, when we try to send messages to a new topic, it fails with kafka.common.LeaderNotAvailableException. Usually this problem is transient, and re-sending messages to the same topic works. But now we have tried re-sending messages to the same topic several times, and it still fails with the same error.

In the server log I see "Auto creation of topic test-sjl2 with 8 partitions and replication factor 3 is successful!", but the listTopics command shows leader "none", like below:

topic: test-sjl2    partition: 0    leader: none    replicas: 2,4,3    isr:
topic: test-sjl2    partition: 1    leader: none    replicas: 3,2,4    isr:
topic: test-sjl2    partition: 2    leader: none    replicas: 4,3,2    isr:
topic: test-sjl2    partition: 3    leader: none    replicas: 2,3,4    isr:
topic: test-sjl2    partition: 4    leader: none    replicas: 3,4,2    isr:
topic: test-sjl2    partition: 5    leader: none    replicas: 4,2,3    isr:
topic: test-sjl2    partition: 6    leader: none    replicas: 2,4,3    isr:
topic: test-sjl2    partition: 7    leader: none    replicas: 3,2,4    isr:

I also see the following NotLeaderForPartitionException and ZooKeeperException in the logs:

kafka.common.NotLeaderForPartitionException
        at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at java.lang.Class.newInstance0(Class.java:355)
        at java.lang.Class.newInstance(Class.java:308)
        at kafka.common.ErrorMapping$.exceptionFor(ErrorMapping.scala:70)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:158)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4$$anonfun$apply$5.apply(AbstractFetcherThread.scala:158)
        at kafka.utils.Logging$class.warn(Logging.scala:88)
        at kafka.utils.ShutdownableThread.warn(ShutdownableThread.scala:23)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:157)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$4.apply(AbstractFetcherThread.scala:113)
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:178)
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:347)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:113)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:89)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
2013-09-12 23:54:10,838 [kafka-request-handler-2] ERROR (kafka.utils.ZkUtils$) - Conditional update of path /brokers/topics/FunnelProto/partitions/4/state with data { "controller_epoch":3, "isr":[ 2, 5 ], "leader":2, "leader_epoch":2, "version":1 } and expected version 14 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/FunnelProto/partitions/4/state
2013-09-12 23:54:10,838 [kafka-request-handler-2] ERROR (kafka.utils.ZkUtils$) - Conditional update of path /brokers/topics/FunnelProto/partitions/4/state with data { "controller_epoch":3, "isr":[ 2, 5 ], "leader":2, "leader_epoch":2, "version":1 } and expected version 14 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/FunnelProto/partitions/4/state

Any clues on what the problem could be?

Thanks for your help.

--
Thanks,
Raja.