I do see ZkClient expired when this really happened.

-----Original Message-----
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
Sent: Saturday, January 11, 2014 10:31 PM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

Currently its set to default,whats the timeout should we increase to.
________________________________________
From: Guozhang Wang [wangg...@gmail.com]
Sent: Saturday, January 11, 2014 3:53 PM
To: users@kafka.apache.org
Subject: Re: Looks like consumer fetchers get stopped we are not getting any 
data

>From the logs it seems the consumer's ZK registry log has lost, while
KAFKA-693 is mainly due to server side issue. Could you check if there is a 
session timeout from the consumer on ZK log.

Guozhang


On Sat, Jan 11, 2014 at 2:33 PM, Seshadri, Balaji
<balaji.sesha...@dish.com>wrote:

> We found the below bug in Kafka JIRA.
>
> https://issues.apache.org/jira/browse/KAFKA-693
>
> Could you guys please let us know if this bug is causing the fetchers
> getting stooped because of rebalance failure,we are using 0.8-beta and
> this fix seems to be in 0.8 version.
>
> The issue happens only when we run it in tomcat,when we use JUnits
> with same group.id to test we receive the messages.It would be great
> if you could help us confirm if it is a bug.
>
> Also we would like to know if we need to upgrade the server/client to
> 0.8 or just server side code.
>
>
> ________________________________________
> From: Rob Withers [robert.w.with...@gmail.com]
> Sent: Friday, January 10, 2014 10:17 PM
> To: users@kafka.apache.org
> Subject: Re: Looks like consumer fetchers get stopped we are not
> getting any data
>
> That was an interesting section too.  Which GC settings would you suggest?
>
> Thank you,
> - charlie
>
> > On Jan 10, 2014, at 10:11 PM, Jun Rao <jun...@gmail.com> wrote:
> >
> > Have you looked at our FAQ, especially
> >
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretherem
> anyrebalancesinmyconsumerlog
> > ?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Fri, Jan 10, 2014 at 2:25 PM, Seshadri, Balaji
> > <balaji.sesha...@dish.com>wrote:
> >
> >> Any clue would be helpful.
> >>
> >> -----Original Message-----
> >> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> >> Sent: Friday, January 10, 2014 12:46 PM
> >> To: users@kafka.apache.org
> >> Subject: RE: Looks like consumer fetchers get stopped we are not
> >> getting any data
> >>
> >> Yes rebalance begins and exceptions occurs.
> >>
> >>
> >> {2014-01-10 00:58:11,293} INFO
> >>
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660
> b_watcher_executor]
> >> (?:?) - [account-i
> >> nfo-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> >> Cleared
> the
> >> data chunks in all the consumer message iterators
> >> {2014-01-10 00:58:11,293} INFO
> >>
> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660
> b_watcher_executor]
> >> (?:?) - [account-i
> >> nfo-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
> Committing
> >> all offsets after clearing the fetcher queues
> >> {2014-01-10 00:58:11,298} DEBUG [catalina-exec-12-SendThread(
> >> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:759) - Got
> >> ping response for sessionid: 0x1437b2879870005 af ter 0ms
> >> {2014-01-10 00:58:11,313} INFO
> >>
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_w
> atcher_executor]
> >> (?:?) -
> >> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b673
> >> 8],
> >> begin rebalancing consumer
> >> account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
> >> try
> #1
> >> {2014-01-10 00:58:11,314} DEBUG [catalina-exec-12-SendThread(
> >> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) -
> >> Reading reply sessionid:0x1437b2879870005, packet:: clientPath:null
> serverPath:null
> >> finished:false header:: 627,8  replyHeader:: 627,51539619966,0
>  request::
> >> '/brokers/ids,F  response:: v{'1}
> >> {2014-01-10 00:58:11,315} DEBUG [catalina-exec-12-SendThread(
> >> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) -
> >> Reading reply sessionid:0x1437b2879870005, packet:: clientPath:null
> serverPath:null
> >> finished:false header:: 628,4  replyHeader:: 628,51539619966,0
>  request::
> >> '/brokers/ids/1,F  response::
> >>
> #7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d7
> 85f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e2
> 23a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,
> 163056791896588316,74,0,47244644685}
> >> {2014-01-10 00:58:11,316} DEBUG [catalina-exec-12-SendThread(
> >> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) -
> >> Reading reply sessionid:0x1437b2879870005, packet:: clientPath:null
> serverPath:null
> >> finished:false header:: 629,4  replyHeader:: 629,51539619966,-101
> >> request::
> >>
> '/consumers/account-activated-hadoop-consumer/ids/account-activated-ha
> doop-consumer_tm1mwdpl04-1389222557906-562b6738,F
> >> response::
> >> {2014-01-10 00:58:11,316} INFO
> >>
> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_w
> atcher_executor]
> >> (?:?) -
> >> [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b673
> >> 8],
> >> exception during rebalance
> >> org.I0Itec.zkclient.exception.ZkNoNodeException:
> >> org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for
> >>
> /consumers/account-activated-hadoop-consumer/ids/account-activated-had
> oop-consumer_tm1mwdpl04-1389222557906-562b6738
> >>        at
> >> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> >>        at
> >> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> >>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> >>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> >>        at kafka.utils.ZkUtils$.readData(Unknown Source)
> >>        at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$c
> onsumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unk
> nown
> >> Source)
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfu
> n$syncedRebalance$1.apply$mcVI$sp(Unknown
> >> Source)
> >>        at
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedR
> ebalance(Unknown
> >> Source)
> >>        at
> >>
> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1
> .run(Unknown
> >> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> >> KeeperErrorCode = NoNode for
> >>
> /consumers/account-activated-hadoop-consumer/ids/account-activated-had
> oop-consumer_tm1mwdpl04-1389222557906-562b6738
> >>        at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >>        at
> >> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
> >>        at
> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
> >> :
> >> -----Original Message-----
> >> From: Guozhang Wang [mailto:wangg...@gmail.com]
> >> Sent: Friday, January 10, 2014 12:30 PM
> >> To: users@kafka.apache.org
> >> Subject: Re: Looks like consumer fetchers get stopped we are not
> >> getting any data
> >>
> >> Actually, the broken channel is broken by shutting down the
> >> leader-finder-thread, which is shutdown either by a rebalance retry
> >> or shutting down the consumer.
> >>
> >> Do you see "begin rebalance ..." before this log entry? And if yes,
> search
> >> to see if the rebalance keep failing.
> >>
> >> Guozhang
> >>
> >>
> >> On Fri, Jan 10, 2014 at 11:23 AM, Guozhang Wang
> >> <wangg...@gmail.com>
> >> wrote:
> >>
> >>> From your logs the channel with the brokers are broken, are the
> >>> brokers alive at that time?
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert
> >>> <robert.with...@dish.com
> >>>> wrote:
> >>>
> >>>> The core problem is our consumers stop consuming and lag increases.
> >>>> We found this blog:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerse
> emstohavestopped,why
> >> ?.
> >>>> This lists 3 possibilities.
> >>>>
> >>>> The blog also talks earlier about spurious rebalances, due to
> >>>> improper GC settings, but we couldn't find what GC settings to use.
> >>>> We are considering changing the zookeeper timeouts.  We are a
> >>>> little confused about the various issues, the sequence of issues
> >>>> and what could cause the consumers to stop reading.  If the
> >>>> fetchers get shutdown, due to a ClosedByInterruptException in the 
> >>>> "leader_finder"
> >>>> thread, which tells the "executor_watcher" thread to shutdown the
> >>>> fetchers, that would be another reason the consumers stop
> >>>> processing
> >> data.  Is this possible?
> >>>>
> >>>> Thank you,
> >>>> rob
> >>>>
> >>>> -----Original Message-----
> >>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> >>>> Sent: Friday, January 10, 2014 11:40 AM
> >>>> To: users@kafka.apache.org
> >>>> Subject: RE: Looks like consumer fetchers get stopped we are not
> >>>> getting any data
> >>>>
> >>>> It would be helpful if you guys can shed some light why all
> >>>> fetchers are getting stopped.
> >>>>
> >>>> -----Original Message-----
> >>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> >>>> Sent: Friday, January 10, 2014 11:28 AM
> >>>> To: users@kafka.apache.org
> >>>> Subject: RE: Looks like consumer fetchers get stopped we are not
> >>>> getting any data
> >>>>
> >>>> We also got the below error when this happens.
> >>>>
> >>>> {2014-01-10 00:58:11,292} INFO
> >>>>
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b_watcher_executor]
> >>>> (?:?) -
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b],
> >>>> exception during rebalance
> >>>> org.I0Itec.zkclient.exception.ZkNoNodeException:
> >>>> org.apache.zookeeper.KeeperException$NoNodeException:
> >>>> KeeperErrorCode = NoNode for
> >>
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updat
> ed-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> >>>>        at
> >>>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
> >>>>        at
> >>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
> >>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
> >>>>        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
> >>>>        at kafka.utils.ZkUtils$.readData(Unknown Source)
> >>>>        at kafka.consumer.TopicCount$.constructTopicCount(Unknown
> >> Source)
> >>>>        at
> >>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.ka
> >>>> fka$
> >>>> consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalan
> >>>> ce(U
> >>>> nknown
> >>>> Source)
> >>>>        at
> >>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$a
> >>>> nonf un$syncedRebalance$1.apply$mcVI$sp(Unknown
> >>>> Source)
> >>>>        at
> >>>> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
> >>>>        at
> >>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.sy
> >>>> nced
> >>>> Rebalance(Unknown
> >>>> Source)
> >>>>        at
> >>>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$a
> >>>> non$
> >>>> 1.run(Unknown
> >>>> Source) Caused by:
> org.apache.zookeeper.KeeperException$NoNodeException:
> >>>> KeeperErrorCode = NoNode for
> >>
> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updat
> ed-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
> >>>>        at
> >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
> >>>>        at
> >>>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> >>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> >>>>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
> >>>>        at
> >>>> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
> >>>>        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
> >>>>        at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
> >>>>        at
> >>>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> >>>>        ... 9 more
> >>>>
> >>>> -----Original Message-----
> >>>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
> >>>> Sent: Friday, January 10, 2014 10:52 AM
> >>>> To: users@kafka.apache.org
> >>>> Subject: Looks like consumer fetchers get stopped we are not
> >>>> getting any data
> >>>>
> >>>> Please let us know why we are not getting any data from Kafaka
> >>>> after this log from Kafka,can you guys lets us know.
> >>>>
> >>>> What could be causing all fetchers associated to be stooped why
> >>>> it is not doing retry.
> >>>>
> >>>> {2014-01-10 00:58:09,284} WARN
> >>>>
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b-leader-finder-thread]
> >>>> (?:?) - Fetching topic metadata with correlation id 3 for topics
> >>>> [Set(account-info-updated)] from broker
> >>>> [id:1,host:tm1-kafkabroker101,port:9092] failed
> >>>> java.nio.channels.ClosedByInterruptException
> >>>>        at
> >>
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterru
> ptibleChannel.java:202)
> >>>>        at
> >> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
> >>>>        at
> java.nio.channels.SocketChannel.write(SocketChannel.java:493)
> >>>>        at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
> >>>>        at kafka.network.Send$class.writeCompletely(Unknown Source)
> >>>>        at
> >>>> kafka.network.BoundedByteBufferSend.writeCompletely(Unknown
> >>>> Source)
> >>>>        at kafka.network.BlockingChannel.send(Unknown Source)
> >>>>        at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
> >>>>        at
> >>>> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(U
> >>>> nkno
> >>>> wn
> >>>> Source)
> >>>>        at kafka.producer.SyncProducer.send(Unknown Source)
> >>>>        at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
> >>>>        at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
> >>>>        at
> >>>> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(U
> >>>> nkno
> >>>> wn
> >>>> Source)
> >>>>        at kafka.utils.ShutdownableThread.run(Unknown Source)
> >>>> {2014-01-10 00:58:09,284} DEBUG
> >>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc
> >>>> _wat
> >>>> cher_executor]
> >>>> (?:?) - initial fetch offset of account-access:27: fetched offset
> >>>> =
> >> 9655:
> >>>> consumed offset = 9655 is 9655
> >>>> {2014-01-10 00:58:09,284} DEBUG
> >>>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9
> >>>> _wat
> >>>> cher_executor]
> >>>> (?:?) - initial consumer offset of bill-generated:11: fetched
> >>>> offset =
> >> 152:
> >>>> consumed offset = 152 is 152
> >>>> {2014-01-10 00:58:09,284} DEBUG
> >>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693
> >>>> -8bc
> >>>> 34b77_watcher_executor]
> >>>> (?:?) -
> >>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693
> >>>> -8bc
> >>>> 34b77],
> >>>> outbound-communications:108: fetched offset = 1689: consumed
> >>>> offset =
> >>>> 1689 selected new offset 1689
> >>>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-3-SendThread(
> >>>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) -
> >>>> Reading reply sessionid:0x1434b49cf56383b, packet::
> >>>> clientPath:null serverPath:null finished:false header:: 279,4  
> >>>> replyHeader::
> >> 279,51539617506,0  request::
> >>>> '/consumers/outbound-call-attempted-hadoop-consumer/offsets/outbo
> >>>> und-
> >>>> call-attempted/14,F
> >>>> response::
> >>>> #30,s{39860186414,39860186414,1387517714994,1387517714994,0,0,0,0
> >>>> ,1,0
> >>>> ,39860186414}
> >>>> {2014-01-10 00:58:09,285} INFO
> >>>>
> >>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693
> >>>> -8bc
> >>>> 34b77_watcher_executor]
> >>>> (?:?) -
> >>>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693
> >>>> -8bc
> >>>> 34b77],
> >>>> outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-
> >>>> 8bc3
> >>>> 4b77-3
> >>>> attempting to claim partition 109
> >>>> {2014-01-10 00:58:09,284} DEBUG
> >>>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9
> >>>> _wat
> >>>> cher_executor]
> >>>> (?:?) - initial fetch offset of bill-generated:11: fetched offset
> >>>> =
> 152:
> >>>> consumed offset = 152 is 152
> >>>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-12-SendThread(
> >>>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) -
> >>>> Reading reply sessionid:0x1437b2879870005, packet::
> >>>> clientPath:null serverPath:null finished:false header:: 619,1  
> >>>> replyHeader::
> >> 619,51539617508,0  request::
> >>>> '/consumers/account-activated-hadoop-consumer/owners/account-acti
> >>>> vate
> >>>> d/68,#6163636f756e742d6163746976617465642d6861646f6f702d636f6e737
> >>>> 56d6
> >>>> 5725f746d316d7764706c30342d313338393232323535373930362d3536326236
> >>>> 3733
> >>>> 382d30,v{s{31,s{'world,'anyone}}},1
> >>>> response::
> >>>> '/consumers/account-activated-hadoop-consumer/owners/account-acti
> >>>> vate
> >>>> d/68
> >>>> {2014-01-10 00:58:09,284} DEBUG
> >>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc
> >>>> _wat
> >>>> cher_executor]
> >>>> (?:?) -
> >>>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc
> >>>> ],
> >>>> account-access:27: fetched offset = 9655: consumed offset = 9655
> >>>> selected new offset 9655
> >>>> {2014-01-10 00:58:09,284} INFO
> >>>>
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b_watcher_executor]
> >>>> (?:?) -
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b-leader-finder-thread],
> >>>> Shutdown completed
> >>>> {2014-01-10 00:58:09,284} INFO
> >>>>
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b-leader-finder-thread]
> >>>> (?:?) -
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b-leader-finder-thread],
> >>>> Stopped
> >>>> {2014-01-10 00:58:09,285} INFO
> >>>>
> >>>> [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad
> >>>> 5966
> >>>> 0b_watcher_executor]
> >>>> (?:?) - [ConsumerFetcherManager-1389222553163] Stopping all
> >>>> fetchers
> >>>
> >>>
> >>> --
> >>> -- Guozhang
> >>
> >>
> >>
> >> --
> >> -- Guozhang
> >>
> >>
>



--
-- Guozhang

Reply via email to