Any clue would be helpful.

-----Original Message-----
From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
Sent: Friday, January 10, 2014 12:46 PM
To: users@kafka.apache.org
Subject: RE: Looks like consumer fetchers get stopped we are not getting any 
data

Yes, the rebalance begins and exceptions occur.


{2014-01-10 00:58:11,293} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor] (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], Cleared the data chunks in all the consumer message iterators
{2014-01-10 00:58:11,293} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor] (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b], Committing all offsets after clearing the fetcher queues
{2014-01-10 00:58:11,298} DEBUG [catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:759) - Got ping response for sessionid: 0x1437b2879870005 after 0ms
{2014-01-10 00:58:11,313} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
begin rebalancing consumer 
account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738 try #1
{2014-01-10 00:58:11,314} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 627,8  replyHeader:: 
627,51539619966,0  request:: '/brokers/ids,F  response:: v{'1}
{2014-01-10 00:58:11,315} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 628,4  replyHeader:: 
628,51539619966,0  request:: '/brokers/ids/1,F  response:: 
#7b2022686f7374223a22746d312d6b61666b6162726f6b6572313031222c20226a6d785f706f7274223a393939392c2022706f7274223a393039322c202276657273696f6e223a31207d,s{47244644685,47244644685,1388537628753,1388537628753,0,0,0,163056791896588316,74,0,47244644685}
{2014-01-10 00:58:11,316} DEBUG 
[catalina-exec-12-SendThread(tvip-m1-mw-zookeeper.dish.com:2181)] 
(ClientCnxn.java:839) - Reading reply sessionid:0x1437b2879870005, packet:: 
clientPath:null serverPath:null finished:false header:: 629,4  replyHeader:: 
629,51539619966,-101  request:: 
'/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738,F
  response::
{2014-01-10 00:58:11,316} INFO  
[account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738_watcher_executor]
 (?:?) - [account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738], 
exception during rebalance
org.I0Itec.zkclient.exception.ZkNoNodeException: 
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
        at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
        at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
        at kafka.utils.ZkUtils$.readData(Unknown Source)
        at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
        at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(Unknown
 Source)
        at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonfun$syncedRebalance$1.apply$mcVI$sp(Unknown
 Source)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
        at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(Unknown
 Source)
        at 
kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$1.run(Unknown
 Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for 
/consumers/account-activated-hadoop-consumer/ids/account-activated-hadoop-consumer_tm1mwdpl04-1389222557906-562b6738
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
        at org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
:
-----Original Message-----
From: Guozhang Wang [mailto:wangg...@gmail.com]
Sent: Friday, January 10, 2014 12:30 PM
To: users@kafka.apache.org
Subject: Re: Looks like consumer fetchers get stopped we are not getting any 
data

Actually, the channel is broken by shutting down the leader-finder-thread, 
which is shut down either by a rebalance retry or by shutting down the 
consumer.
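As a sketch, the knobs that govern this in the 0.8 high-level consumer are the 
rebalance retry and ZooKeeper session properties below. The values shown are 
the defaults, not tuned recommendations; check the consumer config docs for 
your Kafka version:

```properties
# How many times a failed rebalance is retried before the consumer gives up
# (after which the fetchers stay stopped until the consumer is restarted).
rebalance.max.retries=4

# Backoff between rebalance retries; often set >= zookeeper.session.timeout.ms
# so an expired session can re-register before the next attempt.
rebalance.backoff.ms=2000

# Raising the session timeout makes long GC pauses less likely to expire the
# session and trigger spurious rebalances.
zookeeper.session.timeout.ms=6000
```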

Do you see "begin rebalancing ..." before this log entry? If yes, search to 
see whether the rebalance keeps failing.
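To check that from a shell, something like the following works. The sample log 
lines here are stand-ins for a real consumer log; point LOG at your own log 
file and drop the printf:

```shell
# Count rebalance attempts vs. failures in a consumer log. If failures keep
# pace with attempts up to rebalance.max.retries, the rebalance never
# succeeded and the fetchers stay stopped.
LOG=/tmp/consumer-sample.log

# Stand-in lines mimicking the entries quoted in this thread.
printf '%s\n' \
  'INFO ... begin rebalancing consumer account-activated-hadoop-consumer try #1' \
  'INFO ... exception during rebalance' \
  'INFO ... begin rebalancing consumer account-activated-hadoop-consumer try #2' \
  'INFO ... exception during rebalance' > "$LOG"

attempts=$(grep -c 'begin rebalancing' "$LOG")
failures=$(grep -c 'exception during rebalance' "$LOG")
echo "attempts=$attempts failures=$failures"
```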

Guozhang


On Fri, Jan 10, 2014 at 11:23 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> From your logs the channel to the brokers is broken; were the
> brokers alive at that time?
>
> Guozhang
>
>
> On Fri, Jan 10, 2014 at 10:52 AM, Withers, Robert
> <robert.with...@dish.com
> > wrote:
>
>> The core problem is that our consumers stop consuming and lag increases.
>> We found this FAQ entry:
>> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Myconsumerseemstohavestopped,why?.
>>  It lists 3 possibilities.
>>
>> The FAQ also talks earlier about spurious rebalances, due to
>> improper GC settings, but we couldn't find which GC settings to use.
>> We are considering changing the ZooKeeper timeouts.  We are a little
>> confused about the various issues, their sequence, and what
>> could cause the consumers to stop reading.  If the fetchers get
>> shut down due to a ClosedByInterruptException in the "leader_finder"
>> thread, which tells the "executor_watcher" thread to shut down the
>> fetchers, that would be another reason the consumers stop processing data.
>> Is this possible?
>>
>> Thank you,
>> rob
>>
>> -----Original Message-----
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 11:40 AM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not
>> getting any data
>>
>> It would be helpful if you guys can shed some light why all fetchers
>> are getting stopped.
>>
>> -----Original Message-----
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 11:28 AM
>> To: users@kafka.apache.org
>> Subject: RE: Looks like consumer fetchers get stopped we are not
>> getting any data
>>
>> We also got the below error when this happens.
>>
>> {2014-01-10 00:58:11,292} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b],
>> exception during rebalance
>> org.I0Itec.zkclient.exception.ZkNoNodeException:
>> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode
>> = NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>>         at
>> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>>         at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:685)
>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:766)
>>         at org.I0Itec.zkclient.ZkClient.readData(ZkClient.java:761)
>>         at kafka.utils.ZkUtils$.readData(Unknown Source)
>>         at kafka.consumer.TopicCount$.constructTopicCount(Unknown Source)
>>         at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.kafka$
>> consumer$ZookeeperConsumerConnector$ZKRebalancerListener$$rebalance(U
>> nknown
>> Source)
>>         at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anonf
>> un$syncedRebalance$1.apply$mcVI$sp(Unknown
>> Source)
>>         at
>> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:142)
>>         at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.synced
>> Rebalance(Unknown
>> Source)
>>         at
>> kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener$$anon$
>> 1.run(Unknown
>> Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
>> KeeperErrorCode = NoNode for
>> /consumers/account-info-updated-hadoop-consumer/ids/account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b
>>         at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
>>         at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
>>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:956)
>>         at
>> org.I0Itec.zkclient.ZkConnection.readData(ZkConnection.java:103)
>>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:770)
>>         at org.I0Itec.zkclient.ZkClient$9.call(ZkClient.java:766)
>>         at
>> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
>>         ... 9 more
>>
>> -----Original Message-----
>> From: Seshadri, Balaji [mailto:balaji.sesha...@dish.com]
>> Sent: Friday, January 10, 2014 10:52 AM
>> To: users@kafka.apache.org
>> Subject: Looks like consumer fetchers get stopped we are not getting
>> any data
>>
>> Please let us know why we are not getting any data from Kafka after
>> the log output below.
>>
>> What could be causing all the associated fetchers to be stopped, and
>> why is it not retrying?
>>
>> {2014-01-10 00:58:09,284} WARN  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
>> (?:?) - Fetching topic metadata with correlation id 3 for topics
>> [Set(account-info-updated)] from broker
>> [id:1,host:tm1-kafkabroker101,port:9092] failed
>> java.nio.channels.ClosedByInterruptException
>>         at
>> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:506)
>>         at java.nio.channels.SocketChannel.write(SocketChannel.java:493)
>>         at kafka.network.BoundedByteBufferSend.writeTo(Unknown Source)
>>         at kafka.network.Send$class.writeCompletely(Unknown Source)
>>         at
>> kafka.network.BoundedByteBufferSend.writeCompletely(Unknown
>> Source)
>>         at kafka.network.BlockingChannel.send(Unknown Source)
>>         at kafka.producer.SyncProducer.liftedTree1$1(Unknown Source)
>>         at
>> kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(Unkno
>> wn
>> Source)
>>         at kafka.producer.SyncProducer.send(Unknown Source)
>>         at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>>         at kafka.client.ClientUtils$.fetchTopicMetadata(Unknown Source)
>>         at
>> kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(Unkno
>> wn
>> Source)
>>         at kafka.utils.ShutdownableThread.run(Unknown Source)
>> {2014-01-10 00:58:09,284} DEBUG
>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_wat
>> cher_executor]
>> (?:?) - initial fetch offset of account-access:27: fetched offset = 9655:
>> consumed offset = 9655 is 9655
>> {2014-01-10 00:58:09,284} DEBUG
>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_wat
>> cher_executor]
>> (?:?) - initial consumer offset of bill-generated:11: fetched offset = 152:
>> consumed offset = 152 is 152
>> {2014-01-10 00:58:09,284} DEBUG
>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>> 34b77_watcher_executor]
>> (?:?) -
>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>> 34b77],
>> outbound-communications:108: fetched offset = 1689: consumed offset =
>> 1689 selected new offset 1689
>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-3-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1434b49cf56383b, packet:: clientPath:null
>> serverPath:null finished:false header:: 279,4  replyHeader:: 
>> 279,51539617506,0  request::
>> '/consumers/outbound-call-attempted-hadoop-consumer/offsets/outbound-
>> call-attempted/14,F
>>  response::
>> #30,s{39860186414,39860186414,1387517714994,1387517714994,0,0,0,0,1,0
>> ,39860186414}
>> {2014-01-10 00:58:09,285} INFO
>>
>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>> 34b77_watcher_executor]
>> (?:?) -
>> [outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc
>> 34b77],
>> outbound-communications-hadoop-consumer_tm1mwdpl04-1389222550693-8bc3
>> 4b77-3
>> attempting to claim partition 109
>> {2014-01-10 00:58:09,284} DEBUG
>> [bill-generated-hadoop-consumer_tm1mwdpl04-1389222547995-29a6dce9_wat
>> cher_executor]
>> (?:?) - initial fetch offset of bill-generated:11: fetched offset = 152:
>> consumed offset = 152 is 152
>> {2014-01-10 00:58:09,284} DEBUG [catalina-exec-12-SendThread(
>> tvip-m1-mw-zookeeper.dish.com:2181)] (ClientCnxn.java:839) - Reading
>> reply sessionid:0x1437b2879870005, packet:: clientPath:null
>> serverPath:null finished:false header:: 619,1  replyHeader:: 
>> 619,51539617508,0  request::
>> '/consumers/account-activated-hadoop-consumer/owners/account-activate
>> d/68,#6163636f756e742d6163746976617465642d6861646f6f702d636f6e73756d6
>> 5725f746d316d7764706c30342d313338393232323535373930362d35363262363733
>> 382d30,v{s{31,s{'world,'anyone}}},1
>>  response::
>> '/consumers/account-activated-hadoop-consumer/owners/account-activate
>> d/68
>> {2014-01-10 00:58:09,284} DEBUG
>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc_wat
>> cher_executor]
>> (?:?) -
>> [account-access-hadoop-consumer_tm1mwdpl04-1389222551916-a0c87abc],
>> account-access:27: fetched offset = 9655: consumed offset = 9655
>> selected new offset 9655
>> {2014-01-10 00:58:09,284} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
>> Shutdown completed
>> {2014-01-10 00:58:09,284} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread]
>> (?:?) - [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b-leader-finder-thread],
>> Stopped
>> {2014-01-10 00:58:09,285} INFO  [account-info-updated-hadoop-consumer_tm1mwdpl04-1389222553159-ad59660b_watcher_executor]
>> (?:?) - [ConsumerFetcherManager-1389222553163] Stopping all fetchers
>>
>>
>
>
> --
> -- Guozhang
>



--
-- Guozhang
