Hi, this happens most times I restart the consumer group, but not every time. 
There are no errors in the logs, and nothing indicates that a rebalance is 
occurring. Here are the ZK logs from one of the processes that aren't 
receiving partitions.

2015-05-04 13:55:32,365 [main] INFO  org.apache.zookeeper.ZooKeeper:438 - Initiating client connection, connectString=lxpkfkdal01.nanigans.com sessionTimeout=400 watcher=org.I0Itec.zkclient.ZkClient@6971e8ba
2015-05-04 13:55:32,366 [main-SendThread(10.8.44.121:2181)] INFO  org.apache.zookeeper.ClientCnxn:966 - Opening socket connection to server 10.8.44.121/10.8.44.121:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-04 13:55:32,367 [main-SendThread(10.8.44.121:2181)] INFO  org.apache.zookeeper.ClientCnxn:849 - Socket connection established to 10.8.44.121/10.8.44.121:2181, initiating session
2015-05-04 13:55:32,371 [main-SendThread(10.8.44.121:2181)] INFO  org.apache.zookeeper.ClientCnxn:1207 - Session establishment complete on server 10.8.44.121/10.8.44.121:2181, sessionid = 0x14691649cf75e2c, negotiated timeout = 4000
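The log shows the client requesting a 400 ms session timeout and the server granting 4000 ms. As far as I know, a ZooKeeper server clamps the requested value to [minSessionTimeout, maxSessionTimeout], which default to 2 × tickTime and 20 × tickTime. The sketch below assumes the default tickTime of 2000 ms; the actual server setting isn't shown in this thread, and `negotiated_timeout` is a hypothetical helper, not a ZooKeeper API.

```python
# Sketch of ZooKeeper's session-timeout negotiation, assuming the default
# bounds (min = 2 * tickTime, max = 20 * tickTime). TICK_TIME_MS = 2000 is an
# assumption (ZooKeeper's default), not a value taken from this cluster.
TICK_TIME_MS = 2000


def negotiated_timeout(requested_ms: int, tick_time_ms: int = TICK_TIME_MS) -> int:
    """Clamp a client's requested session timeout to the server's bounds."""
    min_timeout = 2 * tick_time_ms
    max_timeout = 20 * tick_time_ms
    return max(min_timeout, min(requested_ms, max_timeout))


if __name__ == "__main__":
    # The client in the log above requested 400 ms and was granted 4000 ms.
    print(negotiated_timeout(400))  # 4000
```

Under that assumption, the 400 ms requested by the client is silently raised to the 4000 ms minimum seen in the log.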



Here is the output of the ConsumerOffsetChecker; note that 6 of the partitions 
are unclaimed (their Owner is "none"):


Group                 Topic            Pid  Offset     logSize    Lag     Owner
rtb_targeting_server  compile_request  0    328831805  328832108  303     rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-0
rtb_targeting_server  compile_request  1    328680629  328680761  132     rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-1
rtb_targeting_server  compile_request  2    328322706  328626882  304176  none
rtb_targeting_server  compile_request  3    328397868  328703662  305794  none
rtb_targeting_server  compile_request  4    328393846  328393923  77      rtb_targeting_server_lxptedal02.nanigans.com-1430747790699-36d3501a-0
rtb_targeting_server  compile_request  5    329085299  329085385  86      rtb_targeting_server_lxptedal02.nanigans.com-1430747790699-36d3501a-1
rtb_targeting_server  compile_request  6    328667153  328667153  0       rtb_targeting_server_lxptedal02.nanigans.com-1430747831428-55fd145a-0
rtb_targeting_server  compile_request  7    328537143  328537272  129     rtb_targeting_server_lxptedal02.nanigans.com-1430747831428-55fd145a-1
rtb_targeting_server  compile_request  8    328613787  328913671  299884  none
rtb_targeting_server  compile_request  9    328212202  328516662  304460  none
rtb_targeting_server  compile_request  10   329370706  329370951  245     rtb_targeting_server_lxptedal03.nanigans.com-1430747931179-ea46a266-0
rtb_targeting_server  compile_request  11   328207478  328207705  227     rtb_targeting_server_lxptedal03.nanigans.com-1430747931179-ea46a266-1
rtb_targeting_server  compile_request  12   328564790  328564790  0       rtb_targeting_server_lxptedal04.nanigans.com-1430747991705-492127bc-0
rtb_targeting_server  compile_request  13   328473600  328473672  72      rtb_targeting_server_lxptedal04.nanigans.com-1430747991705-492127bc-1
rtb_targeting_server  compile_request  14   329088239  329088315  76      rtb_targeting_server_lxptedal04.nanigans.com-1430748032481-7b5b56d7-0
rtb_targeting_server  compile_request  15   328311986  328311986  0       rtb_targeting_server_lxptedal04.nanigans.com-1430748032481-7b5b56d7-1
rtb_targeting_server  compile_request  16   328615462  328615497  35      rtb_targeting_server_lxptedal05.nanigans.com-1430748084888-c523a089-0
rtb_targeting_server  compile_request  17   327853920  327853949  29      rtb_targeting_server_lxptedal05.nanigans.com-1430748084888-c523a089-1
rtb_targeting_server  compile_request  18   328196285  328497010  300725  none
rtb_targeting_server  compile_request  19   330429455  330733318  303863  none
rtb_targeting_server  compile_request  20   328678091  328678137  46      rtb_targeting_server_lxptedal06.nanigans.com-1430748183878-b5f84424-0
rtb_targeting_server  compile_request  21   328089585  328089585  0       rtb_targeting_server_lxptedal06.nanigans.com-1430748183878-b5f84424-1
rtb_targeting_server  compile_request  22   328235530  328235571  41      rtb_targeting_server_lxptedal06.nanigans.com-1430748224863-9f2513a7-0
rtb_targeting_server  compile_request  23   328699002  328699041  39      rtb_targeting_server_lxptedal06.nanigans.com-1430748224863-9f2513a7-1

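To pick out the unclaimed partitions mechanically, and to compare against what a full group should produce, here is a small sketch. It assumes the ConsumerOffsetChecker rows are whitespace-separated in the column order shown above, and it models the range strategy as I understand the 0.8.x high-level consumer to implement it (sorted thread IDs each take a contiguous block of sorted partitions). `parse_offset_checker`, `range_assign`, and the `procNN-T` thread IDs are hypothetical names for illustration, not Kafka APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartitionRow:
    pid: int
    offset: int
    log_size: int
    lag: int
    owner: str  # "none" when no consumer thread has claimed the partition


def parse_offset_checker(text: str) -> List[PartitionRow]:
    """Parse ConsumerOffsetChecker rows, assuming the column order
    Group Topic Pid Offset logSize Lag Owner, one row per line."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "Group":
            continue  # skip the header and anything that isn't a full row
        _group, _topic, pid, offset, log_size, lag, owner = fields
        rows.append(PartitionRow(int(pid), int(offset), int(log_size), int(lag), owner))
    return rows


def range_assign(partitions: List[int], thread_ids: List[str]) -> Dict[str, List[int]]:
    """Range strategy: sorted threads each get a contiguous block of sorted
    partitions; the first (len(partitions) % len(threads)) threads get one extra."""
    parts = sorted(partitions)
    threads = sorted(thread_ids)
    per_thread, extra = divmod(len(parts), len(threads))
    assignment = {}
    for i, tid in enumerate(threads):
        start = per_thread * i + min(i, extra)
        count = per_thread + (1 if i < extra else 0)
        assignment[tid] = parts[start:start + count]
    return assignment


# A few rows copied from the output above.
SAMPLE = """
Group                 Topic            Pid  Offset     logSize    Lag     Owner
rtb_targeting_server  compile_request  0    328831805  328832108  303     rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-0
rtb_targeting_server  compile_request  2    328322706  328626882  304176  none
rtb_targeting_server  compile_request  3    328397868  328703662  305794  none
"""

if __name__ == "__main__":
    rows = parse_offset_checker(SAMPLE)
    print([r.pid for r in rows if r.owner == "none"])  # [2, 3]

    # Hypothetical IDs for 12 processes x 2 threads = 24 threads.
    threads = [f"proc{p:02d}-{t}" for p in range(1, 13) for t in range(2)]
    plan = range_assign(list(range(24)), threads)
    print(all(len(v) == 1 for v in plan.values()))  # True
```

With all 12 processes registered, each of the 24 threads should own exactly one partition, which is consistent with the owned rows above, where every owner thread holds a single partition.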


Thanks for your help,
Dave





On 4/29/15, 11:30 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:

>Hey Dave,
>
>It's hard to say why this is happening without more information. Even if there 
>are no errors in the log, is there anything to indicate that the rebalance 
>process on those hosts even started? Does this happen occasionally or every 
>time you start the consumer group? Can you paste the output of 
>ConsumerOffsetChecker and describe topic?
>
>Thanks,
>Aditya
>________________________________________
>From: Dave Hamilton [dhamil...@nanigans.com]
>Sent: Wednesday, April 29, 2015 6:46 PM
>To: users@kafka.apache.org; users@kafka.apache.org
>Subject: Re: Unclaimed partitions
>
>Hi, would anyone be able to help me with this issue? Thanks.
>
>- Dave
>
>
>
>On Tue, Apr 28, 2015 at 1:32 PM -0700, "Dave Hamilton" 
><dhamil...@nanigans.com<mailto:dhamil...@nanigans.com>> wrote:
>
>1. We’re using version 0.8.1.1.
>2. No failures in the consumer logs
>3. We’re using the ConsumerOffsetChecker to see which partitions are assigned 
>to the consumer group and what their offsets are. 8 of the 12 processes have 
>each been assigned two partitions and are keeping up with the topic. The 
>other 4 do not get assigned any partitions, and no consumers in the group are 
>consuming the remaining 8 partitions.
>
>Thanks for your help,
>Dave
>
>
>
>On 4/28/15, 1:40 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:
>
>>Couple of questions:
>>- What version of the consumer API are you using?
>>- Are you seeing any rebalance failures in the consumer logs?
>>- How do you determine that some partitions are unassigned? Just confirming 
>>that you have partitions that are not being consumed from as opposed to 
>>consumer threads that aren't assigned any partitions.
>>
>>Aditya
>>
>>________________________________________
>>From: Dave Hamilton [dhamil...@nanigans.com]
>>Sent: Tuesday, April 28, 2015 10:19 AM
>>To: users@kafka.apache.org
>>Subject: Re: Unclaimed partitions
>>
>>I’m sorry, I forgot to specify that these processes are in the same consumer 
>>group.
>>
>>Thanks,
>>Dave
>>
>>
>>
>>
>>
>>On 4/28/15, 1:15 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:
>>
>>>Hi Dave,
>>>
>>>The simple consumer doesn't do any state management across consumer 
>>>instances. So I'm not sure how you are assigning partitions in your 
>>>application code. Did you mean to say that you are using the high level 
>>>consumer API?
>>>
>>>Thanks,
>>>Aditya
>>>
>>>________________________________________
>>>From: Dave Hamilton [dhamil...@nanigans.com]
>>>Sent: Tuesday, April 28, 2015 7:58 AM
>>>To: users@kafka.apache.org
>>>Subject: Unclaimed partitions
>>>
>>>Hi, I am trying to consume a 24-partition topic across 12 processes. Each 
>>>process is using the simple consumer API, and each is being assigned two 
>>>consumer threads. I have noticed when starting these processes that 
>>>sometimes some of my processes are not being assigned any partitions, and no 
>>>rebalance seems to ever be triggered, leaving some of the partitions 
>>>unclaimed.
>>>
>>>When I first tried deploying this yesterday, I noticed that 8 of the 24 
>>>partitions, belonging to 4 of the consumer processes, went unclaimed. 
>>>Redeploying shortly afterwards corrected the issue. I tried deploying again 
>>>today, and now I see a different set of 4 processes not getting assigned 
>>>partitions. The processes otherwise appear to be running normally; they are 
>>>currently running in production, and we are working to get the consumers 
>>>running quietly before enabling them to do any work. I’m not sure whether 
>>>we might be looking at some sort of timing issue.
>>>
>>>Does anyone know what might be causing the issues we’re observing?
>>>
>>>Thanks,
>>>Dave
