Hi, this happens most times I restart the consumer group, but not every time. There are no errors in the logs, and nothing indicates that a rebalance is occurring. Here are the ZK logs from one of the processes that isn’t receiving partitions:

2015-05-04 13:55:32,365 [main] INFO org.apache.zookeeper.ZooKeeper:438 - Initiating client connection, connectString=lxpkfkdal01.nanigans.com sessionTimeout=400 watcher=org.I0Itec.zkclient.ZkClient@6971e8ba
2015-05-04 13:55:32,366 [main-SendThread(10.8.44.121:2181)] INFO org.apache.zookeeper.ClientCnxn:966 - Opening socket connection to server 10.8.44.121/10.8.44.121:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-04 13:55:32,367 [main-SendThread(10.8.44.121:2181)] INFO org.apache.zookeeper.ClientCnxn:849 - Socket connection established to 10.8.44.121/10.8.44.121:2181, initiating session
2015-05-04 13:55:32,371 [main-SendThread(10.8.44.121:2181)] INFO org.apache.zookeeper.ClientCnxn:1207 - Session establishment complete on server 10.8.44.121/10.8.44.121:2181, sessionid = 0x14691649cf75e2c, negotiated timeout = 4000

Here is the output of the ConsumerOffsetChecker, note that 6 of the partitions are unclaimed:

Group                Topic           Pid Offset    logSize   Lag    Owner
rtb_targeting_server compile_request 0   328831805 328832108 303    rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-0
rtb_targeting_server compile_request 1   328680629 328680761 132    rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-1
rtb_targeting_server compile_request 2   328322706 328626882 304176 none
rtb_targeting_server compile_request 3   328397868 328703662 305794 none
rtb_targeting_server compile_request 4   328393846 328393923 77     rtb_targeting_server_lxptedal02.nanigans.com-1430747790699-36d3501a-0
rtb_targeting_server compile_request 5   329085299 329085385 86     rtb_targeting_server_lxptedal02.nanigans.com-1430747790699-36d3501a-1
rtb_targeting_server compile_request 6   328667153 328667153 0      rtb_targeting_server_lxptedal02.nanigans.com-1430747831428-55fd145a-0
rtb_targeting_server compile_request 7   328537143 328537272 129    rtb_targeting_server_lxptedal02.nanigans.com-1430747831428-55fd145a-1
rtb_targeting_server compile_request 8   328613787 328913671 299884 none
rtb_targeting_server compile_request 9   328212202 328516662 304460 none
rtb_targeting_server compile_request 10  329370706 329370951 245    rtb_targeting_server_lxptedal03.nanigans.com-1430747931179-ea46a266-0
rtb_targeting_server compile_request 11  328207478 328207705 227    rtb_targeting_server_lxptedal03.nanigans.com-1430747931179-ea46a266-1
rtb_targeting_server compile_request 12  328564790 328564790 0      rtb_targeting_server_lxptedal04.nanigans.com-1430747991705-492127bc-0
rtb_targeting_server compile_request 13  328473600 328473672 72     rtb_targeting_server_lxptedal04.nanigans.com-1430747991705-492127bc-1
rtb_targeting_server compile_request 14  329088239 329088315 76     rtb_targeting_server_lxptedal04.nanigans.com-1430748032481-7b5b56d7-0
rtb_targeting_server compile_request 15  328311986 328311986 0      rtb_targeting_server_lxptedal04.nanigans.com-1430748032481-7b5b56d7-1
rtb_targeting_server compile_request 16  328615462 328615497 35     rtb_targeting_server_lxptedal05.nanigans.com-1430748084888-c523a089-0
rtb_targeting_server compile_request 17  327853920 327853949 29     rtb_targeting_server_lxptedal05.nanigans.com-1430748084888-c523a089-1
rtb_targeting_server compile_request 18  328196285 328497010 300725 none
rtb_targeting_server compile_request 19  330429455 330733318 303863 none
rtb_targeting_server compile_request 20  328678091 328678137 46     rtb_targeting_server_lxptedal06.nanigans.com-1430748183878-b5f84424-0
rtb_targeting_server compile_request 21  328089585 328089585 0      rtb_targeting_server_lxptedal06.nanigans.com-1430748183878-b5f84424-1
rtb_targeting_server compile_request 22  328235530 328235571 41     rtb_targeting_server_lxptedal06.nanigans.com-1430748224863-9f2513a7-0
rtb_targeting_server compile_request 23  328699002 328699041 39     rtb_targeting_server_lxptedal06.nanigans.com-1430748224863-9f2513a7-1

Thanks for your help,
Dave

On 4/29/15, 11:30 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:

>Hey Dave,
>
>It's hard to say why this is happening without more information.
>Even if there are no errors in the log, is there anything to indicate that the
>rebalance process on those hosts even started? Does this happen occasionally or
>every time you start the consumer group? Can you paste the output of
>ConsumerOffsetChecker and describe topic?
>
>Thanks,
>Aditya
>________________________________________
>From: Dave Hamilton [dhamil...@nanigans.com]
>Sent: Wednesday, April 29, 2015 6:46 PM
>To: users@kafka.apache.org; users@kafka.apache.org
>Subject: Re: Unclaimed partitions
>
>Hi, would anyone be able to help me with this issue? Thanks.
>
>- Dave
>
>On Tue, Apr 28, 2015 at 1:32 PM -0700, "Dave Hamilton"
><dhamil...@nanigans.com> wrote:
>
>1. We’re using version 0.8.1.1.
>2. No failures in the consumer logs
>3. We’re using the ConsumerOffsetChecker to see what partitions are assigned
>to the consumer group and what their offsets are. 8 of the 12 processes each
>have been assigned two partitions and they’re keeping up with the topic. The
>other 4 do not get assigned partitions and no consumers in the group are
>consuming those 8 partitions.
>
>Thanks for your help,
>Dave
>
>On 4/28/15, 1:40 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:
>
>>Couple of questions:
>>- What version of the consumer API are you using?
>>- Are you seeing any rebalance failures in the consumer logs?
>>- How do you determine that some partitions are unassigned? Just confirming
>>that you have partitions that are not being consumed from as opposed to
>>consumer threads that aren't assigned any partitions.
>>
>>Aditya
>>
>>________________________________________
>>From: Dave Hamilton [dhamil...@nanigans.com]
>>Sent: Tuesday, April 28, 2015 10:19 AM
>>To: users@kafka.apache.org
>>Subject: Re: Unclaimed partitions
>>
>>I’m sorry, I forgot to specify that these processes are in the same consumer
>>group.
>>
>>Thanks,
>>Dave
>>
>>On 4/28/15, 1:15 PM, "Aditya Auradkar" <aaurad...@linkedin.com.INVALID> wrote:
>>
>>>Hi Dave,
>>>
>>>The simple consumer doesn't do any state management across consumer
>>>instances. So I'm not sure how you are assigning partitions in your
>>>application code. Did you mean to say that you are using the high level
>>>consumer API?
>>>
>>>Thanks,
>>>Aditya
>>>
>>>________________________________________
>>>From: Dave Hamilton [dhamil...@nanigans.com]
>>>Sent: Tuesday, April 28, 2015 7:58 AM
>>>To: users@kafka.apache.org
>>>Subject: Unclaimed partitions
>>>
>>>Hi, I am trying to consume a 24-partition topic across 12 processes. Each
>>>process is using the simple consumer API, and each is being assigned two
>>>consumer threads. I have noticed when starting these processes that
>>>sometimes some of my processes are not being assigned any partitions, and no
>>>rebalance seems to ever be triggered, leaving some of the partitions
>>>unclaimed.
>>>
>>>When I first tried deploying this yesterday, I noticed 8 of the 24
>>>partitions, for 4 of the consumer processes, went unclaimed. Redeploying
>>>shortly later corrected this issue. I tried deploying again today, and now I
>>>see a different set of 4 processes not getting assigned partitions. The
>>>processes otherwise appear to be running normally, they are currently
>>>running in production and we are working to get the consumers quietly
>>>running before enabling them to do any work. I’m not sure if we might be
>>>looking at some sort of timing issue.
>>>
>>>Does anyone know what might be causing the issues we’re observing?
>>>
>>>Thanks,
>>>Dave
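
For anyone following this thread who wants to spot unclaimed partitions without reading the ConsumerOffsetChecker output by eye, here is a minimal sketch in Python. It assumes the whitespace-separated column layout shown at the top of this thread (Group, Topic, Pid, Offset, logSize, Lag, Owner) and that an unclaimed partition has "none" in the Owner column. The names PartitionRow, parse_rows and find_unclaimed are made up for illustration and are not part of Kafka; the sample rows are copied from the output above.

```python
from typing import List, NamedTuple


class PartitionRow(NamedTuple):
    group: str
    topic: str
    partition: int
    offset: int
    log_size: int
    lag: int
    owner: str


def parse_rows(text: str) -> List[PartitionRow]:
    """Parse data rows, skipping the header and any line that does not fit.

    Assumption: every data row has exactly 7 whitespace-separated fields and
    the Pid, Offset, logSize and Lag columns are non-negative integers.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7 or not all(f.isdigit() for f in fields[2:6]):
            continue  # header, blank line, or unrelated output
        group, topic, pid, offset, log_size, lag, owner = fields
        rows.append(
            PartitionRow(group, topic, int(pid), int(offset),
                         int(log_size), int(lag), owner)
        )
    return rows


def find_unclaimed(rows: List[PartitionRow]) -> List[PartitionRow]:
    """Return the rows whose Owner column is the literal string 'none'."""
    return [row for row in rows if row.owner == "none"]


# Three rows taken from the output earlier in this thread, plus the header.
SAMPLE = """\
Group Topic Pid Offset logSize Lag Owner
rtb_targeting_server compile_request 1 328680629 328680761 132 rtb_targeting_server_lxptedal01.nanigans.com-1430747732348-fd8b839e-1
rtb_targeting_server compile_request 2 328322706 328626882 304176 none
rtb_targeting_server compile_request 3 328397868 328703662 305794 none
"""

if __name__ == "__main__":
    for row in find_unclaimed(parse_rows(SAMPLE)):
        print(f"{row.topic} partition {row.partition}: lag {row.lag}, no owner")
```

Piping the real tool output into parse_rows instead of SAMPLE should work the same way as long as the layout matches; if a different Kafka version prints other columns, the field count check will need adjusting.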