Correction: Actually, the rebalance kept happening until about 24 hours after the start, and that is when the errors below were found. Ideally the rebalance should not have happened at all.
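For reference, these are the broker settings involved in this thread, as a sketch of a server.properties fragment (the interval and percentage values shown are the usual defaults, but the exact defaults depend on your Kafka version, so treat them as illustrative):

```properties
# Disable automatic preferred-leader rebalancing entirely.
# Note the full property name: auto.leader.rebalance.enable,
# not auto.leader.rebalance.
auto.leader.rebalance.enable=false

# Only consulted when the setting above is true: how often the controller
# checks leader imbalance, and the per-broker imbalance ratio that
# triggers a rebalance.
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10

# Replica fetcher threads per source broker; per the advice later in this
# thread, 1-2 is usually enough (7 is likely more than needed).
num.replica.fetchers=2
```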
Thanks
Zakee

> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
>
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
> Thanks for your suggestions.
> It looks like the rebalance actually happened only once, soon after I started
> with a clean cluster and data was pushed. It didn't happen again so far, and I
> see the partition leader counts on the brokers have not changed since then.
> One of the brokers was constantly showing 0 for partition leader count. Is
> that normal?
>
> Also, I still see lots of the errors below (~69k) in the logs since the
> restart. Is there any reason other than rebalance for these errors?
>
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
> partition [Topic-11,7] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
> partition [Topic-2,25] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
> partition [Topic-2,21] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
> partition [Topic-22,9] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
>
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You've probably known this, just to double confirm.
> Yes
>
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
> ls /admin
> [delete_topics]
> ls /admin/preferred_replica_election
> Node does not exist: /admin/preferred_replica_election
>
> Thanks
> Zakee
>
>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You've probably known this, just to double confirm.
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
>>
>>> I started with a clean cluster and started to push data. It still does the
>>> rebalance at random durations even though the auto.leader.relabalance is
>>> set to false.
>>>
>>> Thanks
>>> Zakee
>>>
>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID>
>>>> wrote:
>>>>
>>>> Yes, the rebalance should not happen in that case. That is a little bit
>>>> strange. Could you try to launch a clean Kafka cluster with
>>>> auto.leader.election disabled and try to push data?
>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>> expected.
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>
>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica
>>>>> leader election for partitions" messages in the logs. I also see a lot
>>>>> of Produce request failure warnings with the NotLeader exception.
>>>>>
>>>>> I tried switching the auto.leader.relabalance to false. I am still
>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>> will not happen when this is set to false.
>>>>>
>>>>> Thanks
>>>>> Zakee
>>>>>
>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID>
>>>>>> wrote:
>>>>>>
>>>>>> I don't think num.replica.fetchers will help in this case. Increasing
>>>>>> the number of fetcher threads will only help in cases where you have a
>>>>>> large amount of data coming into a broker and more replica fetcher
>>>>>> threads will help keep up. We usually only use 1-2 for each broker.
>>>>>> But in your case, it looks like leader migration is causing the issue.
>>>>>> Do you see anything else in the log? Like preferred leader election?
>>>>>>
>>>>>> Jiangjie (Becket) Qin
>>>>>>
>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>
>>>>>>> Thanks, Jiangjie.
>>>>>>>
>>>>>>> Yes, I do see under-replicated partitions spiking, usually every
>>>>>>> hour. Anything I could try to reduce it?
>>>>>>>
>>>>>>> How does "num.replica.fetchers" affect the replica sync? I currently
>>>>>>> have 7 configured on each of 5 brokers.
>>>>>>>
>>>>>>> -Zakee
>>>>>>>
>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>> <j...@linkedin.com.invalid> wrote:
>>>>>>>
>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>> long as you don't see this lasting forever along with a bunch of
>>>>>>>> under-replicated partitions, it should be fine.
>>>>>>>>
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>
>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>
>>>>>>>>> Need to know if I should be worried about these or ignore them.
>>>>>>>>>
>>>>>>>>> I see tons of these exceptions/warnings in the broker logs; I am
>>>>>>>>> not sure what causes them or what could be done to fix them.
>>>>>>>>>
>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic]
>>>>>>>>> to broker 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error
>>>>>>>>> for partition [TestTopic] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>>>>> request with correlation id 950084 from client
>>>>>>>>> ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to
>>>>>>>>> Leader not local for partition [TestTopic,2] on broker 2
>>>>>>>>> (kafka.server.ReplicaManager)
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> -Zakee
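Since the same NotLeaderForPartitionException shows up ~69k times, it may help to break the count down by partition to see whether leadership is flapping on a few partitions or churning across all of them. A rough sketch, run here against an inline sample excerpt; in practice, point LOG at the broker's actual server.log:

```shell
# Sample excerpt standing in for the real broker log (hypothetical data)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
EOF

# Total occurrences of the exception
total=$(grep -c 'NotLeaderForPartitionException' "$LOG")
echo "total: $total"

# Per-partition breakdown, most frequent first
grep 'NotLeaderForPartitionException' "$LOG" \
  | grep -o 'partition \[[^]]*\]' \
  | sort | uniq -c | sort -rn
```

If a handful of partitions dominate the counts, the replica fetchers are repeatedly chasing a leader that keeps moving for those partitions, which points back at leader migration rather than at a fetcher-thread tuning problem.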