Correction: Actually, the rebalance kept happening until about 24 hours after the start, and that is when the errors below were found. Ideally the rebalance should not have happened at all.
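For reference, these are the broker settings involved in this thread, as a sketch of a server.properties fragment (the interval and percentage values shown are the usual defaults, but the exact defaults depend on your Kafka version, so treat them as illustrative):

```properties
# Disable automatic preferred-leader rebalancing entirely.
# Note the full property name: auto.leader.rebalance.enable,
# not auto.leader.rebalance.
auto.leader.rebalance.enable=false

# Only consulted when the setting above is true: how often the controller
# checks leader imbalance, and the per-broker imbalance ratio that
# triggers a rebalance.
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10

# Replica fetcher threads per source broker; per the advice later in this
# thread, 1-2 is usually enough (7 is likely more than needed).
num.replica.fetchers=2
```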
Thanks
Zakee

> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
>
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
> Thanks for your suggestions.
> It looks like the rebalance actually happened only once, soon after I started
> with a clean cluster and data was pushed. It didn't happen again so far, and I
> see the partition leader counts on the brokers have not changed since then.
> One of the brokers was constantly showing 0 for partition leader count. Is
> that normal?
>
> Also, I still see lots of the errors below (~69k) in the logs since the
> restart. Is there any reason other than rebalance for these errors?
>
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
> partition [Topic-11,7] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
> partition [Topic-2,25] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
> partition [Topic-2,21] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
> partition [Topic-22,9] to broker 5:class
> kafka.common.NotLeaderForPartitionException
> (kafka.server.ReplicaFetcherThread)
>
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You've probably known this, just to double confirm.
> Yes
>
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
> ls /admin
> [delete_topics]
> ls /admin/preferred_replica_election
> Node does not exist: /admin/preferred_replica_election
>
> Thanks
> Zakee
>
>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>
>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>> here?
>> Some other things to check are:
>> 1. The actual property name is auto.leader.rebalance.enable, not
>> auto.leader.rebalance. You've probably known this, just to double confirm.
>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>> does not exist?
>>
>> Jiangjie (Becket) Qin
>>
>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
>>
>>> I started with a clean cluster and started to push data. It still does the
>>> rebalance at random durations even though the auto.leader.relabalance is
>>> set to false.
>>>
>>> Thanks
>>> Zakee
>>>
>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID>
>>>> wrote:
>>>>
>>>> Yes, the rebalance should not happen in that case. That is a little bit
>>>> strange. Could you try to launch a clean Kafka cluster with
>>>> auto.leader.election disabled and try to push data?
>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>> expected.
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>
>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica
>>>>> leader election for partitions" messages in the logs. I also see a lot
>>>>> of Produce request failure warnings with the NotLeader exception.
>>>>>
>>>>> I tried switching the auto.leader.relabalance to false. I am still
>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>> will not happen when this is set to false.
>>>>>
>>>>> Thanks
>>>>> Zakee
>>>>>
>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID>
>>>>>> wrote:
>>>>>>
>>>>>> I don't think num.replica.fetchers will help in this case. Increasing
>>>>>> the number of fetcher threads will only help in cases where you have a
>>>>>> large amount of data coming into a broker and more replica fetcher
>>>>>> threads will help keep up. We usually only use 1-2 for each broker.
>>>>>> But in your case, it looks like leader migration is causing the issue.
>>>>>> Do you see anything else in the log? Like preferred leader election?
>>>>>>
>>>>>> Jiangjie (Becket) Qin
>>>>>>
>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>
>>>>>>> Thanks, Jiangjie.
>>>>>>>
>>>>>>> Yes, I do see under-replicated partitions spiking, usually every
>>>>>>> hour. Anything I could try to reduce it?
>>>>>>>
>>>>>>> How does "num.replica.fetchers" affect the replica sync? I currently
>>>>>>> have 7 configured on each of 5 brokers.
>>>>>>>
>>>>>>> -Zakee
>>>>>>>
>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>> <j...@linkedin.com.invalid> wrote:
>>>>>>>
>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>> long as you don't see this lasting forever along with a bunch of
>>>>>>>> under-replicated partitions, it should be fine.
>>>>>>>>
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>
>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>
>>>>>>>>> Need to know if I should be worried about these or ignore them.
>>>>>>>>>
>>>>>>>>> I see tons of these exceptions/warnings in the broker logs; I am
>>>>>>>>> not sure what causes them or what could be done to fix them.
>>>>>>>>>
>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic]
>>>>>>>>> to broker 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error
>>>>>>>>> for partition [TestTopic] to broker 5:class
>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>>>>> request with correlation id 950084 from client
>>>>>>>>> ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to
>>>>>>>>> Leader not local for partition [TestTopic,2] on broker 2
>>>>>>>>> (kafka.server.ReplicaManager)
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>> -Zakee
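Since the same NotLeaderForPartitionException shows up ~69k times, it may help to break the count down by partition to see whether leadership is flapping on a few partitions or churning across all of them. A rough sketch, run here against an inline sample excerpt; in practice, point LOG at the broker's actual server.log:

```shell
# Sample excerpt standing in for the real broker log (hypothetical data)
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
[2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
EOF

# Total occurrences of the exception
total=$(grep -c 'NotLeaderForPartitionException' "$LOG")
echo "total: $total"

# Per-partition breakdown, most frequent first
grep 'NotLeaderForPartitionException' "$LOG" \
  | grep -o 'partition \[[^]]*\]' \
  | sort | uniq -c | sort -rn
```

If a handful of partitions dominate the counts, the replica fetchers are repeatedly chasing a leader that keeps moving for those partitions, which points back at leader migration rather than at a fetcher-thread tuning problem.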