Re: Broker Exceptions

gharatmayuresh15 Sat, 14 Mar 2015 16:24:06 -0700

Is your topic log-compacted? If it is, are the messages keyed? And are the 
messages compressed?
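
To help read the out-of-range error quoted below, here is a minimal sketch in Python (not part of Kafka) that pulls the numbers out of the "Request for offset ... but we only have log segments in the range ..." text and reports on which side of the range the requested offset falls. The names `OUT_OF_RANGE` and `classify_out_of_range` are made up for illustration, and the message format is assumed from the log excerpt in this thread; other Kafka versions may word it differently.

```python
import re

# Pattern for the ReplicaManager error text quoted in this thread.
# Assumption: the wording matches the excerpt below; it may differ elsewhere.
OUT_OF_RANGE = re.compile(
    r"Request for offset (\d+) but we only have log segments "
    r"in the range (\d+) to (\d+)"
)


def classify_out_of_range(log_text):
    """Return (offset, start, end, verdict), or None if the text does not match."""
    # Collapse whitespace so a message wrapped across lines still matches
    # (mail quote markers such as ">" must be stripped beforehand).
    flat = " ".join(log_text.split())
    match = OUT_OF_RANGE.search(flat)
    if match is None:
        return None
    offset, start, end = (int(group) for group in match.groups())
    if offset < start:
        verdict = "below log start"
    elif offset > end:
        verdict = "beyond log end"
    else:
        verdict = "within range"
    return offset, start, end, verdict
```

For the Broker-4 line quoted below (offset 1754769769, range 1400864851 to 1754769732), the requested offset is past the end of the range, not before its start.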


Thanks,

Mayuresh

> On Mar 14, 2015, at 2:02 PM, Zakee <kzak...@netzero.net> wrote:
> 
> Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven 
> partition leader rebalance issue. The logs are much cleaner now. 
> 
> There are a few incidents of out-of-range offsets even though there are no 
> consumers running, only producers and replica fetchers. I was trying to 
> find a cause, and it looks like compaction (log segment deletion) is causing 
> this. I am not sure whether this is expected behavior.
> 
> Broker-4:
> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when 
> processing fetch request for partition [Topic22kv,5] offset 1754769769 from 
> follower with correlation id 1645671. Possible cause: Request for offset 
> 1754769769 but we only have log segments in the range 1400864851 to 
> 1754769732. (kafka.server.ReplicaManager)
> 
> Broker-3:
> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is 
> aborted and paused (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log 
> Topic22kv-5 for deletion. (kafka.log.Log)
> …
> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is 
> resumed (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 
> 1754769769 for partition [Topic22kv,5] out of range; reset offset to 
> 1400864851 (kafka.server.ReplicaFetcherThread)
> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for 
> partition [Topic22kv,5] reset its fetch offset from 1400864851 to current 
> leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> 
> <topic22kv_746a_314_logs.txt>
> 
> 
> Thanks
> Zakee
> 
>> On Mar 9, 2015, at 12:18 PM, Zakee <kzak...@netzero.net> wrote:
>> 
>> No broker restarts.
>> 
>> Created a Kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
>> 
>>>> Logs for rebalance:
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
>>>> election for partitions:  (kafka.controller.KafkaController)
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
>>>> preferred replica election:  (kafka.controller.KafkaController)
>>>> …
>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
>>>> election for partitions:  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
>>>> election for partitions:  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
>>>> leader election for partitions  (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
>>>> preferred replica election:  (kafka.controller.KafkaController)
>>>> 
>>>> Also, I still see lots of the errors below (~69k) in the logs since 
>>>> the restart. Is there any reason other than rebalance for these errors?
>>>> 
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
>>>> partition [Topic-11,7] to broker 5:class 
>>>> kafka.common.NotLeaderForPartitionException 
>>>> (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
>>>> partition [Topic-2,25] to broker 5:class 
>>>> kafka.common.NotLeaderForPartitionException 
>>>> (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for 
>>>> partition [Topic-2,21] to broker 5:class 
>>>> kafka.common.NotLeaderForPartitionException 
>>>> (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for 
>>>> partition [Topic-22,9] to broker 5:class 
>>>> kafka.common.NotLeaderForPartitionException 
>>>> (kafka.server.ReplicaFetcherThread)
>> 
>> 
>>> Could you paste the related logs in controller.log?
>> What specifically should I search for in the logs?
>> 
>> Thanks,
>> Zakee
>> 
>> 
>> 
>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID 
>>> <mailto:j...@linkedin.com.INVALID>> wrote:
>>> 
>>> Is there anything wrong with brokers around that time? E.g. Broker restart?
>>> The log you pasted are actually from replica fetchers. Could you paste the
>>> related logs in controller.log?
>>> 
>>> Thanks.
>>> 
>>> Jiangjie (Becket) Qin
>>> 
>>>> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net 
>>>> <mailto:kzak...@netzero.net>> wrote:
>>>> 
>>>> Correction: the rebalance actually kept happening until 24 hours after
>>>> the start, and that's where the errors below were found. Ideally, the
>>>> rebalance should not have happened at all.
>>>> 
>>>> 
>>>> Thanks
>>>> Zakee
>>>> 
>>>> 
>>>> 
>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net 
>>>>>> <mailto:kzak...@netzero.net>> wrote:
>>>>>> 
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>> Thanks for your suggestions.
>>>>> It looks like the rebalance actually happened only once, soon after I
>>>>> started with a clean cluster and data was pushed. It hasn’t happened again
>>>>> so far, and I see the partition leader counts on the brokers have not
>>>>> changed since then. One of the brokers was constantly showing 0 for
>>>>> partition leader count. Is that normal?
>>>>> 
>>>>> Also, I still see lots of the errors below (~69k) in the logs
>>>>> since the restart. Is there any reason other than rebalance for these
>>>>> errors?
>>>>> 
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>> partition [Topic-11,7] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>> partition [Topic-2,25] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for
>>>>> partition [Topic-2,21] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for
>>>>> partition [Topic-22,9] to broker 5:class
>>>>> kafka.common.NotLeaderForPartitionException
>>>>> (kafka.server.ReplicaFetcherThread)
>>>>> 
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You probably know this already; I just want to
>>>>>> double-check.
>>>>> Yes 
>>>>> 
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>> ls /admin
>>>>> [delete_topics]
>>>>> ls /admin/preferred_replica_election
>>>>> Node does not exist: /admin/preferred_replica_election
>>>>> 
>>>>> 
>>>>> Thanks
>>>>> Zakee
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID 
>>>>>> <mailto:j...@linkedin.com.INVALID>>
>>>>>> wrote:
>>>>>> 
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You probably know this already; I just want to
>>>>>> double-check.
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>>> 
>>>>>> Jiangjie (Becket) Qin
>>>>>> 
>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net 
>>>>>>> <mailto:kzak...@netzero.net>> wrote:
>>>>>>> 
>>>>>>> I started with a clean cluster and started to push data. It still does
>>>>>>> the rebalance at random intervals even though auto.leader.rebalance
>>>>>>> is set to false.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID 
>>>>>>>> <mailto:j...@linkedin.com.INVALID>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>>>> bit strange. Could you try to launch a clean Kafka cluster with
>>>>>>>> auto.leader.election disabled and try pushing data?
>>>>>>>> When leader migration occurs, a NotLeaderForPartition exception is
>>>>>>>> expected.
>>>>>>>> 
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net 
>>>>>>>>> <mailto:kzak...@netzero.net>> wrote:
>>>>>>>>> 
>>>>>>>>> Yes, Jiangjie, I do see lots of these messages, "Starting preferred
>>>>>>>>> replica leader election for partitions", in the logs. I also see a
>>>>>>>>> lot of produce request failure warnings with the NotLeader exception.
>>>>>>>>> 
>>>>>>>>> I tried setting auto.leader.rebalance to false. I am still
>>>>>>>>> noticing the rebalance happening. My understanding was that the
>>>>>>>>> rebalance will not happen when this is set to false.
>>>>>>>>> 
>>>>>>>>> Thanks
>>>>>>>>> Zakee
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin
>>>>>>>>>> <j...@linkedin.com.INVALID <mailto:j...@linkedin.com.INVALID>>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>>>> Increasing the number of fetcher threads only helps when you have a
>>>>>>>>>> large amount of data coming into a broker and more replica fetcher
>>>>>>>>>> threads help it keep up. We usually only use 1-2 for each broker.
>>>>>>>>>> But in your case, it looks like leader migration is causing the issue.
>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>>>> election?
>>>>>>>>>> 
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>> 
>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net 
>>>>>>>>>> <mailto:kzak...@netzero.net>
>>>>>>>>>> <mailto:kzak...@netzero.net <mailto:kzak...@netzero.net>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>> 
>>>>>>>>>>> Yes, I do see under-replicated partitions usually shooting up every
>>>>>>>>>>> hour. Is there anything that I could try to reduce it?
>>>>>>>>>>> 
>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? I currently
>>>>>>>>>>> have 7 configured on each of 5 brokers.
>>>>>>>>>>> 
>>>>>>>>>>> -Zakee
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin
>>>>>>>>>>> <j...@linkedin.com.invalid <mailto:j...@linkedin.com.invalid>>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> These messages are usually caused by leader migration. I think as
>>>>>>>>>>>> long as you don't see this lasting forever and don't get a bunch of
>>>>>>>>>>>> under-replicated partitions, it should be fine.
>>>>>>>>>>>> 
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>> 
>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net 
>>>>>>>>>>>>> <mailto:kzak...@netzero.net>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I need to know if I should be worried about these or can ignore them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, and I am
>>>>>>>>>>>>> not sure what causes them or what could be done to fix them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to
>>>>>>>>>>>>> broker 5:class kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for
>>>>>>>>>>>>> partition [TestTopic] to broker 5:class
>>>>>>>>>>>>> kafka.common.NotLeaderForPartitionException
>>>>>>>>>>>>> (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch
>>>>>>>>>>>>> request with correlation id 950084 from client ReplicaFetcherThread-1-2
>>>>>>>>>>>>> on partition [TestTopic,2] failed due to Leader not local for partition
>>>>>>>>>>>>> [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -Zakee
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>> 
>>> 
>> 
>> 
>> Thanks
>> Zakee
>> 
>> 
>> 
>
