Can you provide more complete logs from Broker 3 up to this time: [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
I would like to see logs from well before it sent the fetch request to Broker 4, up to the time above. I want to check whether Broker 3 was at any point the leader before Broker 4 took over. Additional logs will help.

Thanks,
Mayuresh

On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kzak...@netzero.net> wrote:
> log.cleanup.policy is delete, not compact.
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824 (1gb)
>
> Messages are keyed but not compressed; the producer is async and uses the kafka default partitioner.
> String message = msg.getString();
> String uniqKey = "" + rnd.nextInt(); // random key
> String partKey = getPartitionKey(); // partition key
> KeyedMessage<String, String> data = new KeyedMessage<String, String>(this.topicName, uniqKey, partKey, message);
> producer.send(data);
>
> Thanks
> Zakee
>
> > On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
> >
> > Is your topic log compacted? Also, if it is, are the messages keyed? Or are the messages compressed?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > Sent from my iPhone
> >
> >> On Mar 14, 2015, at 2:02 PM, Zakee <kzak...@netzero.net> wrote:
> >>
> >> Thanks, Jiangjie, for helping resolve the kafka controller migration driven partition leader rebalance issue. The logs are much cleaner now.
> >>
> >> There are a few incidences of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. I was trying to relate it to a cause; it looks like compaction (log segment deletion) is causing this. Not sure whether this is expected behavior.
> >>
> >> Broker-4:
> >> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
> >>
> >> Broker-3:
> >> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
> >> …
> >> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> >> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> >>
> >> <topic22kv_746a_314_logs.txt>
> >>
> >> Thanks
> >> Zakee
> >>
> >>> On Mar 9, 2015, at 12:18 PM, Zakee <kzak...@netzero.net> wrote:
> >>>
> >>> No broker restarts.
> >>>
> >>> Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
> >>>
> >>>>> Logs for rebalance:
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
> >>>>> …
> >>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
> >>>>>
> >>>>> Also, I still see lots of the below errors (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?
> >>>>>
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>
> >>>> Could you paste the related logs in controller.log?
> >>> What specifically should I search for in the logs?
> >>>
> >>> Thanks,
> >>> Zakee
> >>>
> >>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>
> >>>> Is there anything wrong with the brokers around that time? E.g. a broker restart?
> >>>> The logs you pasted are actually from replica fetchers. Could you paste the related logs in controller.log?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>>> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>
> >>>>> Correction: Actually the rebalance went on until 24 hours after the start, and that's where the below errors were found. Ideally the rebalance should not have happened at all.
> >>>>>
> >>>>> Thanks
> >>>>> Zakee
> >>>>>
> >>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?
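The NotLeaderForPartitionException churn quoted above is transient by nature: a replica fetcher keeps fetching from the broker it last believed was the leader until the controller's LeaderAndIsr update repoints it. A toy sketch of that race (not Kafka's actual code; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative-only sketch of why replica fetchers log
// NotLeaderForPartitionException bursts during a leader migration: the
// fetcher's view of the leader lags behind the actual leadership change.
public class FetchSketch {
    // partition -> broker currently serving as leader (cluster truth)
    static Map<String, Integer> actualLeader = new HashMap<>();
    // partition -> broker the follower still believes is the leader
    static Map<String, Integer> fetcherView = new HashMap<>();

    static String fetch(String partition) {
        int target = fetcherView.get(partition);
        if (actualLeader.get(partition) != target) {
            // The targeted broker no longer leads this partition, so it
            // answers the fetch with an error that the fetcher logs as ERROR.
            return "NotLeaderForPartitionException";
        }
        return "ok";
    }

    public static void main(String[] args) {
        actualLeader.put("Topic-11,7", 5);
        fetcherView.put("Topic-11,7", 5);
        System.out.println(fetch("Topic-11,7")); // ok
        actualLeader.put("Topic-11,7", 3);        // leader migrates 5 -> 3
        System.out.println(fetch("Topic-11,7")); // transient error window
        fetcherView.put("Topic-11,7", 3);         // LeaderAndIsr update arrives
        System.out.println(fetch("Topic-11,7")); // ok again
    }
}
```

The errors are harmless as long as the window closes, which is why the advice in the thread is to worry only if they persist alongside under-replicated partitions.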
> >>>>>> Thanks for your suggestions.
> >>>>>> It looks like the rebalance actually happened only once, soon after I started with a clean cluster and data was pushed; it hasn't happened again so far, and I see the partition leader counts on the brokers have not changed since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?
> >>>>>>
> >>>>>> Also, I still see lots of the below errors (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?
> >>>>>>
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You’ve probably known this, just to double confirm.
> >>>>>> Yes
> >>>>>>
> >>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
> >>>>>> ls /admin
> >>>>>> [delete_topics]
> >>>>>> ls /admin/preferred_replica_election
> >>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>
> >>>>>> Thanks
> >>>>>> Zakee
> >>>>>>
> >>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You’ve probably known this, just to double confirm.
> >>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
> >>>>>>>
> >>>>>>> Jiangjie (Becket) Qin
> >>>>>>>
> >>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>
> >>>>>>>> I started with a clean cluster and started to push data. It still does the rebalance at random durations even though auto.leader.rebalance is set to false.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Zakee
> >>>>>>>>
> >>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>>>
> >>>>>>>>> Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto.leader.rebalance.enable disabled and try to push data?
> >>>>>>>>> When leader migration occurs, a NotLeaderForPartition exception is expected.
> >>>>>>>>>
> >>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>
> >>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" errors in the logs. I also see a lot of Produce request failure warnings with the NotLeader exception.
> >>>>>>>>>>
> >>>>>>>>>> I tried switching auto.leader.rebalance off, setting it to false. I am still noticing the rebalance happening. My understanding was that the rebalance would not happen when this is set to false.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I don’t think num.replica.fetchers will help in this case. Increasing the number of fetcher threads will only help in cases where you have a large amount of data coming into a broker and more replica fetcher threads will help keep up. We usually only use 1-2 for each broker. But in your case, it looks like leader migration caused the issue.
> >>>>>>>>>>> Do you see anything else in the log? Like preferred leader election?
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I do see under-replicated partitions, usually shooting up every hour. Anything I could try to reduce it?
> >>>>>>>>>>>>
> >>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently I have configured 7 on each of 5 brokers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> These messages are usually caused by leader migration. I think as long as you don't see this lasting forever and get a bunch of under-replicated partitions, it should be fine.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Need to know if I should be worried about this or ignore them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not sure what causes them and what could be done to fix them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Zakee
> >>>
> >>> Thanks
> >>> Zakee

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
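On the keyed-producer snippet earlier in the thread: with the four-argument KeyedMessage(topic, key, partitionKey, message) of the old producer API, only the partition key drives partition selection; the random key just travels with the message. A minimal sketch of the classic default-partitioner behavior (a simplified reimplementation for illustration, assuming the abs(hashCode) % numPartitions scheme; not Kafka's actual class):

```java
// Illustrative-only sketch of default-partitioner behavior for a keyed
// message: partition = abs(partitionKey.hashCode()) % numPartitions.
public class PartitionerSketch {
    // abs() that stays non-negative even for Integer.MIN_VALUE,
    // by masking off the sign bit.
    static int abs(int n) {
        return n & 0x7fffffff;
    }

    static int partitionFor(String partitionKey, int numPartitions) {
        return abs(partitionKey.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        // The same partition key always maps to the same partition,
        // so a stable getPartitionKey() gives stable routing.
        System.out.println(partitionFor("host-1", 30));
    }
}
```

A consequence worth noting: the snippet's random uniqKey does not spread messages across partitions; distribution depends entirely on how many distinct values getPartitionKey() produces.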