Is your topic log-compacted? If it is, are the messages keyed? And are the messages compressed?
Thanks,
Mayuresh

> On Mar 14, 2015, at 2:02 PM, Zakee <kzak...@netzero.net> wrote:
>
> Thanks, Jiangjie, for helping resolve the Kafka controller-migration-driven
> partition leader rebalance issue. The logs are much cleaner now.
>
> There are a few incidences of out-of-range offsets even though there are no
> consumers running, only producers and replica fetchers. I was trying to
> relate them to a cause; it looks like compaction (log segment deletion) is
> causing this. Not sure whether this is expected behavior.
>
> Broker-4:
> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
>
> Broker-3:
> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
> …
> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
>
> <topic22kv_746a_314_logs.txt>
>
> Thanks
> Zakee
>
>> On Mar 9, 2015, at 12:18 PM, Zakee <kzak...@netzero.net> wrote:
>>
>> No broker restarts.
>>
>> Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
>>
>>>> Logs for rebalance:
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
>>>> …
>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
>>>> ...
>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
>>>>
>>>> Also, I still see lots of below errors (~69k) going on in the logs since
>>>> the restart. Is there any other reason than rebalance for these errors?
>>>>
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>
>>> Could you paste the related logs in controller.log?
>> What specifically should I search for in the logs?
>>
>> Thanks,
>> Zakee
>>
>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>
>>> Is there anything wrong with brokers around that time? E.g. Broker restart?
>>> The logs you pasted are actually from replica fetchers. Could you paste the
>>> related logs in controller.log?
>>>
>>> Thanks.
>>>
>>> Jiangjie (Becket) Qin
>>>
>>>> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net> wrote:
>>>>
>>>> Correction: Actually the rebalance kept happening until 24 hours after
>>>> the start, and that's where the errors below were found. Ideally the
>>>> rebalance should not have happened at all.
>>>>
>>>> Thanks
>>>> Zakee
>>>>
>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
>>>>>>
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>> Thanks for your suggestions.
>>>>> It looks like the rebalance actually happened only once soon after I
>>>>> started with clean cluster and data was pushed, it didn’t happen again
>>>>> so far, and I see the partitions leader counts on brokers did not change
>>>>> since then. One of the brokers was constantly showing 0 for partition
>>>>> leader count. Is that normal?
>>>>>
>>>>> Also, I still see lots of below errors (~69k) going on in the logs
>>>>> since the restart. Is there any other reason than rebalance for these
>>>>> errors?
>>>>>
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>> confirm.
>>>>> Yes
>>>>>
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>> ls /admin
>>>>> [delete_topics]
>>>>> ls /admin/preferred_replica_election
>>>>> Node does not exist: /admin/preferred_replica_election
>>>>>
>>>>> Thanks
>>>>> Zakee
>>>>>
>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>
>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance
>>>>>> here?
>>>>>> Some other things to check are:
>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not
>>>>>> auto.leader.rebalance. You’ve probably known this, just to double
>>>>>> confirm.
>>>>>> 2. In zookeeper path, can you verify /admin/preferred_replica_election
>>>>>> does not exist?
>>>>>>
>>>>>> Jiangjie (Becket) Qin
>>>>>>
>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>
>>>>>>> I started with a clean cluster and started to push data. It still does
>>>>>>> the rebalance at random intervals even though auto.leader.rebalance
>>>>>>> is set to false.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Zakee
>>>>>>>
>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>>
>>>>>>>> Yes, the rebalance should not happen in that case. That is a little
>>>>>>>> bit strange. Could you try to launch a clean Kafka cluster with
>>>>>>>> auto.leader.election disabled and try pushing data?
>>>>>>>> When leader migration occurs, NotLeaderForPartition exception is
>>>>>>>> expected.
>>>>>>>>
>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>
>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>
>>>>>>>>> Yes, Jiangjie, I do see lots of these errors "Starting preferred
>>>>>>>>> replica leader election for partitions" in logs.
>>>>>>>>> I also see a lot of Produce request failure warnings with the
>>>>>>>>> NotLeader Exception.
>>>>>>>>>
>>>>>>>>> I tried setting auto.leader.rebalance to false. I am still
>>>>>>>>> noticing the rebalance happening. My understanding was the rebalance
>>>>>>>>> will not happen when this is set to false.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Zakee
>>>>>>>>>
>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
>>>>>>>>>>
>>>>>>>>>> I don’t think num.replica.fetchers will help in this case.
>>>>>>>>>> Increasing the number of fetcher threads will only help in cases
>>>>>>>>>> where you have a large amount of data coming into a broker and more
>>>>>>>>>> replica fetcher threads will help keep up. We usually only use 1-2
>>>>>>>>>> for each broker. But in your case, it looks like leader migration
>>>>>>>>>> is causing the issue.
>>>>>>>>>> Do you see anything else in the log? Like preferred leader
>>>>>>>>>> election?
>>>>>>>>>>
>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>
>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Jiangjie.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I do see under-replicated partitions usually shooting up
>>>>>>>>>>> every hour. Anything that I could try to reduce it?
>>>>>>>>>>>
>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently
>>>>>>>>>>> I have configured 7 on each of 5 brokers.
>>>>>>>>>>>
>>>>>>>>>>> -Zakee
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> These messages are usually caused by leader migration.
>>>>>>>>>>>> I think as long as you don't see this lasting forever and causing
>>>>>>>>>>>> a bunch of under-replicated partitions, it should be fine.
>>>>>>>>>>>>
>>>>>>>>>>>> Jiangjie (Becket) Qin
>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Need to know if I should be worried about this or ignore them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not
>>>>>>>>>>>>> sure what causes them and what could be done to fix them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Zakee
>>
>> Thanks
>> Zakee
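
A note on reading the Broker-4 error quoted at the top of the thread: the requested offset can be compared against the reported segment range mechanically. What follows is only a minimal Python sketch and is not something anyone in the thread posted. The function name offset_gap and the regular expression are assumptions made for illustration, and the pattern is based solely on the single ReplicaManager message quoted above, so other Kafka versions may word that message differently.

```python
import re

# Pattern modelled on the one ReplicaManager message quoted in this thread.
# (Assumption: other Kafka versions may phrase this message differently.)
OUT_OF_RANGE = re.compile(
    r"Request for offset (\d+) but we only have log segments "
    r"in the range (\d+) to (\d+)"
)


def offset_gap(log_text):
    """Return (requested, range_start, range_end, gap), or None if no match.

    gap > 0: the requested offset is past the upper bound of the range.
    gap < 0: the requested offset is before the lower bound of the range.
    gap == 0: the requested offset lies inside the reported range.
    """
    # Collapse line wraps and mail quote markers ('>') into single spaces,
    # so a message pasted from a quoted email still matches.
    flat = re.sub(r"[>\s]+", " ", log_text)
    match = OUT_OF_RANGE.search(flat)
    if match is None:
        return None
    requested, start, end = (int(group) for group in match.groups())
    if requested > end:
        gap = requested - end
    elif requested < start:
        gap = requested - start
    else:
        gap = 0
    return requested, start, end, gap


if __name__ == "__main__":
    quoted = """
    > processing fetch request for partition [Topic22kv,5] offset 1754769769 from
    > follower with correlation id 1645671. Possible cause: Request for offset
    > 1754769769 but we only have log segments in the range 1400864851 to
    > 1754769732. (kafka.server.ReplicaManager)
    """
    print(offset_gap(quoted))
```

For the Broker-4 line in this thread the gap comes out as 37, meaning the follower asked for an offset 37 past the upper bound the leader reported, rather than one below the start of the range. The sketch only does the arithmetic; it does not say why the follower was ahead.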