Can you provide more complete logs from Broker 3 up to this time: [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
I would like to see logs from well before it sent the fetch request to Broker 4, up to the time above. I want to check whether Broker 3 was at any point the leader before Broker 4 took over. Additional logs will help.

Thanks,
Mayuresh

On Sat, Mar 14, 2015 at 8:35 PM, Zakee <kzak...@netzero.net> wrote:
> log.cleanup.policy is delete, not compact.
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824 (1gb)
>
> Messages are keyed but not compressed; the producer is async and uses the kafka default partitioner.
> String message = msg.getString();
> String uniqKey = "" + rnd.nextInt(); // random key
> String partKey = getPartitionKey(); // partition key
> KeyedMessage<String, String> data = new KeyedMessage<String, String>(this.topicName, uniqKey, partKey, message);
> producer.send(data);
>
> Thanks
> Zakee
>
> > On Mar 14, 2015, at 4:23 PM, gharatmayures...@gmail.com wrote:
> >
> > Is your topic log compacted? Also, if it is, are the messages keyed? Or are the messages compressed?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > Sent from my iPhone
> >
> >> On Mar 14, 2015, at 2:02 PM, Zakee <kzak...@netzero.net> wrote:
> >>
> >> Thanks, Jiangjie, for helping resolve the kafka controller migration driven partition leader rebalance issue. The logs are much cleaner now.
> >>
> >> There are a few incidences of out-of-range offsets even though there are no consumers running, only producers and replica fetchers. I was trying to relate it to a cause; it looks like compaction (log segment deletion) is causing this. Not sure whether this is expected behavior.
> >>
> >> Broker-4:
> >> [2015-03-14 07:46:52,338] ERROR [Replica Manager on Broker 4]: Error when processing fetch request for partition [Topic22kv,5] offset 1754769769 from follower with correlation id 1645671. Possible cause: Request for offset 1754769769 but we only have log segments in the range 1400864851 to 1754769732. (kafka.server.ReplicaManager)
> >>
> >> Broker-3:
> >> [2015-03-14 07:46:52,356] INFO The cleaning for partition [Topic22kv,5] is aborted and paused (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,408] INFO Scheduling log segment 1400864851 for log Topic22kv-5 for deletion. (kafka.log.Log)
> >> …
> >> [2015-03-14 07:46:52,421] INFO Compaction for partition [Topic22kv,5] is resumed (kafka.log.LogCleaner)
> >> [2015-03-14 07:46:52,517] ERROR [ReplicaFetcherThread-2-4], Current offset 1754769769 for partition [Topic22kv,5] out of range; reset offset to 1400864851 (kafka.server.ReplicaFetcherThread)
> >> [2015-03-14 07:46:52,517] WARN [ReplicaFetcherThread-2-4], Replica 3 for partition [Topic22kv,5] reset its fetch offset from 1400864851 to current leader 4's start offset 1400864851 (kafka.server.ReplicaFetcherThread)
> >>
> >> <topic22kv_746a_314_logs.txt>
> >>
> >> Thanks
> >> Zakee
> >>
> >>> On Mar 9, 2015, at 12:18 PM, Zakee <kzak...@netzero.net> wrote:
> >>>
> >>> No broker restarts.
> >>>
> >>> Created a kafka issue: https://issues.apache.org/jira/browse/KAFKA-2011
> >>>
> >>>>> Logs for rebalance:
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed preferred replica election: (kafka.controller.KafkaController)
> >>>>> …
> >>>>> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica election for partitions: (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica leader election for partitions (kafka.controller.KafkaController)
> >>>>> ...
> >>>>> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing preferred replica election: (kafka.controller.KafkaController)
> >>>>>
> >>>>> Also, I still see lots of the below errors (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?
> >>>>>
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>
> >>>> Could you paste the related logs in controller.log?
> >>> What specifically should I search for in the logs?
> >>>
> >>> Thanks,
> >>> Zakee
> >>>
> >>>> On Mar 9, 2015, at 11:35 AM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>
> >>>> Is there anything wrong with the brokers around that time? E.g. a broker restart?
> >>>> The logs you pasted are actually from replica fetchers. Could you paste the related logs in controller.log?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>>> On 3/9/15, 10:32 AM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>
> >>>>> Correction: Actually the rebalance went on until 24 hours after the start, and that's where the below errors were found. Ideally the rebalance should not have happened at all.
> >>>>>
> >>>>> Thanks
> >>>>> Zakee
> >>>>>
> >>>>>>> On Mar 9, 2015, at 10:28 AM, Zakee <kzak...@netzero.net> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?
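The NotLeaderForPartitionException churn quoted above is transient by nature: a replica fetcher keeps fetching from the broker it last believed was the leader until the controller's LeaderAndIsr update repoints it. A toy sketch of that race (not Kafka's actual code; all names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative-only sketch of why replica fetchers log
// NotLeaderForPartitionException bursts during a leader migration: the
// fetcher's view of the leader lags behind the actual leadership change.
public class FetchSketch {
    // partition -> broker currently serving as leader (cluster truth)
    static Map<String, Integer> actualLeader = new HashMap<>();
    // partition -> broker the follower still believes is the leader
    static Map<String, Integer> fetcherView = new HashMap<>();

    static String fetch(String partition) {
        int target = fetcherView.get(partition);
        if (actualLeader.get(partition) != target) {
            // The targeted broker no longer leads this partition, so it
            // answers the fetch with an error that the fetcher logs as ERROR.
            return "NotLeaderForPartitionException";
        }
        return "ok";
    }

    public static void main(String[] args) {
        actualLeader.put("Topic-11,7", 5);
        fetcherView.put("Topic-11,7", 5);
        System.out.println(fetch("Topic-11,7")); // ok
        actualLeader.put("Topic-11,7", 3);        // leader migrates 5 -> 3
        System.out.println(fetch("Topic-11,7")); // transient error window
        fetcherView.put("Topic-11,7", 3);         // LeaderAndIsr update arrives
        System.out.println(fetch("Topic-11,7")); // ok again
    }
}
```

The errors are harmless as long as the window closes, which is why the advice in the thread is to worry only if they persist alongside under-replicated partitions.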
> >>>>>> Thanks for your suggestions.
> >>>>>> It looks like the rebalance actually happened only once, soon after I started with a clean cluster and data was pushed; it hasn't happened again so far, and I see the partition leader counts on the brokers have not changed since then. One of the brokers was constantly showing 0 for partition leader count. Is that normal?
> >>>>>>
> >>>>>> Also, I still see lots of the below errors (~69k) going on in the logs since the restart. Is there any reason other than rebalance for these errors?
> >>>>>>
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-11,7] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-2,25] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-2-5], Error for partition [Topic-2,21] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>> [2015-03-07 14:23:28,963] ERROR [ReplicaFetcherThread-1-5], Error for partition [Topic-22,9] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You’ve probably known this, just to double confirm.
> >>>>>> Yes
> >>>>>>
> >>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
> >>>>>> ls /admin
> >>>>>> [delete_topics]
> >>>>>> ls /admin/preferred_replica_election
> >>>>>> Node does not exist: /admin/preferred_replica_election
> >>>>>>
> >>>>>> Thanks
> >>>>>> Zakee
> >>>>>>
> >>>>>>> On Mar 7, 2015, at 10:49 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>
> >>>>>>> Hmm, that sounds like a bug. Can you paste the log of leader rebalance here?
> >>>>>>> Some other things to check are:
> >>>>>>> 1. The actual property name is auto.leader.rebalance.enable, not auto.leader.rebalance. You’ve probably known this, just to double confirm.
> >>>>>>> 2. In the zookeeper path, can you verify /admin/preferred_replica_election does not exist?
> >>>>>>>
> >>>>>>> Jiangjie (Becket) Qin
> >>>>>>>
> >>>>>>>> On 3/7/15, 10:24 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>
> >>>>>>>> I started with a clean cluster and started to push data. It still does the rebalance at random durations even though auto.leader.rebalance is set to false.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> Zakee
> >>>>>>>>
> >>>>>>>>> On Mar 6, 2015, at 3:51 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>>>
> >>>>>>>>> Yes, the rebalance should not happen in that case. That is a little bit strange. Could you try to launch a clean Kafka cluster with auto.leader.rebalance.enable disabled and try to push data?
> >>>>>>>>> When leader migration occurs, a NotLeaderForPartition exception is expected.
> >>>>>>>>>
> >>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>
> >>>>>>>>>> On 3/6/15, 3:14 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Yes, Jiangjie, I do see lots of these "Starting preferred replica leader election for partitions" errors in the logs. I also see a lot of Produce request failure warnings with the NotLeader exception.
> >>>>>>>>>>
> >>>>>>>>>> I tried switching auto.leader.rebalance off, setting it to false. I am still noticing the rebalance happening. My understanding was that the rebalance would not happen when this is set to false.
> >>>>>>>>>>
> >>>>>>>>>> Thanks
> >>>>>>>>>> Zakee
> >>>>>>>>>>
> >>>>>>>>>>> On Feb 25, 2015, at 5:17 PM, Jiangjie Qin <j...@linkedin.com.INVALID> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> I don’t think num.replica.fetchers will help in this case. Increasing the number of fetcher threads will only help in cases where you have a large amount of data coming into a broker and more replica fetcher threads will help keep up. We usually only use 1-2 for each broker. But in your case, it looks like leader migration caused the issue.
> >>>>>>>>>>> Do you see anything else in the log? Like preferred leader election?
> >>>>>>>>>>>
> >>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>
> >>>>>>>>>>> On 2/25/15, 5:02 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks, Jiangjie.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, I do see under-replicated partitions, usually shooting up every hour. Anything I could try to reduce it?
> >>>>>>>>>>>>
> >>>>>>>>>>>> How does "num.replica.fetchers" affect the replica sync? Currently I have configured 7 on each of 5 brokers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Zakee
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Feb 25, 2015 at 4:17 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> These messages are usually caused by leader migration. I think as long as you don't see this lasting forever and get a bunch of under-replicated partitions, it should be fine.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jiangjie (Becket) Qin
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/25/15, 4:07 PM, "Zakee" <kzak...@netzero.net> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Need to know if I should be worried about this or ignore them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I see tons of these exceptions/warnings in the broker logs, not sure what causes them and what could be done to fix them.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] ERROR [ReplicaFetcherThread-3-5], Error for partition [TestTopic] to broker 5:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> >>>>>>>>>>>>>> [2015-02-25 11:01:41,785] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 950084 from client ReplicaFetcherThread-1-2 on partition [TestTopic,2] failed due to Leader not local for partition [TestTopic,2] on broker 2 (kafka.server.ReplicaManager)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any ideas?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -Zakee
> >>>
> >>> Thanks
> >>> Zakee

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125
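On the keyed-producer snippet earlier in the thread: with the four-argument KeyedMessage(topic, key, partitionKey, message) of the old producer API, only the partition key drives partition selection; the random key just travels with the message. A minimal sketch of the classic default-partitioner behavior (a simplified reimplementation for illustration, assuming the abs(hashCode) % numPartitions scheme; not Kafka's actual class):

```java
// Illustrative-only sketch of default-partitioner behavior for a keyed
// message: partition = abs(partitionKey.hashCode()) % numPartitions.
public class PartitionerSketch {
    // abs() that stays non-negative even for Integer.MIN_VALUE,
    // by masking off the sign bit.
    static int abs(int n) {
        return n & 0x7fffffff;
    }

    static int partitionFor(String partitionKey, int numPartitions) {
        return abs(partitionKey.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        // The same partition key always maps to the same partition,
        // so a stable getPartitionKey() gives stable routing.
        System.out.println(partitionFor("host-1", 30));
    }
}
```

A consequence worth noting: the snippet's random uniqKey does not spread messages across partitions; distribution depends entirely on how many distinct values getPartitionKey() produces.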