Unfortunately, all our tables and regions are active 24/7. Traffic does fall off somewhat at night, but there is no real downtime.
It is not user-facing load, though, so I guess we could turn off traffic for a while and let data queue up in Kafka. But not for too long, or we'd just be playing catch-up afterwards.

----
Saad

On Friday, April 29, 2016, Frank Luo <j...@merkleinc.com> wrote:

> Saad,
>
> Will all your tables/regions be used 24/7, or will, at any given time, only
> some regions be in use while others sit idle?
>
> If the latter, I developed a tool that launches major compactions in a
> "smart" way, because I am facing a similar issue:
> https://github.com/jinyeluo/smarthbasecompactor.
>
> It looks at every RegionServer, finds the non-hot regions with the most
> store files, and starts compacting. It just continues until time is up.
> Just to be clear, it doesn't perform the major compaction itself, which
> would be a scary thing to do; it tells the region servers to do it.
>
> We have it running in our cluster for about 10 hours a day. It has
> virtually no impact on applications, and the cluster is doing far better
> than it did with the default scheduled major compactions.
>
>
> -----Original Message-----
> From: Saad Mufti [mailto:saad.mu...@gmail.com]
> Sent: Friday, April 29, 2016 1:51 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> We have more issues now. After testing this in dev, we tried rolling
> compaction in our production cluster, which has tons of data (60 region
> servers and around 7000 regions). Most regions, around 6-7 GB in size,
> were taking 4-5 minutes to finish, so we estimated a single run would take
> something like 20 days, which doesn't seem reasonable.
>
> So is it more reasonable to run major compaction across all region servers
> at once, but within a region server one region at a time? That would cut
> it down to around 8 hours, which is still very long. Or is it better to
> compact all regions on one region server, then move to the next?
>
> The goal of all this is to maintain decent write performance while still
> doing compaction. We don't have a good low-load period for our cluster, so
> we're trying to find a way to do this without cluster downtime.
>
> Thanks.
>
> ----
> Saad
>
>
> On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>
> > Thanks for the pointer. Working like a charm.
> >
> > ----
> > Saad
> >
> >
> > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> >> Please use the following method of HBaseAdmin:
> >>
> >> public CompactionState getCompactionStateForRegion(final byte[] regionName)
> >>
> >> Cheers
> >>
> >> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > We have a large HBase 1.x cluster in AWS and have disabled automatic
> >> > major compaction as advised. We were running our own compaction code
> >> > daily around midnight, which calls
> >> > HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion
> >> > across all regions.
> >> >
> >> > But we missed the fact that this is an asynchronous operation, so in
> >> > practice it causes major compaction to run across all regions at once,
> >> > at least those not already major compacted (for example because
> >> > previous minor compactions got upgraded to major ones).
> >> >
> >> > We don't really have a suitable low-load period, so what is a suitable
> >> > way to make major compaction run in a rolling fashion, region by
> >> > region? The API above provides no return value for us to be able to
> >> > wait for one compaction to finish before moving on to the next.
> >> >
> >> > Thanks.
> >> >
> >> > ----
> >> > Saad
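For reference, here is a minimal sketch of the rolling approach discussed above, assuming the HBase 1.x client API (in 1.x, Admin.getCompactionStateForRegion returns AdminProtos.GetRegionInfoResponse.CompactionState; the type and package differ in 2.x). It issues the asynchronous majorCompactRegion request and then polls getCompactionStateForRegion until the region reports no compaction running, compacting one region at a time per region server while all servers work in parallel. The class name RollingMajorCompactor, the 10-second poll interval, and the one-thread-per-server layout are illustrative choices, not something taken from the thread.

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

public class RollingMajorCompactor {

    // Usage: java RollingMajorCompactor <table-name>
    public static void main(String[] args) throws Exception {
        TableName table = TableName.valueOf(args[0]);
        Configuration conf = HBaseConfiguration.create();

        try (Connection conn = ConnectionFactory.createConnection(conf);
             RegionLocator locator = conn.getRegionLocator(table)) {

            // Group region names by the region server currently hosting them.
            Map<ServerName, List<byte[]>> regionsByServer = new HashMap<>();
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                regionsByServer.computeIfAbsent(loc.getServerName(), s -> new ArrayList<>())
                               .add(loc.getRegionInfo().getRegionName());
            }

            // One worker per region server: each server compacts its regions one at
            // a time, but all servers make progress in parallel.
            List<Thread> workers = new ArrayList<>();
            for (Map.Entry<ServerName, List<byte[]>> entry : regionsByServer.entrySet()) {
                Thread worker = new Thread(() -> compactSerially(conn, entry.getValue()),
                        "compactor-" + entry.getKey().getServerName());
                workers.add(worker);
                worker.start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
        }
    }

    // Major-compact each region in turn, polling until the region server reports
    // no compaction running before moving on to the next region.
    private static void compactSerially(Connection conn, List<byte[]> regionNames) {
        try (Admin admin = conn.getAdmin()) {
            for (byte[] regionName : regionNames) {
                try {
                    admin.majorCompactRegion(regionName);  // asynchronous request
                    CompactionState state;
                    do {
                        // Sleep first: the state can still read NONE briefly before
                        // the region server picks up the request.
                        Thread.sleep(10_000L);
                        state = admin.getCompactionStateForRegion(regionName);
                    } while (state != CompactionState.NONE);
                } catch (IOException | InterruptedException e) {
                    // A split, moved, or failed region shouldn't stall the whole run.
                    System.err.println("Skipping region after error: " + e);
                }
            }
        } catch (IOException e) {
            System.err.println("Worker aborted: " + e);
        }
    }
}

Polling rather than expecting a return value is the point of the getCompactionStateForRegion pointer above; the poll interval and any per-server concurrency limits would still need tuning against real cluster load.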