I wrote a small program to do major compaction (MC) in a "smart" way, available here:
https://github.com/jinyeluo/smarthbasecompactor/

Instead of blindly running MC at the table level, the program finds the non-hot 
regions that have the most store files, on a per-region-server basis, and runs 
MC on them. Once done, it finds the next candidates... It just keeps going 
until time is up.
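
For anyone curious how the selection loop might look, here is a minimal sketch 
against the HBase 1.x client API. This is my own simplification, not the 
project's actual code; the one-hour budget, 60-second poll interval, and 
store-file threshold are all made-up placeholders to tune for your cluster:

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.ClusterStatus;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.RegionLoad;
  import org.apache.hadoop.hbase.ServerName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;

  public class SmartCompactorSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      long deadline = System.currentTimeMillis() + 60L * 60 * 1000; // placeholder budget
      Map<String, Long> prevRequests = new HashMap<>(); // last-seen request counters
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin()) {
        while (System.currentTimeMillis() < deadline) {
          ClusterStatus status = admin.getClusterStatus();
          for (ServerName server : status.getServers()) {
            RegionLoad best = null;
            for (RegionLoad rl : status.getLoad(server).getRegionsLoad().values()) {
              // "Non-hot" check: request counters are cumulative, so look at
              // the delta since the last poll; any traffic means skip. The
              // first pass treats lifetime traffic as recent, so it errs on
              // the side of doing nothing.
              long reqs = rl.getRequestsCount();
              long delta = reqs - prevRequests.getOrDefault(rl.getNameAsString(), 0L);
              prevRequests.put(rl.getNameAsString(), reqs);
              if (delta > 0) {
                continue;
              }
              // Keep the idle region with the most store files on this server.
              if (best == null || rl.getStorefiles() > best.getStorefiles()) {
                best = rl;
              }
            }
            // Only compact if there is actually something worth merging.
            if (best != null && best.getStorefiles() > 3) { // placeholder threshold
              admin.majorCompactRegion(best.getName());
            }
          }
          Thread.sleep(60_000); // placeholder poll interval
        }
      }
    }
  }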

I am sure it has a lot of room for improvement if someone wants to go crazy on 
it, but the code has been running for about half a year and it seems to be 
working well.

-----Original Message-----
From: Vladimir Rodionov [mailto:vladrodio...@gmail.com]
Sent: Monday, April 04, 2016 12:15 PM
To: user@hbase.apache.org
Cc: Sumit Nigam <sumit_o...@yahoo.com>
Subject: Re: Major compaction

>> Why I am trying to understand this is because HBase also sets it to a
>> 24 hour default (for time based compaction) and I am looking to lower it
>> to say 20 mins to reduce stress by spreading the load.

The more frequently you run major compaction, the more I/O (disk/network) you 
consume.

Usually, in production environments, periodic major compactions are disabled 
and run manually to avoid major compaction storms.
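
For reference, time-based major compaction is disabled by setting its period 
to zero in hbase-site.xml, and a manual run can then be kicked off from the 
HBase shell (the table name below is just an example):

  <property>
    <name>hbase.hregion.majorcompaction</name>
    <value>0</value> <!-- 0 disables periodic major compactions -->
  </property>

  hbase> major_compact 'usertable'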

To control major compaction completely, you will also need to disable the 
promotion of minor compactions to major ones. You can do this by setting a 
maximum compaction size for minor compactions:
*hbase.hstore.compaction.max.size*
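
For example, in hbase-site.xml (the 10 GB value is only an illustration; pick 
something larger than your typical minor compaction output but smaller than 
your store size):

  <property>
    <name>hbase.hstore.compaction.max.size</name>
    <value>10737418240</value> <!-- bytes; files above this size are never
         selected for compaction, so a minor compaction can no longer pick
         up every file in a store and get promoted to a major one -->
  </property>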

-Vlad


On Mon, Apr 4, 2016 at 8:55 AM, Esteban Gutierrez <este...@cloudera.com>
wrote:

> Hello Sumit,
>
> Ideally you shouldn't be triggering major compactions that frequently
> since minor compactions should be taking care of reducing the number
> of store files. The caveat of doing it more frequently is the
> additional disk/network I/O.
>
> Can you please elaborate more on "reduce stress by spreading the
> load"? Is there anything else you are seeing in your cluster that
> suggests lowering the period for major compactions?
>
> esteban.
>
> --
> Cloudera, Inc.
>
>
> On Mon, Apr 4, 2016 at 8:35 AM, Sumit Nigam
> <sumit_o...@yahoo.com.invalid>
> wrote:
>
> > Hi,
> > Are there major overheads to running major compaction frequently? As
> > much as I know, it produces a single HFile per store in a region and
> > processes delete markers and version-related drops. So, if this
> > process has happened once, say, a few mins back, then another major
> > compaction should ideally not cause much harm.
> > Why I am trying to understand this is because HBase also sets it to a
> > 24 hour default (for time based compaction) and I am looking to lower
> > it to say 20 mins to reduce stress by spreading the load.
> > Or am I completely off-track?
> > Thanks, Sumit
>