Re: Automating major compactions

2015-07-09 Thread Dejan Menges
Thanks a lot to everyone - very nice point about looking for the oldest file and taking locality into consideration. Going to implement it now :)

Automating major compactions

2015-07-08 Thread Dejan Menges
Hi, What's the best way to automate major compactions without enabling it during the off-peak period? What I was testing is a simple script which runs on every node in the cluster, checks whether a major compaction is already running on that node, and if not picks one region and runs a major compaction on it
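A minimal sketch of the kind of per-node trigger script described above, using the HBase 1.x Java Admin API; the table name comes from the command line and the region choice here is just a placeholder for whatever selection heuristic is used (the heuristic itself is what the rest of the thread discusses):

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactOneRegion {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf(args[0]);
            // Pick one region of the table; a real script would apply the
            // selection heuristics discussed later in this thread, and could
            // first consult Admin#getCompactionStateForRegion to skip regions
            // that are already being compacted.
            List<HRegionInfo> regions = admin.getTableRegions(table);
            if (!regions.isEmpty()) {
                HRegionInfo candidate = regions.get(0);
                // Request a major compaction of just that region (asynchronous).
                admin.majorCompactRegion(candidate.getRegionName());
            }
        }
    }
}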

Re: Automating major compactions

2015-07-08 Thread Behdad Forghani
To start a major compaction for tablename from the CLI, you need to run: echo major_compact tablename | hbase shell. I do this after bulk loading into the table. FYI, to avoid surprises, I also turn off the load balancer and rebalance regions manually. The CLI command to turn off the balancer is: echo
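For scripted use, the same operations are also available through the Java Admin API; a small sketch, assuming the HBase 1.x client (the table name is illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class PostBulkLoad {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Disable the balancer before moving regions around manually.
            admin.setBalancerRunning(false, true);
            // Request a major compaction of the whole table (asynchronous).
            admin.majorCompact(TableName.valueOf("tablename"));
        }
    }
}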

Re: Automating major compactions

2015-07-08 Thread Dejan Menges
Hi Behdad, Thanks a lot, but this part I already do. My question was more about which metrics (exposed or not) to use to figure out most intelligently where major compaction is needed the most. Currently I'm choosing the region which has the biggest number of store files + the biggest amount of
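The per-region store file count and size that Dejan sorts on are exposed through the cluster status; a sketch that dumps them for every online region, assuming the HBase 1.x client API:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class StoreFileReport {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName server : status.getServers()) {
                ServerLoad load = status.getLoad(server);
                for (Map.Entry<byte[], RegionLoad> e : load.getRegionsLoad().entrySet()) {
                    RegionLoad rl = e.getValue();
                    // Store file count and total size: the two metrics being sorted on.
                    System.out.printf("%s\tfiles=%d\tsizeMB=%d%n",
                        rl.getNameAsString(), rl.getStorefiles(), rl.getStorefileSizeMB());
                }
            }
        }
    }
}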

Re: Automating major compactions

2015-07-08 Thread Dejan Menges
Hi Mikhail, Actually, the reason is quite stupid on my side - to avoid compacting one region over and over again while others are waiting in line (reading the HTML and sorting only on the number of store files eventually leaves you with a bunch of regions having exactly the same number of store files).

Re: Automating major compactions

2015-07-08 Thread Mikhail Antonov
I totally understand the reasoning behind compacting regions with the biggest number of store files, but I didn't follow why it's best to compact regions which have the biggest store files - maybe I'm missing something? I'd maybe compact regions which have the smallest avg storefile size? You may also want

Re: Automating major compactions

2015-07-08 Thread Vladimir Rodionov
You can find this info yourself, Dejan
1. Locate table dir on HDFS
2. List all regions (directories)
3. Iterate files in each directory and find the oldest one (creation time)
4. The region with the oldest file is your candidate for major compaction
/HBASE_ROOT/data/namespace/table/region (If my

Re: Automating major compactions

2015-07-08 Thread Bryan Beaudreault
Our automation uses a combination of the following to determine what to compact:
- Which regions have bad locality (% of blocks local vs remote, using the HDFS getBlockLocations API)
- Which regions have the most HFiles (most files per region/cf directory)
- Which regions have gone the
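A sketch of the locality check Bryan describes, computed directly from HDFS block locations for one region directory; the path argument and the hostname comparison are assumptions (HDFS may report FQDNs while the local host reports a short name), and recent HBase versions also expose a per-region dataLocality figure in RegionLoad that can serve the same purpose:

import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class RegionLocality {
    /** Fraction of HDFS blocks under regionDir that have a replica on localHost. */
    static double locality(FileSystem fs, Path regionDir, String localHost) throws IOException {
        long local = 0, total = 0;
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(regionDir, true);
        while (it.hasNext()) {
            LocatedFileStatus f = it.next();
            for (BlockLocation b : f.getBlockLocations()) {
                total++;
                for (String host : b.getHosts()) {
                    // Assumes HDFS host names match the local hostname format.
                    if (host.equals(localHost)) { local++; break; }
                }
            }
        }
        return total == 0 ? 1.0 : (double) local / total;
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        String host = InetAddress.getLocalHost().getHostName();
        // args[0]: HDFS path to a region directory under the table dir.
        System.out.println(locality(fs, new Path(args[0]), host));
    }
}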

Re: Automating major compactions

2015-07-08 Thread Behdad Forghani
Hi, For my project, HBase would come to a halt after about 8 hours. I managed to reduce the load time to 10 minutes. What gave me the best results was splitting regions to best fit my data, compacting them manually when there was a change to the tables, and using Snappy for compression. I have
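For the pre-splitting and Snappy pieces, a small sketch of creating a pre-split table with a Snappy-compressed column family through the Java API; the table name, family, and split point are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("mytable"));
            HColumnDescriptor cf = new HColumnDescriptor("cf");
            // Compress store files with Snappy.
            cf.setCompressionType(Compression.Algorithm.SNAPPY);
            desc.addFamily(cf);
            // Pre-split so regions fit the expected key distribution.
            byte[][] splits = { Bytes.toBytes("m") };
            admin.createTable(desc, splits);
        }
    }
}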

Re: Automating major compactions

2015-07-08 Thread Jean-Marc Spaggiari
Just missing the ColumnFamily at the end of the path. Your memory is pretty good. JM
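A sketch of Vladimir's oldest-file walk, with the column-family level Jean-Marc points out included; HDFS exposes modification time rather than creation time, which for write-once HFiles amounts to the same thing (the path layout assumed is /hbase/data/<namespace>/<table>/<region>/<cf>/<hfile>):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OldestFilePerRegion {
    public static void main(String[] args) throws IOException {
        // args[0]: table directory on HDFS, e.g. the table dir under /hbase/data.
        FileSystem fs = FileSystem.get(new Configuration());
        Path tableDir = new Path(args[0]);

        Path bestRegion = null;
        long oldest = Long.MAX_VALUE;
        // Step 2: list all region directories under the table directory.
        for (FileStatus region : fs.listStatus(tableDir)) {
            if (!region.isDirectory() || region.getPath().getName().startsWith(".")) continue;
            // Steps 3-4: walk <region>/<columnfamily>/<hfile> and track the oldest file.
            for (FileStatus cf : fs.listStatus(region.getPath())) {
                if (!cf.isDirectory() || cf.getPath().getName().startsWith(".")) continue;
                for (FileStatus hfile : fs.listStatus(cf.getPath())) {
                    if (hfile.isDirectory()) continue;
                    long t = hfile.getModificationTime();
                    if (t < oldest) { oldest = t; bestRegion = region.getPath(); }
                }
            }
        }
        System.out.println("Major compaction candidate: " + bestRegion);
    }
}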