We have more issues now. After testing this in dev, we tried a rolling compaction in our production cluster, which holds far more data (60 region servers and around 7000 regions). Most regions were around 6-7 GB in size and took 4-5 minutes each to finish. Based on this we estimated that a single run would take something like 20 days, which doesn't seem reasonable.
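For reference, here is the arithmetic behind that estimate as a small Java sketch, along with the same calculation for compacting one region at a time on every region server in parallel. It assumes about 4.5 minutes per region (the midpoint of what we observed) and regions spread evenly across region servers. The class and method names are made up for illustration.

```java
public class CompactionEstimate {

    // Total minutes if all regions in the cluster are compacted strictly one after another.
    public static double serialMinutes(int regions, double minutesPerRegion) {
        return regions * minutesPerRegion;
    }

    // Total minutes if every region server compacts its own regions one at a time,
    // with all region servers working in parallel.
    // Assumption: regions are spread evenly, so the busiest server holds ceil(regions / servers).
    public static double perServerMinutes(int regions, int regionServers, double minutesPerRegion) {
        int regionsPerServer = (regions + regionServers - 1) / regionServers; // integer ceiling
        return regionsPerServer * minutesPerRegion;
    }

    public static void main(String[] args) {
        int regions = 7000;
        int regionServers = 60;
        double minutesPerRegion = 4.5; // assumed midpoint of the observed 4-5 minutes

        double serial = serialMinutes(regions, minutesPerRegion);
        double perServer = perServerMinutes(regions, regionServers, minutesPerRegion);

        System.out.printf("serial, whole cluster: %.1f days%n", serial / (60 * 24));
        System.out.printf("one region at a time per RS: %.1f hours%n", perServer / 60);
    }
}
```

With these inputs the serial run comes out at roughly 22 days, and the per-region-server variant at a bit under 9 hours. Both figures ignore any slowdown from compactions competing with live write traffic.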
So is it more reasonable to aim for major compaction across all region servers at once, but within each RS one region at a time? That would cut it down to around 8 hours, which is still very long. Or is it better to compact all regions on one region server, then move to the next?

The goal of all this is to maintain decent write performance while still doing compaction. We don't have a good low-load period for our cluster, so we are trying to find a way to do this without cluster downtime.

Thanks.

----
Saad

On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti <saad.mu...@gmail.com> wrote:

> Thanks for the pointer. Working like a charm.
>
> ----
> Saad
>
>
> On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please use the following method of HBaseAdmin:
>>
>> public CompactionState getCompactionStateForRegion(final byte[] regionName)
>>
>> Cheers
>>
>> On Tue, Apr 19, 2016 at 12:56 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > We have a large HBase 1.x cluster in AWS and have disabled automatic major
>> > compaction as advised. We were running our own code for compaction daily
>> > around midnight which calls HBaseAdmin.majorCompactRegion(byte[] regionName)
>> > in a rolling fashion across all regions.
>> >
>> > But we missed the fact that this is an asynchronous operation, so in
>> > practice this causes major compaction to run across all regions, at least
>> > those not already major compacted (for example because previous minor
>> > compactions got upgraded to major ones).
>> >
>> > We don't really have a suitable low load period, so what is a suitable way
>> > to make major compaction run in a rolling fashion region by region? The API
>> > above provides no return value for us to be able to wait for one compaction
>> > to finish before moving to the next.
>> >
>> > Thanks.
>> >
>> > ----
>> > Saad