Hi,

What's the best way to automate major compactions, other than simply
enabling them during off-peak periods?

What I have been testing is a simple script that runs on every node in the
cluster and checks whether a major compaction is already running on that
node. If not, it picks one region and runs a major compaction on that region.
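
To make that concrete, one pass of the script on a node looks roughly like
the sketch below. The three callables are hypothetical placeholders for
whatever actually checks compaction state, selects a region and triggers the
compaction; they are not HBase APIs.

```python
from typing import Callable, Optional


def compact_one_region(
    is_compacting: Callable[[], bool],
    pick_region: Callable[[], Optional[str]],
    major_compact: Callable[[str], None],
) -> Optional[str]:
    """Run one pass on a node; return the region compacted, or None.

    All three arguments are hypothetical hooks supplied by the caller.
    """
    # Skip this pass if a major compaction is already running on the node.
    if is_compacting():
        return None
    # Ask the selection heuristic for a candidate region.
    region = pick_region()
    if region is None:
        return None
    # Trigger a major compaction on that single region only.
    major_compact(region)
    return region
```

Limiting each pass to one region per node keeps the extra I/O bounded at any
given time.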

It has been running for some time and it has helped us get our data into
much better shape, but now I'm no longer sure how to choose which region to
compact. So far I have been reading rs-status#regionStoreStats for that node
and choosing first the region with the largest number of storefiles, and
then those with the largest storefile sizes.
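
In code, the current selection is roughly the following sketch. The
RegionStats fields are made-up names for the values I read from the status
page, not the actual output format, and I am assuming the size is reported
in MB.

```python
from typing import NamedTuple, Optional, Sequence


class RegionStats(NamedTuple):
    # Hypothetical field names for values read from the region server status.
    name: str
    storefiles: int
    storefile_size_mb: int


def pick_region(regions: Sequence[RegionStats]) -> Optional[str]:
    """Pick the region with the most storefiles; break ties by total size."""
    if not regions:
        return None
    # Tuples compare element by element, so the storefile count is the
    # primary key and the storefile size only matters when counts are equal.
    best = max(regions, key=lambda r: (r.storefiles, r.storefile_size_mb))
    return best.name
```

For example, given regions with (storefiles, size) of (3, 900), (7, 200) and
(7, 450), this picks the third one.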

Is there maybe something more intelligent I could/should do?

Thanks a lot!
