1. Yes
2. Pressure on the HDFS NameNode, slower reads, and generally poor performance
3. The default configuration is weekly. Unless you know a specific reason why weekly doesn't work for you, that is what you should follow ;)
4. No

I would be surprised if you need to do anything special with S3, but I don't know for sure.
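
To make the "weekly" default in answer 3 concrete, here is a rough sketch of how that interval maps onto the configuration values. It assumes the interval is set by hbase.hregion.majorcompaction (in milliseconds) and spread out by hbase.hregion.majorcompaction.jitter, with defaults of 7 days and 0.5; please check both against the hbase-default.xml of your HBase version. The helper names are made up for illustration.

```python
from datetime import timedelta

# Assumed defaults; verify against hbase-default.xml for your HBase version.
MAJOR_COMPACTION_PERIOD = timedelta(days=7)  # hbase.hregion.majorcompaction
MAJOR_COMPACTION_JITTER = 0.5                # hbase.hregion.majorcompaction.jitter


def period_ms(period: timedelta) -> int:
    """Convert an interval to the millisecond value the property expects."""
    return int(period.total_seconds() * 1000)


def jitter_window_ms(period: timedelta, jitter: float) -> tuple[int, int]:
    """Return the (earliest, latest) interval once jitter is applied.

    Jitter is treated as a fraction of the period on either side of it.
    """
    base = period_ms(period)
    spread = int(base * jitter)
    return base - spread, base + spread


if __name__ == "__main__":
    base = period_ms(MAJOR_COMPACTION_PERIOD)
    low, high = jitter_window_ms(MAJOR_COMPACTION_PERIOD, MAJOR_COMPACTION_JITTER)
    ms_per_day = period_ms(timedelta(days=1))
    print(f"hbase.hregion.majorcompaction = {base}")
    print(f"effective window: {low / ms_per_day} to {high / ms_per_day} days")
```

With those assumed defaults, a given store would be major-compacted somewhere between 3.5 and 10.5 days after its previous major compaction, so not every region is compacted at the same moment. A period of 0 is the value commonly used to turn off time-based major compactions and trigger them manually instead.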

On 9/10/18 2:19 PM, Antonio Si wrote:
Hello,

As I understand, the deleted records in hbase files do not get removed
until a major compaction is performed.

I have a few questions regarding major compaction:

1.   If I set a TTL and/or a max number of versions, the records older
     than the TTL or the expired versions will still be in the HBase
     files until a major compaction is performed. Is my understanding
     correct?

2.   If a major compaction is never performed on a table, then besides
     the size of the table continuing to increase, we will eventually
     have too many HBase files and the cluster will slow down. Are
     there any other implications?

3.   Are there any guidelines about how often we should run major compaction?

4.   During major compaction, do we need to pause all read/write
     operations until the major compaction is finished?

     I realize that when using S3 as the storage, after I run major
     compaction there are inconsistencies between the S3 metadata and
     the S3 file system, and I need to run an "emrfs sync" to
     synchronize them after major compaction is completed. Does that
     mean I need to pause all read/write operations during this period?

Thanks.

Antonio.
