Re: How to avoid major compaction during restart?

2018-06-28 Thread Mingliang LIU
Marcell, on the Hadoop side, the NameNode (NN) will not schedule block re-replication unless the DataNode (DN) has been declared "dead". By default the interval is more than 10 minutes. Usually your DN should have restarted before being marked "dead" by the NN. If that is still a concern, you can make that interval longer ind
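
For reference, a rough sketch of where the ">10 minutes" figure comes from. It assumes the stock HDFS defaults (dfs.heartbeat.interval = 3 seconds, dfs.namenode.heartbeat.recheck-interval = 300000 ms) and the usual NameNode rule of 2 x recheck interval + 10 x heartbeat interval. The function name is made up for illustration and is not part of any Hadoop API; check your own hdfs-site.xml before relying on the numbers.

```python
# Assumed stock defaults from hdfs-default.xml; your cluster may override them.
HEARTBEAT_INTERVAL_S = 3        # dfs.heartbeat.interval (seconds)
RECHECK_INTERVAL_MS = 300_000   # dfs.namenode.heartbeat.recheck-interval (milliseconds)


def dead_node_timeout_s(heartbeat_interval_s, recheck_interval_ms):
    """Seconds without a heartbeat before the NameNode marks a DataNode dead.

    Hypothetical helper: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


if __name__ == "__main__":
    timeout = dead_node_timeout_s(HEARTBEAT_INTERVAL_S, RECHECK_INTERVAL_MS)
    print(f"{timeout:.0f} s ({timeout / 60:.1f} min)")
```

Raising the recheck interval lengthens the window roughly twice as fast as the value you add, so a DataNode that comes back within that window should not trigger re-replication.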

Bulk Load running files multiple times

2018-06-28 Thread Austin Heyne
Hey again, I'm running a bulk load on S3 and I'm seeing region servers being instructed to load an HFile multiple times. I've seen this behavior in two different ways. The first time I saw this was after deciding to brute force my way around the problem in HBASE-20774 [1] by just letting the clu

Re: How to avoid major compaction during restart?

2018-06-28 Thread Marcell Ortutay
Er, I made a mistake in the above question; the issue is not so much the major compaction but rather that during restart (as nodes go up / down), Hadoop and HBase attempt to rebalance blocks and regions, causing unnecessary movement. So what I'm actually looking for is a way to avoid the balanc

How to avoid major compaction during restart?

2018-06-28 Thread Marcell Ortutay
Hi all, I'm interested in ways to avoid a major compaction when restarting all the HBase region servers in a cluster (for example, for a version upgrade). Are there any recommended techniques for achieving this? Thanks, Marcell