Yes, you can bulk load into a table which already contains data.

The ideal case is that you generate HFiles whose boundaries map exactly onto the current distribution of Regions in your HBase cluster. However, since Region boundaries can change, the bulk load client (LoadIncrementalHFiles) can handle HFiles that no longer fit inside a single Region: it splits them client-side and automatically resubmits the resulting files.
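
For reference, a minimal sketch of driving the client programmatically (the table name and HFile directory below are placeholders, and the class location/signature varies a bit across HBase versions -- in newer 2.x releases it lives under org.apache.hadoop.hbase.tool):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      TableName name = TableName.valueOf("my_table");  // placeholder table
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin();
           Table table = conn.getTable(name);
           RegionLocator locator = conn.getRegionLocator(name)) {
        // Point at the directory of HFiles produced by your MapReduce job.
        // The client splits any HFile that spans a Region boundary and
        // resubmits the pieces until everything is loaded.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/tmp/hfiles"), admin, table, locator);
      }
    }
  }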

Beware: this splitting is a very expensive and slow process (e.g. consider how long it would take to rewrite 100GB of data in a single process because you did not use the correct Region split points when creating the data). Most bulk loading issues I encounter are related to incorrect split points, which cause the bulk load to take hours to days to complete instead of seconds to minutes.
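
The usual way to avoid that is to let the HFile-generating job pick up the table's current Region boundaries at submit time via HFileOutputFormat2.configureIncrementalLoad. A rough sketch under the same assumptions as above (table name is a placeholder; mapper/input setup omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
  import org.apache.hadoop.mapreduce.Job;

  public class HFileJobSetup {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = Job.getInstance(conf, "hfile-generation");
      // ... set your mapper, input format, and output key/value classes ...
      TableName name = TableName.valueOf("my_table");  // placeholder table
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(name);
           RegionLocator locator = conn.getRegionLocator(name)) {
        // Sets up total-order partitioning against the table's *current*
        // Region start keys, so each reducer writes HFiles that fall inside
        // exactly one Region and the load completes without client-side splits.
        HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      }
      job.waitForCompletion(true);
    }
  }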

On 3/27/18 9:15 AM, Jone Zhang wrote:
Does HBase bulkload support incremental data?
How does it work if the incremental data's key range overlaps with data
that already exists?

Thanks
