Yes, you can bulk load into a table which already contains data.

The ideal case is that you generate HFiles whose boundaries map exactly onto the current distribution of Regions in your HBase cluster. However, since Region boundaries can change, the bulk load client (LoadIncrementalHFiles) can handle HFiles that no longer fit inside a single Region: it splits them client-side and automatically resubmits the resulting files.
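
For reference, a minimal sketch of driving the client programmatically (the table name and HFile directory below are placeholders, and the class location/signature varies a bit across HBase versions -- in newer 2.x releases it lives under org.apache.hadoop.hbase.tool):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

  public class BulkLoadExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      TableName name = TableName.valueOf("my_table");  // placeholder table
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Admin admin = conn.getAdmin();
           Table table = conn.getTable(name);
           RegionLocator locator = conn.getRegionLocator(name)) {
        // Point at the directory of HFiles produced by your MapReduce job.
        // The client splits any HFile that spans a Region boundary and
        // resubmits the pieces until everything is loaded.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(new Path("/tmp/hfiles"), admin, table, locator);
      }
    }
  }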

Beware: this splitting is a very expensive and slow process (e.g. consider how long it would take to rewrite 100GB of data in a single process because you did not use the correct Region split points when creating the data). Most bulk loading issues I encounter are related to incorrect split points, which cause the bulk load to take hours to days to complete instead of seconds to minutes.
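
The usual way to avoid that is to let the HFile-generating job pick up the table's current Region boundaries at submit time via HFileOutputFormat2.configureIncrementalLoad. A rough sketch under the same assumptions as above (table name is a placeholder; mapper/input setup omitted):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
  import org.apache.hadoop.mapreduce.Job;

  public class HFileJobSetup {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = Job.getInstance(conf, "hfile-generation");
      // ... set your mapper, input format, and output key/value classes ...
      TableName name = TableName.valueOf("my_table");  // placeholder table
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(name);
           RegionLocator locator = conn.getRegionLocator(name)) {
        // Sets up total-order partitioning against the table's *current*
        // Region start keys, so each reducer writes HFiles that fall inside
        // exactly one Region and the load completes without client-side splits.
        HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      }
      job.waitForCompletion(true);
    }
  }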

On 3/27/18 9:15 AM, Jone Zhang wrote:
Does HBase bulkload support incremental data?
How does it work if the incremental data's key range overlaps with data
that already exists?

Thanks
