Bulk importing requires the table you are importing into to already exist, because the MapReduce job needs to extract the region start/end keys in order to drive the reducers. That means you need to create your table beforehand with the appropriate pre-splits, then run your bulk ingest and bulk load to get the data into the table. If you don't pre-split the table, you end up with a single reducer in your bulk ingest job. It also means your bulk ingest cluster needs to be able to communicate with your HBase instance.
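
To make that concrete, here is a rough, untested sketch of the setup side using the HBase 2.x Java client. The table name "mytable", the column family "cf", and the split keys are just placeholders for illustration; substitute whatever matches your own key space.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class BulkImportSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // 1. Create the table beforehand with pre-split region boundaries.
            //    These split keys are placeholders; pick them from your own row keys.
            byte[][] splits = {
                Bytes.toBytes("row-25000"),
                Bytes.toBytes("row-50000"),
                Bytes.toBytes("row-75000")
            };
            TableName name = TableName.valueOf("mytable");
            admin.createTable(
                TableDescriptorBuilder.newBuilder(name)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
                    .build(),
                splits);

            // 2. Configure the bulk ingest MapReduce job. configureIncrementalLoad
            //    reads the region start/end keys from the table and sets up one
            //    reducer per region with a total-order partitioner, so each reducer
            //    writes HFiles covering exactly one region.
            Job job = Job.getInstance(conf, "bulk-ingest");
            try (Table table = conn.getTable(name);
                 RegionLocator locator = conn.getRegionLocator(name)) {
                HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
            }
            // ... set your mapper, input format, and HFile output path, then run
            // the job and finish with the bulk load step to move the HFiles into
            // the table's regions.
        }
    }
}

The final bulk load (LoadIncrementalHFiles, i.e. the "completebulkload" tool) only moves the generated HFiles into the existing regions, which is why the table and its region boundaries have to exist before the ingest job runs.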

-Austin

On 7/18/19 4:39 AM, Michael wrote:
Hi,

I looked at the possibility of bulk importing into HBase, but somehow I
don't get it. I am not able to presplit the data, so
does bulk importing work without presplitting?
As I understand it, instead of putting the data, I create the HBase
region files, but all the tutorials I read mention presplitting...

So, is presplitting essential for bulk importing?

It would be really helpful if someone could point me to a demo
implementation of a bulk import.

Thanks for helping
  Michael

