Austin is right. Pre-splitting matters mainly for generating and loading
the HFiles: when you bulk load, each generated HFile is loaded into the
region whose rowkey range covers that HFile's keys. Without pre-splitting,
all the HFiles end up in a single region, the bulk load is time-consuming,
and that one region easily becomes a hotspot once queries come in.
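
To make it concrete, below is a rough sketch of creating a pre-split table
with the Java client before running the bulk ingest. It assumes the HBase
2.x Admin API; the table name "mytable", the family "cf" and the split
points are only placeholders, real split keys should come from your own
rowkey distribution.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Region split points chosen up front, so the HFiles generated by the
      // bulk-ingest job map onto many regions instead of a single one.
      byte[][] splitKeys = {
          Bytes.toBytes("b"), Bytes.toBytes("g"),
          Bytes.toBytes("n"), Bytes.toBytes("t")
      };
      admin.createTable(
          TableDescriptorBuilder.newBuilder(TableName.valueOf("mytable"))
              .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
              .build(),
          splitKeys);
    }
  }
}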

For a demo, you can see here:
[1]. https://hbase.apache.org/book.html#arch.bulk.load
[2]. http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
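
On the ingest side, the behaviour Austin describes below (the job picking
up the region start/end keys to drive the reducers) is what
HFileOutputFormat2.configureIncrementalLoad does. Here is a rough sketch of
wiring it up, again with an assumed table name and with the HFile output
directory taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkIngestJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tableName = TableName.valueOf("mytable"); // assumed table name

    Job job = Job.getInstance(conf, "bulk-ingest");
    job.setJarByClass(BulkIngestJob.class);
    // job.setMapperClass(...): your mapper emits (ImmutableBytesWritable rowkey, Put)
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tableName);
         RegionLocator locator = conn.getRegionLocator(tableName)) {
      // Reads the existing table's region start/end keys and configures the
      // total-order partitioner, the sort reducer and the reducer count (one
      // reducer per region) -- with no pre-splits there is only one region,
      // hence the single reducer Austin mentions.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }

    // Directory where the HFiles are written, to be bulk-loaded afterwards.
    FileOutputFormat.setOutputPath(job, new Path(args[0]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

After the job finishes, the generated HFiles get loaded into the table with
the completebulkload tool (LoadIncrementalHFiles), as described in [1].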

Thanks.

On Thu, Jul 18, 2019 at 9:21 PM Austin Heyne <[email protected]> wrote:

> Bulk importing requires the table the data is being bulk imported into
> to already exist. This is because the mapreduce job needs to extract
> the region start/end keys in order to drive the reducers. This means
> that you need to create your table beforehand, providing the
> appropriate pre-splitting and then run your bulk ingest and bulk load to
> get the data into the table. If you were to not pre-split your table
> then you would end up with one reducer in your bulk ingest job. This
> also means that your bulk ingest cluster will need to be able to
> communicate with your HBase instance.
>
> -Austin
>
> On 7/18/19 4:39 AM, Michael wrote:
> > Hi,
> >
> > I looked at the possibility of bulk importing into hbase, but somehow I
> > don't get it. I am not able to perform a presplitting of the data, so
> > does bulk importing work without presplitting?
> > As I understand it, instead of putting the data, I create the hbase
> > region files, but all tutorials I read mentioned presplitting...
> >
> > So, is presplitting essential for bulk importing?
> >
> > It would be really helpful if someone could point me to a demo
> > implementation of a bulk import.
> >
> > Thanks for helping
> >   Michael
> >
> >
>
