To add to that, the split will be done on the master, so if you
anticipate many splits it can become an issue.
-Austin
On 7/18/19 12:32 PM, Jean-Marc Spaggiari wrote:
One thing to add: when you bulk load your files, they will be split
according to the region boundaries if needed.
Because between when you start your job and when you push your files,
there might have been some "natural" splits on the table side, so the
bulkloader has to be able to re-split your generated data.
JMS
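To illustrate the re-split check described above, here is a small hedged sketch (illustrative only, not HBase's actual implementation): a generated HFile whose key range crosses a region boundary has to be cut at every boundary that falls inside it. The function name `split_points` and the string keys are made up for the example.

```python
from bisect import bisect_right

def split_points(first_key: str, last_key: str, region_starts: list[str]) -> list[str]:
    """Region start keys strictly inside (first_key, last_key] -- the
    boundaries at which the HFile would need to be re-split."""
    lo = bisect_right(region_starts, first_key)
    hi = bisect_right(region_starts, last_key)
    return region_starts[lo:hi]

# Regions were ["", "g"] when the job started; a "natural" split added "m".
boundaries = ["", "g", "m"]
print(split_points("a", "e", boundaries))  # fits in one region -> []
print(split_points("e", "p", boundaries))  # crosses two boundaries -> ['g', 'm']
```

An empty result means the file can be loaded as-is; a non-empty one means the bulkloader must rewrite it into one piece per region.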
On Thu, Jul 18, 2019 at 9:55 AM, OpenInx <[email protected]> wrote:
Austin is right. Pre-splitting mainly matters when generating and
loading HFiles: during a bulk load, each generated HFile is loaded into
the region whose rowkey interval contains that file's keys. Without
pre-splitting, all HFiles end up in a single region, the bulk load
becomes time-consuming, and that one region easily becomes a hotspot
when queries come in.
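The routing described above can be sketched in a few lines (a hedged illustration, not HBase code; `region_for` and the start keys are invented for the example): each region is identified by its start key, and a rowkey belongs to the region whose half-open interval contains it.

```python
from bisect import bisect_right

def region_for(rowkey: str, region_starts: list[str]) -> int:
    """Index of the region whose [start, next_start) interval holds rowkey."""
    return bisect_right(region_starts, rowkey) - 1

pre_split = ["", "f", "m", "t"]   # table pre-split into four regions
no_split  = [""]                  # unsplit table: a single region

hfile_first_keys = ["apple", "grape", "melon", "zebra"]
print([region_for(k, pre_split) for k in hfile_first_keys])  # spread over 4 regions
print([region_for(k, no_split) for k in hfile_first_keys])   # all land in region 0
```

With the pre-split table the four HFiles spread across four regions; with the unsplit table every file (and every subsequent read) lands on the same region.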
For demos, see:
[1]. https://hbase.apache.org/book.html#arch.bulk.load
[2].
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
Thanks.
On Thu, Jul 18, 2019 at 9:21 PM Austin Heyne <[email protected]> wrote:
Bulk importing requires the table the data is being imported into to
already exist. This is because the mapreduce job needs to extract the
region start/end keys in order to drive the reducers. This means you
need to create your table beforehand with the appropriate pre-splits,
then run your bulk ingest and bulk load to get the data into the table.
If you do not pre-split your table, you will end up with a single
reducer in your bulk ingest job. This also means that your bulk ingest
cluster needs to be able to communicate with your HBase instance.
-Austin
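The reducer-driving step Austin describes can be sketched as follows, in the spirit of Hadoop's TotalOrderPartitioner (this is an illustrative toy, not the actual HFileOutputFormat2 code; `RegionPartitioner` and its keys are made up): the job reads the table's region start keys and uses one reduce partition per region.

```python
from bisect import bisect_right

class RegionPartitioner:
    """Toy total-order partitioner keyed on a table's region start keys."""

    def __init__(self, region_starts: list[str]):
        self.starts = sorted(region_starts)
        self.num_reducers = len(self.starts)  # one reducer per region

    def partition(self, rowkey: str) -> int:
        # Route the rowkey to the reducer for the region that contains it.
        return bisect_right(self.starts, rowkey) - 1

presplit = RegionPartitioner(["", "h", "p"])
print(presplit.num_reducers)    # 3 reducers, one per region
print(presplit.partition("k"))  # "k" falls in ["h", "p") -> reducer 1

unsplit = RegionPartitioner([""])
print(unsplit.num_reducers)     # no pre-splits -> a single reducer
```

This is why an unsplit table collapses the whole ingest into one reducer: with only one region start key there is only one partition to write to.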
On 7/18/19 4:39 AM, Michael wrote:
Hi,
I looked at the possibility of bulk importing into HBase, but somehow I
don't get it. I am not able to perform presplitting of the data, so
does bulk importing work without presplitting?
As I understand it, instead of putting the data, I create the HBase
region files, but all the tutorials I read mention presplitting...
So, is presplitting essential for bulk importing?
It would be really helpful if someone could point me to a demo
implementation of a bulk import.
Thanks for helping
Michael