Re: BulkLoad 200GB table with one region. Is it OK?

Serega Sheypak Thu, 02 Oct 2014 12:27:07 -0700

Sorry, I meant massive IO.
This table is read-only, so HBase should just place reference files. Why
would HBase rewrite the files?
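A rough way to put numbers on the split cascade discussed in this thread (200GB loaded into one region, 10GB max region size). This is a back-of-the-envelope sketch, not HBase code, and `split_cascade` is a hypothetical helper. It assumes that regions always split exactly in half and that every split generation is eventually followed by a compaction that rewrites all of the data once. The split itself only writes reference files, but a daughter region still holding references has to compact them away before it can split again, and that compaction is where the bytes get rewritten.

```python
def split_cascade(total_gb: float, max_region_gb: float):
    """Return (generations, regions, region_size_gb, rewritten_gb).

    Assumptions (not measured HBase behavior):
    - every region splits exactly in half while it exceeds max_region_gb;
    - each split generation ends with a compaction that rewrites all data once.
    """
    size = total_gb
    regions = 1
    generations = 0
    while size > max_region_gb:
        size /= 2          # each region splits into two equal daughters
        regions *= 2
        generations += 1
    rewritten_gb = generations * total_gb  # one full rewrite per generation
    return generations, regions, size, rewritten_gb


print(split_cascade(200, 10))  # (5, 32, 6.25, 1000)
```

Under those assumptions, 200GB ends up as 32 regions of 6.25GB after five split generations, with roughly 1TB rewritten along the way. A table pre-split into the right number of regions would skip those generations.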


2014-10-02 23:24 GMT+04:00 Serega Sheypak <[email protected]>:

> Hi!
> http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
> says that splitting just places a 'reference' file.
> Why should there be massive splitting?
>
> 2014-10-02 23:08 GMT+04:00 Jean-Marc Spaggiari <[email protected]>:
>
>> Hi Serega,
>>
>> Bulk load just "pushes" the files into an HBase region, so there should not
>> be any issue. Splitting, however, might take some time, because HBase will
>> have to split the region again and again until it becomes small enough. So
>> if your max file size is 10GB, it will split it into 100GB, then 50GB, then
>> 25GB, then 12.5GB, then 6.25GB regions... Each time, everything will be
>> rewritten: a LOT of wasted IO.
>>
>> So the answer is: yes, HBase can handle it, BUT it's not a good practice.
>> It is better to pre-split the table and generate the bulk load files based
>> on the split regions. Also, it might affect the other tables, because HBase
>> will have to do massive IO, which in the end might impact overall
>> performance.
>>
>> JM
>>
>> 2014-10-02 15:03 GMT-04:00 Serega Sheypak <[email protected]>:
>>
>> > Hi, I'm doing an HBase bulk load into an empty table.
>> > The input data size is 200GB.
>> > Is it OK to load the data into one default region and then wait while
>> > HBase splits the 200GB region?
>> >
>> > I don't have any SLA for the initial load. I can wait until HBase splits
>> > the initial load files.
>> > This table is READ only.
>> >
>> > The only consideration is to not affect other tables and not cause HBase
>> > cluster degradation.
>> >
>>
>
>
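For the pre-split approach JM recommends, the split points have to be chosen before the load. A minimal sketch, assuming (hypothetically) that row keys begin with a fixed-width, uniformly distributed lowercase hex prefix, such as a hash of the natural key; with any other key layout, split points should come from the real key distribution. `hex_split_keys` is a hypothetical helper. In the HBase Java client, boundaries like these would be passed as the `byte[][] splitKeys` argument of `Admin.createTable`, after which `HFileOutputFormat2.configureIncrementalLoad` can produce HFiles partitioned along the table's region boundaries.

```python
def hex_split_keys(num_regions: int, prefix_len: int = 8) -> list[bytes]:
    """Evenly spaced split points over a fixed-width hex key prefix.

    Hypothetical key layout: row keys start with `prefix_len` uniformly
    distributed lowercase hex characters. N regions need N - 1 split keys.
    """
    space = 16 ** prefix_len
    if not 1 <= num_regions <= space:
        raise ValueError("num_regions must be between 1 and 16**prefix_len")
    step = space // num_regions
    # Zero-padded lowercase hex sorts bytewise in numeric order, which matches
    # how HBase orders row keys (lexicographic byte comparison).
    return [format(i * step, f"0{prefix_len}x").encode("ascii")
            for i in range(1, num_regions)]


# Example only: 200GB / 10GB max region size -> 20 regions, 19 split keys.
keys = hex_split_keys(20)
print(len(keys), keys[0], keys[-1])
```

The region count of 20 is just the 200GB input divided by the 10GB max file size mentioned in the thread; leaving some headroom below the max size would avoid immediate splits as the regions grow.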
