On Mon, Oct 11, 2010 at 9:33 PM, Sean Bigdatafun <[email protected]> wrote:

> Another potential "problem" of incremental bulk loader is that the number of
> reducers (for the bulk loading process) needs to be equal to the existing
> regions -- this seems to be unfeasible for very large table, say with 2000
> regions.
>
> Any comment on this? Thanks.
Yes, this is currently problematic if you have a very large table (2000
regions) and a small MR cluster (where 2000 reducers is too many). It
wouldn't be too difficult to amend the code so that each reducer is
responsible for a contiguous range of regions, and knows to split the
HFiles at region boundaries. Patches welcome :)

-Todd

> Sean
>
> On Fri, Oct 8, 2010 at 9:03 PM, Todd Lipcon <[email protected]> wrote:
>
>> What version are you building from? These tools are new as of this past
>> June.
>>
>> -Todd
>>
>> On Fri, Oct 8, 2010 at 4:52 PM, Leo Alekseyev <[email protected]> wrote:
>>
>> > We want to investigate HBase bulk imports, as described on
>> > http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html and/or
>> > JIRA HBASE-48. I can't seem to run either the importtsv tool or the
>> > completebulkload tool using the hadoop jar /path/to/hbase-VERSION.jar
>> > command. In fact, the ImportTsv class is not part of that jar file.
>> > Am I looking in the wrong place for this class, or do I need to
>> > somehow customize the build process to include it?.. Our HBase was
>> > built from source using the default procedure.
>> >
>> > Thanks for any insight,
>> > --Leo
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

-- 
Todd Lipcon
Software Engineer, Cloudera
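[Editor's note: the idea Todd describes — each reducer owning a contiguous block of regions, and splitting its output files at region boundaries — can be sketched in plain Python. This is only an illustration of the partitioning logic, not HBase code; the region start keys, function names, and the toy 10-region table below are all hypothetical.]

```python
import bisect

# Hypothetical start keys for a 10-region table; in HBase these would come
# from the table's region metadata. Region i covers [region_starts[i],
# region_starts[i+1]).
region_starts = ["", "c", "f", "i", "l", "o", "r", "u", "x", "z"]

def reducer_for_region(region_idx, num_regions, num_reducers):
    """Assign each region to a reducer so that every reducer owns a
    contiguous block of regions (e.g. 2000 regions onto 200 reducers)."""
    return region_idx * num_reducers // num_regions

def region_for_key(row_key):
    """Which region a row key falls into: the last region whose start
    key is <= the row key."""
    return bisect.bisect_right(region_starts, row_key) - 1

def partition(row_key, num_reducers):
    """Total-order partitioner: route a row key to the reducer that owns
    its region, so each reducer sees a contiguous key range."""
    return reducer_for_region(region_for_key(row_key),
                              len(region_starts), num_reducers)

def split_at_region_boundaries(sorted_rows):
    """Inside one reducer, start a new output file whenever the sorted
    row keys cross into the next region, so each HFile maps to exactly
    one region."""
    files, current, cur_region = [], [], None
    for key in sorted_rows:
        r = region_for_key(key)
        if current and r != cur_region:
            files.append(current)
            current = []
        cur_region = r
        current.append(key)
    if current:
        files.append(current)
    return files
```

With 3 reducers, regions 0-3 go to reducer 0, 4-6 to reducer 1, and 7-9 to reducer 2, and a reducer seeing keys ["a", "b", "d", "g"] would emit three files, one per region crossed.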
