Hi Nanheng,

It sounds like you're on the right path, but you're missing the "commit" step when using the output format.
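In concrete terms, the commit step runs when the framework owns the output format, rather than when you instantiate a record writer yourself inside the mapper. A minimal driver sketch along those lines is below; it assumes the HBase 0.90-era mapreduce API, and the class and table names (BulkLoadDriver, MyHFileMapper, "mytable") are placeholders, not anything from this thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadDriver.class);

    // Hypothetical mapper that emits ImmutableBytesWritable keys and
    // KeyValue values in sorted order.
    job.setMapperClass(MyHFileMapper.class);

    // Input is already globally sorted, so a map-only job is fine.
    job.setNumReduceTasks(0);

    // Key point: set HFileOutputFormat as the job's output format instead
    // of calling HFileOutputFormat.getRecordWriter() yourself. That way the
    // OutputCommitter runs at task/job commit and promotes the HFiles out
    // of output/_temporary/... into output/<colfam>/<uniqueId>.
    job.setOutputFormatClass(HFileOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // waitForCompletion() drives the commit; exiting before it finishes
    // leaves the files stranded under _temporary.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This is only a sketch of the general shape, not a tested job; the essential difference from what you described is who calls getRecordWriter.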
The layout of the output dir should look something like:

output/
output/colfam/
output/colfam/234923423
output/colfam/349593453   <-- these are just unique IDs

Thanks
-Todd

On Wed, Jan 5, 2011 at 3:54 PM, Nanheng Wu <[email protected]> wrote:
> Hi,
>
> I am new to HBase and Hadoop and I am trying to find the best way to
> bulk load a table from HDFS into HBase. I don't mind creating a new
> table for each batch, and from what I understand, using
> HFileOutputFormat directly in an MR job is the most efficient method.
> My input data set is already sorted, so it seems to me that I don't
> need reducers, which would require a global sort of already-sorted
> data. I tried using HFileOutputFormat.getRecordWriter in my mapper
> with 0 reducers, but the output directory has only a _temporary
> directory with my outputs in each subdirectory. That doesn't seem to
> be what the loadtable script expects (a column family directory with
> HFiles). Can someone tell me whether what I am doing makes sense in
> general, or how to do this properly? Thanks!

--
Todd Lipcon
Software Engineer, Cloudera
