Hi Nanheng,

It sounds like you're on the right path; you're just missing the
"commit" step when using the output format.

The layout of the output dir should look something like:
output/
output/colfam/
output/colfam/234923423
output/colfam/349593453  <-- these are just unique IDs
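
As a minimal sketch, the layout above can be reproduced and then handed to the
loadtable script. The table name "mytable" and the numeric file names are
placeholders here; in a real job the HFiles get unique IDs generated by the
framework, and they only land in colfam/ (rather than _temporary/) once the
output is committed.

```shell
# Sketch of the layout the loadtable script expects (names are placeholders).
mkdir -p output/colfam
touch output/colfam/234923423 output/colfam/349593453   # stand-ins for real HFiles

# With a layout like this, the bulk-load step would be roughly:
#   $HBASE_HOME/bin/hbase org.jruby.Main bin/loadtable.rb mytable output
ls output/colfam
```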

Thanks
-Todd



On Wed, Jan 5, 2011 at 3:54 PM, Nanheng Wu <[email protected]> wrote:

> Hi,
>
>  I am new to HBase and Hadoop and I am trying to find the best way to
> bulk load a table from HDFS into HBase. I don't mind creating a new
> table for each batch, and from what I understand, using
> HFileOutputFormat directly in an MR job is the most efficient method.
> My input data set is already in sorted order, so it seems to me that I
> don't need reducers, which would require me to globally sort
> already-sorted data. I tried using HFileOutputFormat.getRecordWriter
> in my mapper with 0 reducers, but the output directory has only a
> _temporary directory with my outputs in each subdirectory. That
> doesn't seem to be what the loadtable script expects (a column family
> directory containing HFiles). Can someone tell me if what I am doing
> makes sense in general, or how to do this properly? Thanks!
>



-- 
Todd Lipcon
Software Engineer, Cloudera
