Hi Nanheng,

It sounds like you're on the right path, but you're missing the "commit" step when using the output format.
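In concrete terms, the commit step runs when the framework owns the output format, rather than when you instantiate a record writer yourself inside the mapper. A minimal driver sketch along those lines is below; it assumes the HBase 0.90-era mapreduce API, and the class and table names (BulkLoadDriver, MyHFileMapper, "mytable") are placeholders, not anything from this thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load-prepare");
    job.setJarByClass(BulkLoadDriver.class);

    // Hypothetical mapper that emits ImmutableBytesWritable keys and
    // KeyValue values in sorted order.
    job.setMapperClass(MyHFileMapper.class);

    // Input is already globally sorted, so a map-only job is fine.
    job.setNumReduceTasks(0);

    // Key point: set HFileOutputFormat as the job's output format instead
    // of calling HFileOutputFormat.getRecordWriter() yourself. That way the
    // OutputCommitter runs at task/job commit and promotes the HFiles out
    // of output/_temporary/... into output/<colfam>/<uniqueId>.
    job.setOutputFormatClass(HFileOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // waitForCompletion() drives the commit; exiting before it finishes
    // leaves the files stranded under _temporary.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

This is only a sketch of the general shape, not a tested job; the essential difference from what you described is who calls getRecordWriter.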
The layout of the output dir should look something like:

output/
output/colfam/
output/colfam/234923423
output/colfam/349593453   <-- these are just unique IDs

Thanks
-Todd

On Wed, Jan 5, 2011 at 3:54 PM, Nanheng Wu <[email protected]> wrote:
> Hi,
>
> I am new to HBase and Hadoop and I am trying to find the best way to
> bulk load a table from HDFS into HBase. I don't mind creating a new
> table for each batch, and from what I understand, using
> HFileOutputFormat directly in an MR job is the most efficient method.
> My input data set is already sorted, so it seems to me that I don't
> need reducers, which would require a global sort of already-sorted
> data. I tried using HFileOutputFormat.getRecordWriter in my mapper
> with 0 reducers, but the output directory has only a _temporary
> directory with my outputs in each subdirectory. That doesn't seem to
> be what the loadtable script expects (a column family directory with
> HFiles). Can someone tell me whether what I am doing makes sense in
> general, or how to do this properly? Thanks!

--
Todd Lipcon
Software Engineer, Cloudera
