Lars, Todd,

Thanks for the info. If I understand correctly, the importtsv command-line tool does not compress its output by default and has no command-line switch to enable it. However, I can modify the source at hbase-0.89.20100924+28/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java to call FileOutputFormat.setCompressOutput() and FileOutputFormat.setOutputCompressorClass() on the Job in order to turn compression on.
Does that sound right?

Marc

On Thu, Dec 23, 2010 at 2:34 PM, Todd Lipcon <[email protected]> wrote:
> You beat me to it, Lars! Was writing a response when some family arrived for
> the holidays, and when I came back, you had written just what I had started
> :)
>
> On Thu, Dec 23, 2010 at 1:51 PM, Lars George <[email protected]>
> wrote:
>
> > live ones and then moved into place from their temp location. Not sure
> > what happens if the local cluster has no /hbase etc.
> >
> > Todd, could you help here?
> >
>
> Yep, there is a code path where if the HFiles are on a different filesystem,
> it will copy them to the HBase filesystem first. It's not very efficient,
> though, so it's probably better to distcp them to the local cluster first.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
