Hi Anil,

I am using HBase 0.94.0 with Hadoop 1.0.0. The directories are indeed the ones mentioned by Bijeet. I can also add that I am doing the 2nd stage programmatically by calling doBulkLoad(org.apache.hadoop.fs.Path sourceDir, HTable table) on a LoadIncrementalHFiles object.
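For reference, the call looks roughly like this; it's only a minimal sketch, and the table name and source path below are placeholders rather than the actual ones from my job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();

            // Directory produced by the HFile-generating job (placeholder path).
            Path hfileDir = new Path("/user/sever/hfile-output");

            // Placeholder table name.
            HTable table = new HTable(conf, "my_table");
            try {
                // 2nd stage of bulk loading: hand the HFiles over to the region servers.
                new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
            } finally {
                table.close();
            }
        }
    }
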
Best,
Sever

On Fri, Jul 27, 2012 at 5:40 AM, Anil Gupta <anilgupt...@gmail.com> wrote:
> Hi Sever,
>
> That's a very interesting thing. Which Hadoop and HBase versions are you
> using? I am going to run bulk loads tomorrow. If you can tell me which
> directories in HDFS you compared with /hbase/$table, then I will try to check
> the same.
>
> Best Regards,
> Anil
>
> On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu
> <fundatureanu.se...@gmail.com> wrote:
>
>> On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <lakka...@gmail.com> wrote:
>>>>
>>>> For the bulk-loading process, the HBase documentation mentions that in
>>>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
>>>> into its storage directory and making the data available to clients."
>>>> But from my experience the files also remain in the original location
>>>> from which they are "adopted". So I guess the data is actually copied
>>>> into the HBase directory, right? This means that, compared to
>>>> online importing, when bulk loading you essentially need twice the
>>>> disk space on HDFS, right?
>>>>
>>> Yes, if you are generating HFiles on one cluster and loading into a
>>> separate HBase cluster. If they are co-located, it's just an HDFS mv.
>>
>> Hmm, both the HFile generation and the HBase cluster run on top of
>> the same HDFS cluster. I did a "du" on both the source HDFS directory
>> and the destination "/hbase" directory and I got the same sizes (+/-
>> a few bytes). I deleted the source directory from HDFS and then scanned
>> the table without any problems. Maybe there is a config parameter I'm
>> missing?
>>
>> Sever

--
Sever Fundatureanu
Vrije Universiteit Amsterdam
E-mail: fundatureanu.se...@gmail.com