Anil,

The two directories in question here are:

1. the HDFS location where the MapReduce job creates the HFiles
2. the directory pointed to by hbase.rootdir in your HBase configuration
   (the default value is /hbase)

Inside the HBase root directory, there are per-table subdirectories. So for
the kind of comparison that you mentioned, you need to look in the directory
<hbase.rootdir>/<table-name> and the directory where you are creating the
HFiles.

Bijeet

On Fri, Jul 27, 2012 at 9:10 AM, Anil Gupta <anilgupt...@gmail.com> wrote:

> Hi Sever,
>
> That's a very interesting thing. Which Hadoop and HBase versions are you
> using? I am going to run bulk loads tomorrow. If you can tell me which
> directories in HDFS you compared with /hbase/$table then I will try to
> check the same.
>
> Best Regards,
> Anil
>
> On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <
> fundatureanu.se...@gmail.com> wrote:
>
> > On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <lakka...@gmail.com>
> > wrote:
> >>>
> >>> For the bulkloading process, the HBase documentation mentions that in
> >>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
> >>> into its storage directory and making the data available to clients."
> >>> But from my experience the files also remain in the original location
> >>> from where they are "adopted". So I guess the data is actually copied
> >>> into the HBase directory, right? This means that, compared to the
> >>> online importing, when bulk loading you essentially need twice the
> >>> disk space on HDFS, right?
> >>>
> >>
> >> Yes, if you are generating HFiles on one cluster and loading into a
> >> separate HBase cluster. If they are co-located, it's just an HDFS mv.
> >
> > Hmm, both the HFile generation and the HBase cluster run on top of
> > the same HDFS cluster. I did a "du" on both the source HDFS directory
> > and the destination "/hbase" directory and I got the same sizes (+-
> > a few bytes). I deleted the source directory from HDFS and then scanned
> > the table without any problems. Maybe there is a config parameter I'm
> > missing?
> >
> > Sever
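
For anyone repeating the comparison discussed in this thread, here is a minimal Python sketch of the two pieces involved: building the per-table path <hbase.rootdir>/<table-name>, and checking whether two directory sizes agree to within a few bytes. The table name "mytable", the helper names, and the 1024-byte tolerance are hypothetical and chosen only for illustration. The byte counts themselves are assumed to have been collected separately (for example with "hadoop fs -du" on each directory); the sketch does not talk to HDFS.

```python
import posixpath


def table_dir(hbase_rootdir: str, table_name: str) -> str:
    """Build <hbase.rootdir>/<table-name>, the per-table directory."""
    # HDFS paths always use forward slashes, so posixpath is used
    # instead of os.path (which would use backslashes on Windows).
    return posixpath.join(hbase_rootdir, table_name)


def sizes_match(source_bytes: int, table_bytes: int,
                tolerance_bytes: int = 1024) -> bool:
    """Return True if the two sizes differ by at most tolerance_bytes.

    The tolerance is a hypothetical value standing in for the
    "+- a few bytes" difference mentioned in the thread.
    """
    return abs(source_bytes - table_bytes) <= tolerance_bytes


if __name__ == "__main__":
    # "/hbase" is the hbase.rootdir value mentioned in the thread;
    # "mytable" is a made-up table name.
    print(table_dir("/hbase", "mytable"))
```

If the sizes match while the HFile output directory still exists, the data is present in both places; if the output directory is empty after the load, the files were moved rather than copied.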