Anil,

The two directories in question here are:

  1.  the HDFS location where the MapReduce job creates the HFiles
  2.  the directory pointed to by hbase.rootdir in your HBase configuration
      (commonly set to /hbase). Inside the HBase root directory, there are
      per-table subdirectories.

So for the kind of comparison that you mentioned, you need to look in the
directory <hbase.rootdir>/<table-name> and the
directory where you are creating the HFiles.
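
A rough sketch of that comparison in Python, assuming the hadoop CLI is on
your PATH and that your Hadoop version supports "fs -du -s" (older releases
use "fs -dus" instead). The function names, the /hbase default, and the
"mytable" table name are placeholders of mine, not anything taken from your
setup:

```python
import posixpath
import subprocess


def table_dir(hbase_rootdir, table_name):
    """Return <hbase.rootdir>/<table-name>, the per-table subdirectory."""
    return posixpath.join(hbase_rootdir, table_name)


def du_command(path):
    """Build the hadoop CLI call that prints a summarized size for `path`.

    Assumption: `hadoop fs -du -s` is available; on older Hadoop releases
    swap in `-dus`.
    """
    return ["hadoop", "fs", "-du", "-s", path]


def print_sizes(hfile_output_dir, hbase_rootdir="/hbase", table_name="mytable"):
    """Run du on the HFile output directory and on the table directory.

    The defaults are illustrative only; pass your own hbase.rootdir value
    and table name.
    """
    for path in (hfile_output_dir, table_dir(hbase_rootdir, table_name)):
        # check=True raises CalledProcessError if the hadoop command fails
        subprocess.run(du_command(path), check=True)
```

Calling print_sizes() with the directory your MapReduce job writes HFiles to,
once before and once after the bulk load, should show whether the files were
moved or copied.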

BIjeet



On Fri, Jul 27, 2012 at 9:10 AM, Anil Gupta <anilgupt...@gmail.com> wrote:

> Hi Sever,
>
> That's a very interesting thing. Which Hadoop and HBase versions are you
> using? I am going to run bulk loads tomorrow. If you can tell me which
> directories in HDFS you compared with /hbase/$table, then I will try to
> check the same.
>
> Best Regards,
> Anil
>
> On Jul 26, 2012, at 3:46 PM, Sever Fundatureanu <
> fundatureanu.se...@gmail.com> wrote:
>
> > On Thu, Jul 26, 2012 at 6:47 PM, Sateesh Lakkarsu <lakka...@gmail.com>
> > wrote:
> >>>
> >>>
> >>> For the bulkloading process, the HBase documentation mentions that in
> >>> a 2nd stage "the appropriate Region Server adopts the HFile, moving it
> >>> into its storage directory and making the data available to clients."
> >>> But from my experience the files also remain in the original location
> >>> from where they are "adopted". So I guess the data is actually copied
> >>> into the HBase directory right? This means that, compared to the
> >>> online importing, when bulk loading you essentially need twice the
> >>> disk space on HDFS, right?
> >>>
> >>
> >> Yes, if you are generating HFiles on one cluster and loading into a
> >> separate HBase cluster. If they are co-located, it's just an hdfs mv.
> >
> > Hmm, both the HFile generation and the HBase cluster run on top of
> > the same HDFS cluster. I did a "du" on both the source HDFS directory
> > and the destination "/hbase" directory and I got the same sizes (+/-
> > a few bytes). I deleted the source directory from HDFS and then scanned
> > the table without any problems. Maybe there is a config parameter I'm
> > missing?
> >
> > Sever
>
