Hi there,

See this section of the HBase RefGuide for information about bulk loading.
http://hbase.apache.org/book.html#arch.bulk.load

On 1/18/13 12:57 PM, "praveenesh kumar" <praveen...@gmail.com> wrote:

>Hey,
>Can someone throw some pointers on what would be the best practice for
>bulk imports in hbase ?
>That would be really helpful.
>
>Regards,
>Praveenesh
>
>On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq <donta...@gmail.com>
>wrote:
>
>> Just to add to whatever all the heavyweights have said above, your MR job
>> may not be as efficient as the MR job corresponding to your Hive query. You
>> can enhance the performance by setting the mapred config parameters wisely
>> and by tuning your MR job.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Jan 17, 2013 at 10:39 PM, ramkrishna vasudevan <
>> ramkrishna.s.vasude...@gmail.com> wrote:
>>
>> > Hive is more for batch and HBase is for more of real time data.
>> >
>> > Regards
>> > Ram
>> >
>> > On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <anoop.hb...@gmail.com>
>> > wrote:
>> >
>> > > In case of Hive data insertion means placing the file under table path in
>> > > HDFS. HBase need to read the data and convert it into its format. (HFiles)
>> > > MR is doing this work.. So this makes it clear that HBase will be slower.
>> > > :) As Michael said the read operation...
>> > >
>> > > -Anoop-
>> > >
>> > > On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <austi...@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > > Problem: hive took 6 mins to load a data set, hbase took 1 hr 14 mins.
>> > > > It's a 20 gb data set approx 230 million records. The data is in hdfs,
>> > > > single text file. The cluster is 11 nodes, 8 cores.
>> > > >
>> > > > I loaded this in hive, partitioned by date and bucketed into 32 and
>> > > > sorted. Time taken is 6 mins.
>> > > >
>> > > > I loaded the same data into hbase, in the same cluster by writing a map
>> > > > reduce code. It took 1hr 14 mins. The cluster wasn't running anything
>> > > > else and assuming that the code that i wrote is good enough, what is it
>> > > > that makes hbase slower than hive in loading the data?
>> > > >
>> > > > Thanks,
>> > > > Austin
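
For a rough sense of the gap described in the quoted thread, here is a minimal Python sketch that turns Austin's figures (about 20 GB, about 230 million records, 6 minutes for Hive, 1 hr 14 mins for HBase) into throughput numbers. Treating 20 GB as 20 * 1024 MB is an assumption, and the names `DATA_MB`, `RECORDS`, and `throughput` are only illustrative. The two loads do different work, so this is a back-of-the-envelope comparison and not a benchmark.

```python
# Figures quoted in the thread; "20 GB" is taken as 20 * 1024 MB (an assumption).
DATA_MB = 20 * 1024
RECORDS = 230_000_000


def throughput(minutes):
    """Return (MB per second, records per second) for a load of `minutes` minutes."""
    seconds = minutes * 60
    return DATA_MB / seconds, RECORDS / seconds


hive_mb_s, hive_rec_s = throughput(6)      # Hive load: 6 minutes
hbase_mb_s, hbase_rec_s = throughput(74)   # HBase MR load: 1 hr 14 mins
slowdown = 74 / 6                          # how many times longer HBase took

print(f"Hive : {hive_mb_s:.1f} MB/s, {hive_rec_s:,.0f} records/s")
print(f"HBase: {hbase_mb_s:.1f} MB/s, {hbase_rec_s:,.0f} records/s")
print(f"HBase load took about {slowdown:.1f}x as long")
```

Numbers like these are a useful baseline when comparing a put-based MR job against a bulk load on the same data set.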