Hive is geared more toward batch processing, while HBase is geared more toward real-time data access.

Regards,
Ram

On Thu, Jan 17, 2013 at 10:30 PM, Anoop John <anoop.hb...@gmail.com> wrote:

> In case of Hive, data insertion means placing the file under the table path
> in HDFS. HBase needs to read the data and convert it into its own format
> (HFiles). MR is doing this work. So this makes it clear that HBase will be
> slower. :) As Michael said the read operation...
>
> -Anoop-
>
> On Thu, Jan 17, 2013 at 10:14 PM, Austin Chungath <austi...@gmail.com> wrote:
>
> > Hi,
> > Problem: Hive took 6 mins to load a data set, HBase took 1 hr 14 mins.
> > It's a 20 GB data set, approx 230 million records. The data is in HDFS,
> > as a single text file. The cluster is 11 nodes, 8 cores.
> >
> > I loaded this in Hive, partitioned by date, bucketed into 32 and sorted.
> > Time taken is 6 mins.
> >
> > I loaded the same data into HBase, in the same cluster, by writing
> > map reduce code. It took 1 hr 14 mins. The cluster wasn't running
> > anything else, and assuming that the code that I wrote is good enough,
> > what is it that makes HBase slower than Hive in loading the data?
> >
> > Thanks,
> > Austin
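
For a rough sense of scale, the figures Austin quotes (20 GB, approx 230 million records, 6 mins vs. 1 hr 14 mins) can be turned into throughput numbers. This is only back-of-the-envelope arithmetic: the helper name `load_throughput` and the 1 GB = 1024 MB conversion are my own assumptions, and the record count is the approximate one from the thread.

```python
def load_throughput(records, size_gb, seconds):
    """Return (records per second, MB per second) for one load run.

    Hypothetical helper for illustration; assumes 1 GB = 1024 MB.
    """
    return records / seconds, size_gb * 1024 / seconds


# Figures as quoted in the thread (record count is approximate).
RECORDS = 230_000_000
SIZE_GB = 20
HIVE_SECONDS = 6 * 60            # 6 mins
HBASE_SECONDS = (60 + 14) * 60   # 1 hr 14 mins

hive_rps, hive_mbps = load_throughput(RECORDS, SIZE_GB, HIVE_SECONDS)
hbase_rps, hbase_mbps = load_throughput(RECORDS, SIZE_GB, HBASE_SECONDS)
slowdown = HBASE_SECONDS / HIVE_SECONDS

print(f"Hive : {hive_rps:,.0f} records/s, {hive_mbps:.1f} MB/s")
print(f"HBase: {hbase_rps:,.0f} records/s, {hbase_mbps:.1f} MB/s")
print(f"HBase load took about {slowdown:.1f}x as long")
```

So the HBase load ran at roughly a twelfth of the Hive rate. That fits Anoop's explanation: the Hive load is mostly a matter of placing files in HDFS, while the HBase load has to read every record and convert it into HFiles.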