Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Mohammad Tariq
…of table. > -Anoop- > From: Mohammad Tariq [donta...@gmail.com] > Sent: Monday, January 21, 2013 12:01 PM > To: user@hbase.apache.org > Subject: Re: Loading data, hbase slower than Hive? > Apart from this, you can have some additional tweaks…

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
…that of the table regions. So, Austin, you can check with a proper pre-split of the table. -Anoop-
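For illustration, here is a minimal Python sketch of computing evenly spaced split keys for a pre-split table. It assumes row keys begin with a fixed-width lowercase hex string (for example, a hash prefix). It is similar in spirit to the HexStringSplit algorithm in HBase's RegionSplitter utility, but it is not that class, and the function name `hex_split_points` is hypothetical.

```python
def hex_split_points(num_regions: int, width: int = 8) -> list:
    """Return num_regions - 1 split keys that cut the fixed-width lowercase
    hex key space ("00000000" .. "ffffffff" for width=8) into equal ranges."""
    if num_regions < 2:
        return []  # a single region needs no split points
    space = 16 ** width  # number of distinct width-character hex keys
    return [format(space * i // num_regions, f"0{width}x")
            for i in range(1, num_regions)]


# Example: four regions over 8-character hex keys.
print(hex_split_points(4))  # ['40000000', '80000000', 'c0000000']
```

The resulting strings could be passed as split keys when the table is created, for example through the `SPLITS => [...]` option of the HBase shell's `create` command. Evenly spaced splits only spread the load if the row keys really are distributed across that key space. Sequential keys would still land in one region.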

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Mohammad Tariq
…using HFileOutputFormat or TableOutputFormat? > -Anoop- > From: Austin Chungath [austi...@gmail.com] > Sent: Monday, January 21, 2013 11:15 AM > To: user@hbase.apache.org > Subject: Re: Loading data, hbase slower than Hive?

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Austin Chungath
> Thank you Tariq. > I will let you know how things went after I implement these suggestions. > Regards,…

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
Austin, are you using HFileOutputFormat or TableOutputFormat? -Anoop-

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Austin Chungath
Thank you, Tariq. I will let you know how things went after I implement these suggestions. Regards, Austin. On Sun, Jan 20, 2013 at 2:42 AM, Mohammad Tariq wrote: > Hello Austin, > I am sorry for the late response. > Asaf has made a very valid point. Rowkey design is very crucial.…

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Vikas Jadhav
As far as I understand, HBase needs to store more metadata than Hive (for each value it separately stores the row key, column family, column name, and value), so the stored data may end up larger than the original HDFS file. I also wondered about this; if anyone has got better results for HBase than Hive, let us know. Thank you. On Sun…
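To put rough numbers on this, here is a Python sketch based on the KeyValue layout described in the HBase reference guide. Each cell carries a 4-byte key length, a 4-byte value length, a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, a 1-byte key type, and then the value. This is only an approximation. It ignores compression, block indexes, bloom filters, the WAL, HDFS replication, and any per-cell tags in later HBase versions. The row key, family, and column names in the example are hypothetical.

```python
# Per-cell fixed overhead in HBase's KeyValue layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one cell, before compression."""
    return KEYVALUE_FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def row_size(row: bytes, family: bytes, columns: dict) -> int:
    """Each column is its own cell, so the row key and family repeat per cell."""
    return sum(keyvalue_size(row, family, q, v) for q, v in columns.items())


# Hypothetical record with 7 bytes of actual values.
cols = {b"user": b"alice", b"amount": b"42"}
print(row_size(b"20130117-000001", b"d", cols))  # 89
```

In this made-up example, 7 bytes of values turn into 89 bytes of cells because the row key and family are repeated in every cell. This is one reason short family and qualifier names are often recommended.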

Re: Loading data, hbase slower than Hive?

2013-01-20 Thread Doug Meil
Hi there, on top of what everybody else said, for more info on rowkey design and pre-splitting see http://hbase.apache.org/book.html#schema (as well as other threads in this dist-list on that topic). On 1/19/13 4:12 PM, "Mohammad Tariq" wrote: > Hello Austin, > I am sorry for the…

Re: Loading data, hbase slower than Hive?

2013-01-19 Thread Mohammad Tariq
Hello Austin, I am sorry for the late response. Asaf has made a very valid point. Rowkey design is very crucial, especially if the data is going to be sequential (a time-series kind of thing). You may end up with a hotspotting problem. Use pre-split tables or hash the keys to avoid that. It'll…
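One common way to hash the keys while keeping them readable is salting, where each natural key gets a prefix containing a bucket number derived from its hash. Below is a minimal Python sketch that assumes string natural keys such as timestamps. The function names, the `|` separator, and the choice of MD5 are illustrative and are not an HBase API.

```python
import hashlib


def salted_key(natural_key: str, buckets: int) -> bytes:
    """Prefix the key with a stable bucket id taken from its MD5 hash, so
    sequential keys (e.g. timestamps) spread over `buckets` key ranges."""
    if buckets < 1:
        raise ValueError("buckets must be >= 1")
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    width = len(str(buckets - 1))  # zero-pad so prefixes sort numerically
    return f"{bucket:0{width}d}|{natural_key}".encode("utf-8")


def bucket_of(key: bytes) -> int:
    """Recover the bucket number from a salted key."""
    return int(key.split(b"|", 1)[0])


for ts in ["2013-01-17 10:00:00", "2013-01-17 10:00:01", "2013-01-17 10:00:02"]:
    print(salted_key(ts, 8))
```

With 8 buckets, the table could be pre-split at the keys "1" through "7" so that each bucket starts in its own region. The cost shows up on reads: a scan over a time range then has to be issued once per bucket, and the results must be merged.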

Re: Loading data, hbase slower than Hive?

2013-01-19 Thread Asaf Mesika
Start by telling us your row key design. Check that you are pre-splitting your table regions. I managed to get to 25 MB/sec write throughput in HBase using 1 region server. If your data is evenly spread, you can get around 7 times that in a 10-region-server environment. That should mean that 1 gig would take 4 s…
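Here is the arithmetic behind this kind of estimate as a small Python sketch, applied to the roughly 20 GB data set from the original question. It assumes "mb" means megabytes, 1 GB = 1024 MB, and perfectly even throughput. It uses the raw text size, although the data stored in HBase will be larger once per-cell overhead is added. `load_time_seconds` is a hypothetical helper.

```python
def load_time_seconds(data_gb: float, mb_per_sec: float) -> float:
    """Seconds to write data_gb gigabytes at a sustained mb_per_sec
    (assumes 1 GB = 1024 MB and perfectly even throughput)."""
    return data_gb * 1024 / mb_per_sec


one_server = 25.0             # MB/s, the single-region-server figure above
ten_servers = one_server * 7  # the rough "7 times that" estimate for 10 servers

for label, rate in [("1 RS", one_server), ("10 RS", ten_servers)]:
    secs = load_time_seconds(20, rate)  # the ~20 GB input from the question
    print(f"{label}: {secs:.0f} s (~{secs / 60:.1f} min)")
```

At those rates, 20 GB works out to roughly 13.7 minutes on one region server and about 2 minutes on ten. If these rates applied, a load taking over an hour would suggest that something other than raw write throughput is limiting it, such as the hotspotting mentioned elsewhere in this thread.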

Re: Loading data, hbase slower than Hive?

2013-01-18 Thread Doug Meil
Hi there, see this section of the HBase RefGuide for information about bulk loading: http://hbase.apache.org/book.html#arch.bulk.load On 1/18/13 12:57 PM, "praveenesh kumar" wrote: > Hey, > Can someone throw some pointers on what would be the best practice for bulk imports in hbase? …
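As a rough illustration of why bulk loading is faster, here is a simplified Python sketch of the partitioning a bulk-load job sets up. Each row is routed to the region whose key range contains it, and rows are sorted within each region. The job can then write one sorted file per region instead of sending every row through the normal write path. In the real MapReduce job, HFileOutputFormat's configureIncrementalLoad arranges this with a TotalOrderPartitioner built from the table's region start keys, and the resulting HFiles are then loaded with the completebulkload tool. The function names and plain byte-string rows here are hypothetical.

```python
from bisect import bisect_right


def partition_for(row: bytes, region_start_keys: list) -> int:
    """Index of the region whose [start, next start) range contains row.

    region_start_keys must be sorted and begin with b"" (the first region).
    """
    return bisect_right(region_start_keys, row) - 1


def plan_bulk_load(rows, region_start_keys):
    """Group rows by target region and sort each group, like the
    per-region sorted output a bulk-load job writes."""
    groups = [[] for _ in region_start_keys]
    for row in rows:
        groups[partition_for(row, region_start_keys)].append(row)
    return [sorted(group) for group in groups]


# Hypothetical table with three regions: ["", "g"), ["g", "p"), ["p", end)
starts = [b"", b"g", b"p"]
rows = [b"zebra", b"apple", b"hello", b"gamma", b"pear", b"banana"]
for i, group in enumerate(plan_bulk_load(rows, starts)):
    print(i, group)
```

A row equal to a region's start key lands in that region. This matches HBase's convention that region start keys are inclusive and end keys are exclusive.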

Re: Loading data, hbase slower than Hive?

2013-01-18 Thread praveenesh kumar
Hey, can someone throw some pointers on what would be the best practice for bulk imports in HBase? That would be really helpful. Regards, Praveenesh. On Thu, Jan 17, 2013 at 11:16 PM, Mohammad Tariq wrote: > Just to add to whatever all the heavyweights have said above, your MR job may not be…

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread Mohammad Tariq
Just to add to whatever all the heavyweights have said above, your MR job may not be as efficient as the MR job corresponding to your Hive query. You can enhance the performance by setting the mapred config parameters wisely and by tuning your MR job. Warm Regards, Tariq https://mtariq.jux.com/ …

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread ramkrishna vasudevan
Hive is more for batch processing, and HBase is more for real-time data. Regards, Ram. On Thu, Jan 17, 2013 at 10:30 PM, Anoop John wrote: > In the case of Hive, data insertion means placing the file under the table path in HDFS. HBase needs to read the data and convert it into its own format (HFiles). MR is doing…

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread Anoop John
In the case of Hive, data insertion means placing the file under the table path in HDFS. HBase needs to read the data and convert it into its own format (HFiles), and MR is doing this work. So this makes it clear that HBase will be slower. :) As Michael said, the read operation... -Anoop- On Thu, Jan 17, 2013…

Re: Loading data, hbase slower than Hive?

2013-01-17 Thread Michael Segel
The writes take longer in HBase. Just how much longer may depend on how well you have tuned HBase. Now, having said that... suppose you want to find a single record in either HBase or Hive. Which do you think will be faster? ;-) On Jan 17, 2013, at 10:44 AM, Austin Chungath wrote: > Hi, > Problem:…
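Here is a toy Python model of the point being made. When data is sorted by row key, as it is in HBase, finding one record takes a logarithmic number of comparisons. When data has no usable key order, it has to be examined record by record. Real HBase reads also involve locating the region and using HFile block indexes and bloom filters, and a Hive query may be narrowed by partition pruning. So this only shows the asymptotic difference. The function names are hypothetical.

```python
import math


def binary_search_probes(sorted_keys, key):
    """Lookup in key-sorted data, as in a keyed get: O(log n) comparisons."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return found, probes


def linear_scan_probes(records, key):
    """Lookup with no key order to exploit: examine records one by one."""
    probes = 0
    for k in records:
        probes += 1
        if k == key:
            return True, probes
    return False, probes


keys = list(range(1_000_000))
print(binary_search_probes(keys, 999_999))  # found, at most 20 probes
print(linear_scan_probes(keys, 999_999))    # (True, 1000000)
print(math.ceil(math.log2(230_000_001)))    # 28
```

For the roughly 230 million records in the original question, a binary search needs at most 28 comparisons (the last line above).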

Loading data, hbase slower than Hive?

2013-01-17 Thread Austin Chungath
Hi, Problem: Hive took 6 mins to load a data set; HBase took 1 hr 14 mins. It's a 20 GB data set of approx. 230 million records. The data is in HDFS, in a single text file. The cluster is 11 nodes, 8 cores. I loaded this in Hive, partitioned by date, bucketed into 32 buckets, and sorted. Time taken is 6 mins.
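To compare the two loads on a common scale, here is a small Python sketch that turns the figures above into throughput. The post gives approximate values, so this assumes 20 GB means 20 * 1024 MB and exactly 230 million records. `load_rates` is a hypothetical helper.

```python
def load_rates(data_gb: float, records: float, seconds: float):
    """Return (MB/s, records/s) for a load of the given size and duration."""
    return data_gb * 1024 / seconds, records / seconds


DATA_GB, RECORDS = 20, 230_000_000  # approximate figures from the post
hive_secs = 6 * 60                  # 6 min
hbase_secs = (1 * 60 + 14) * 60     # 1 hr 14 min

hive_mb_s, hive_rec_s = load_rates(DATA_GB, RECORDS, hive_secs)
hbase_mb_s, hbase_rec_s = load_rates(DATA_GB, RECORDS, hbase_secs)
avg_record_bytes = DATA_GB * 1024**3 / RECORDS
slowdown = hbase_secs / hive_secs

print(f"Hive:  {hive_mb_s:.1f} MB/s, {hive_rec_s:,.0f} records/s")
print(f"HBase: {hbase_mb_s:.1f} MB/s, {hbase_rec_s:,.0f} records/s")
print(f"avg record ~{avg_record_bytes:.0f} bytes, HBase load {slowdown:.1f}x slower")
```

This works out to about 57 MB/s (roughly 639,000 records/s) for Hive and about 4.6 MB/s (roughly 52,000 records/s) for HBase. That is a factor of about 12, with an average record size of about 93 bytes.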