Re: HBase secondary index performance

2010-09-04 Thread Murali Krishna. P
Hi, Thanks for the detailed explanation, I liked the idea of timestamp check, this will be good enough for us and I can put a periodic MR cleaner. However I need some help in understanding the 30K number that was claimed. With the IndexedTable approach, I got only 1200rows/s (60rows/s X

Re: Stack assistance

2010-09-04 Thread Edward Capriolo
On Sun, Sep 5, 2010 at 12:27 AM, phil young wrote: > I'm interested in doing joins in Hive between HBase tables and between HBase > and Hive tables. > > Can someone suggest an appropriate stack to do that? i.e. > Is it possible to use HBase 0.89 > If I use HBase 0.20.6, do I still need to apply HB

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread Edward Capriolo
On Sun, Sep 5, 2010 at 12:07 AM, Jonathan Gray wrote: >> > But your boss seems rather to be criticizing the fact that our system >> > is made of components.  In software engineering, this is usually >> > considered a strength.  As to 'roles', one of the bigtable author's >> > argues that a cluster

Stack assistance

2010-09-04 Thread phil young
I'm interested in doing joins in Hive between HBase tables and between HBase and Hive tables. Can someone suggest an appropriate stack to do that? i.e. Is it possible to use HBase 0.89 If I use HBase 0.20.6, do I still need to apply HBASE-2473 Should I go with the trunk versions of any of these (e

Re: HBase table lost on upgrade

2010-09-04 Thread Ted Yu
The tool Stack mentioned is hbck. If you want to port it to 0.20, see email thread entitled: compiling HBaseFsck.java for 0.20.5 You should try reducing the number of tables in your system, possibly through HBASE-2473 Cheers On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani wrote: > > > > -Orig

RE: Please help me overcome HBase's weaknesses

2010-09-04 Thread Jonathan Gray
> > But your boss seems rather to be criticizing the fact that our system > > is made of components. In software engineering, this is usually > > considered a strength. As to 'roles', one of the bigtable author's > > argues that a cluster of master and slaves makes for simpler systems > > [1]. >

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread Ted Yu
MauMau: public void createTable(HTableDescriptor desc, byte [] startKey, byte [] endKey, int numRegions) If you choose HBase 0.20.6, please be aware that you need to apply HBASE-2473 yourself so that you can use the above API. On Sat, Sep 4, 2010 at 6:49 PM, MauMau wrote: > Hello, Stack, >

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread MauMau
Hello, Samuru, Thank you for your opinion. I love HBase's API, too. Cassandra's API and its data model (supercolumns) are complicated to us. - Original Message - From: "Samuru Jackson" I evaluated Cassandra and HBase for a particular problem domain and found that Cassandra is a hu

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread MauMau
Hello, Stack, Thank you for giving me advice. But your boss seems rather to be criticizing the fact that our system is made of components. In software engineering, this is usually considered a strength. As to 'roles', one of the bigtable author's argues that a cluster of master and slaves mak

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread MauMau
Hello, Jonathan, Thank you. I understood the situation. If you have a strong requirement of not being able to have data unavailable for more than one second, I think Cassandra would be a clear winner here. Is this a requirement just for reads, for writes, or both? Perhaps just for reads, bu

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread Samuru Jackson
Hi! I just want to add my personal opinion to this point: > (1) Ease of use > Cassandra does not require any other software. All nodes of Cassandra have > the same role. Pretty easy. > On the other hand, HBase requires HDFS and ZooKeeper. Users have to > manipulate and manage HDFS and ZooKeeper. T

Re: HBase secondary index performance

2010-09-04 Thread Samuru Jackson
Hi, > where key will be [value:key] and insert rows every time, when we insert > our values. We will got 30k rows/s/node. Could you specify on what kind of hardware you did this? How did you design your indexer? Is it multithreaded? /SJ --- http://uncinuscloud.blogspot.com/

Re: Please help me overcome HBase's weaknesses

2010-09-04 Thread Stack
2010/9/4 MauMau : > However, my boss points out the following as the weaknesses of HBase and > insists that we choose Cassandra. I prefer HBase because HBase has stronger > potential, thanks to its active community and rich ecosystem backed by the > membership of Hadoop family. Are there any good e

Re: HBase secondary index performance

2010-09-04 Thread Andrey Stepachev
2010/9/3 Murali Krishna. P : >        * custom indexing is good, but our data keeps changing every day. So, > probably > indextable is the best option for us In case of custom indexing you can use timestamps to check that the index record is still valid. (or even simply recheck the existence of the value)
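
A minimal sketch of the timestamp check described in this thread, modelled with plain Python dictionaries rather than the HBase client API. The class name IndexedStore and its methods are hypothetical, invented for illustration; the only parts taken from the thread are the [value:key] index key layout, the idea of validating an index record by timestamp at read time, and the periodic cleaner. Timestamps are passed in explicitly so the behaviour is deterministic.

```python
class IndexedStore:
    """In-memory model of a main table plus a hand-maintained index table."""

    def __init__(self):
        self.main = {}   # row_key -> (value, timestamp)
        self.index = {}  # "value:row_key" -> (value, row_key, timestamp)

    def put(self, row_key, value, ts):
        # Write the row and a new index record; old index records are
        # NOT deleted here, they simply become stale.
        self.main[row_key] = (value, ts)
        self.index[f"{value}:{row_key}"] = (value, row_key, ts)

    def _is_live(self, entry):
        # An index record is valid only if the main row still holds the
        # same value written at the same timestamp.
        value, row_key, ts = entry
        return self.main.get(row_key) == (value, ts)

    def lookup(self, value):
        # Model a prefix scan over the index, then filter out stale records.
        prefix = f"{value}:"
        hits = []
        for key in sorted(self.index):
            entry = self.index[key]
            if key.startswith(prefix) and entry[0] == value and self._is_live(entry):
                hits.append(entry[1])
        return hits

    def clean(self):
        # Stand-in for a periodic cleanup job: drop stale index records.
        stale = [k for k, e in self.index.items() if not self._is_live(e)]
        for key in stale:
            del self.index[key]
        return len(stale)


store = IndexedStore()
store.put("r1", "red", ts=1)
store.put("r2", "red", ts=2)
store.put("r1", "blue", ts=3)   # r1 changes; "red:r1" is now stale
print(store.lookup("red"))      # stale record is skipped at read time
print(store.clean())            # number of stale index records removed
```

The design choice here is that writes never touch old index records, so the write path stays cheap, and correctness is restored at read time by the check against the main row. The cost is one extra main-table read per index hit until the cleaner runs.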

RE: Please help me overcome HBase's weaknesses

2010-09-04 Thread Jonathan Gray
Answers inline. > -Original Message- > From: MauMau [mailto:maumau...@gmail.com] > Sent: Saturday, September 04, 2010 9:31 AM > To: user@hbase.apache.org > Subject: Please help me overcome HBase's weaknesses > > Hello, > > We are considering which of HBase or Cassandra to choose for our

Re: HBase secondary index performance

2010-09-04 Thread Todd Lipcon
On Fri, Sep 3, 2010 at 7:57 AM, Michael Segel wrote: > > > > > Date: Fri, 3 Sep 2010 18:00:42 +0530 > > From: muralikpb...@yahoo.com > > Subject: Re: HBase secondary index performance > > To: user@hbase.apache.org > > > > Thanks Andrey, > > > > * Setting the autoflush to false and increasing

Re: HBase secondary index performance

2010-09-04 Thread Samuru Jackson
Hi, I'm not sure if I understand your problems completely, but relating to your update issue: HBase keeps versions of your columns. If you have an index on something that needs to be updated you just overwrite the value in the index. There is no need to remove things. I also organize my indexes

Please help me overcome HBase's weaknesses

2010-09-04 Thread MauMau
Hello, We are considering which of HBase or Cassandra to choose for our future projects. I'm recommending HBase to my boss and coworkers, because HBase is good both for analysis (MapReduce) and for OLTP (get/put provides relatively fast response). Cassandra is superior in get/put response time

Re: HBase secondary index performance

2010-09-04 Thread Murali Krishna. P
Thanks Samuru, I was reading about custom indexing in HBase and just wanted to know how we handle updates in the case of custom indexing. Probably if the original data doesn't change, it might be a good solution. Say, if one of the column values gets changed in the original table, we need