Thank you so much for the detailed explanation; it really helps. As for secondary indexes, I would think that even without transactions one could still build a secondary index on another key, especially since we have row-level locking. Correct me if I am wrong.
Also, I have read about the clustered B-tree that InnoDB uses to implement secondary indexes, but my understanding is that the B-tree is the primary limitation when it comes to scalability, and the main reason NoSQL systems have moved away from it. Still, it would be super nice to be able to build a secondary index in HBase without using a separate table. I am not complaining; I would just love to see HBase continue to be the top NoSQL solution out there :D

Way to go HBase!

On Fri, Mar 25, 2011 at 10:39 AM, Buttler, David <buttl...@llnl.gov> wrote:

> Do you know what it means to make secondary indexing a feature? There are
> two reasonable outcomes:
> 1) adding ACID semantics (and thus killing scalability)
> 2) allowing the secondary index to be out of date (leading to every naïve
> user claiming that there is a serious bug that must be fixed).
>
> Secondary indexes are basically another way of storing (part of) the data.
> E.g. another table, sorted on the field(s) that you want to search on. In
> order to ensure consistency between the primary table and the secondary
> table (index), you have to guarantee that when you mutate the primary table
> that the secondary table is mutated in the same atomic transaction. Since
> HBase only has row-level locks, this can't be guaranteed across tables.
>
> The situation is not hopeless, because in many cases you don't need to have
> perfectly consistent data and can afford to wait for cleanup tasks. For
> some applications, you can ensure that the index is updated close enough to
> the table update (using external transactions, or something similar) that
> users would never notice. One way to implement an eventually consistent
> secondary index would be to mimic the way cluster replication is done.
>
> However, what I have described is difficult to do generically -- and there
> are engineering tradeoffs that need to be made.
> If you absolutely need a transactional and consistent secondary index, I
> would suggest using Oracle, MySQL, or another relational database, where
> this was designed in as a primary feature. Just don't complain that they
> are too slow or don't scale as well as HBase.
>
> </rant>
>
> Sorry for the rant. If you want to have a secondary index here is what you
> need to do:
> Modify your application so that every time you write to the primary table,
> you also write to a secondary table, keyed off of the values you want to
> search on. If you can't guarantee that the values form a secondary key
> (i.e. are unique across your entire table), you can make your key a compound
> key (see, for example, how "tsuna" designed OpenTSDB) with your primary key
> as a component.
>
> Then, when you need to query, you can do range queries over the secondary
> table to retrieve the keys in the primary table to return the full data row.
>
> Dave
>
> -----Original Message-----
> From: Wei Shung Chung [mailto:weish...@gmail.com]
> Sent: Friday, March 25, 2011 12:04 AM
> To: user@hbase.apache.org
> Subject: Re: Stargate+hbase
>
> I need to use secondary indexing too, hopefully this important feature
> will be made available soon :)
>
> Sent from my iPhone
>
> On Mar 25, 2011, at 12:48 AM, Stack <st...@duboce.net> wrote:
>
> > There is no native support for secondary indices in HBase (currently).
> > You will have to manage it yourself.
> > St.Ack
> >
> > On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <sreejit...@nesote.com> wrote:
> >> I have tried secondary indexing. It seems I miss some points. Could you
> >> please explain how it is possible using secondary indexing?
> >>
> >> I have tried like,
> >>
> >>           Columnamilty1:kwd1
> >>           Columnamilty1:kwd2
> >> row1      Columnamilty1:kwd3
> >>           Columnamilty1:kwd2
> >>
> >>           Columnamilty1:kwd1
> >>           Columnamilty1:kwd2
> >> row2      Columnamilty1:kwd4
> >>           Columnamilty1:kwd5
> >>
> >> I need to get all rows which contain kwd1 and kwd2
> >>
> >> Please help.
> >> Thanks
> >>
> >> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >>
> >>> What you are asking for is a secondary index, and it doesn't exist at
> >>> the moment in HBase (let alone REST). Googling a bit for "hbase
> >>> secondary indexing" will show you how people usually do it.
> >>>
> >>> J-D
> >>>
> >>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <sreejit...@nesote.com> wrote:
> >>>> Is it possible using stargate interface to hbase, fetch all rows where
> >>>> more than one column family:<qualifier> must be present?
> >>>>
> >>>> like :select rows which contains keyword:a and keyword:b ?
> >>>>
> >>>> Thanks
> >>
> >> --
> >> Sreejith PK
> >> Nesote Technologies (P) Ltd
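
For what it's worth, here is how I picture the approach Dave describes (write to a second table keyed on the searched value, with the primary key as part of a compound key, then range-scan it), applied to the kwd1/kwd2 question at the bottom of the thread. This is only a rough sketch in plain Python, not HBase client code: SortedTable is a made-up in-memory stand-in for a table sorted by row key, and the SEP separator, the helper names, and the extra row3 are all my own assumptions for illustration.

```python
from bisect import bisect_left, insort

SEP = "\x00"  # assumed separator; must never occur inside a keyword


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Range scan: yield every row key that starts with prefix."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i]
            i += 1


primary = SortedTable()  # row key -> set of keywords
index = SortedTable()    # compound key "keyword<SEP>row key" -> empty value


def put_row(row_key, keywords):
    # Two separate writes with no cross-table atomicity: a reader may
    # briefly see the primary row before its index entries exist.
    primary.put(row_key, set(keywords))
    for kwd in keywords:
        index.put(kwd + SEP + row_key, b"")


def rows_with(kwd):
    """Primary row keys indexed under one keyword."""
    prefix = kwd + SEP
    return {key[len(prefix):] for key in index.scan_prefix(prefix)}


def rows_with_all(*kwds):
    """Rows containing every keyword (pass at least one keyword)."""
    candidates = set.intersection(*(rows_with(k) for k in kwds))
    # Re-check against the primary table in case the index is stale.
    return sorted(r for r in candidates
                  if set(kwds) <= (primary.get(r) or set()))


put_row("row1", ["kwd1", "kwd2", "kwd3"])
put_row("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
put_row("row3", ["kwd1"])  # made-up extra row so the filter has work to do

print(rows_with_all("kwd1", "kwd2"))  # ['row1', 'row2']
```

Because the row key is part of the index key, the same keyword can point at many rows without collisions, and one prefix scan per keyword is enough to find them. The final check against the primary table is my way of coping with the index being slightly out of date, which is the tradeoff Dave mentions.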