Do you know what it means to make secondary indexing a feature? There are two reasonable outcomes: 1) adding ACID semantics (and thus killing scalability), or 2) allowing the secondary index to be out of date (leading every naïve user to claim that there is a serious bug that must be fixed).
Secondary indexes are basically another way of storing (part of) the data, e.g. another table, sorted on the field(s) that you want to search on. To keep the primary table and the secondary table (the index) consistent, you have to guarantee that whenever you mutate the primary table, the secondary table is mutated in the same atomic transaction. Since HBase only has row-level locks, this can't be guaranteed across tables.

The situation is not hopeless, because in many cases you don't need perfectly consistent data and can afford to wait for cleanup tasks. For some applications, you can ensure that the index is updated close enough to the table update (using external transactions, or something similar) that users would never notice. One way to implement an eventually consistent secondary index would be to mimic the way cluster replication is done. However, what I have described is difficult to do generically, and there are engineering tradeoffs that need to be made.

If you absolutely need a transactional and consistent secondary index, I would suggest using Oracle, MySQL, or another relational database, where this was designed in as a primary feature. Just don't complain that they are too slow or don't scale as well as HBase. </rant>

Sorry for the rant. If you want to have a secondary index, here is what you need to do: modify your application so that every time you write to the primary table, you also write to a secondary table, keyed on the values you want to search on. If you can't guarantee that the values form a secondary key (i.e. are unique across your entire table), you can make your key a compound key (see, for example, how "tsuna" designed OpenTSDB) with your primary key as a component. Then, when you need to query, you can do range queries over the secondary table to retrieve the keys in the primary table, and use those keys to fetch the full data rows.
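To make the recipe above concrete, here is a rough sketch in plain Python. It does not use the HBase client at all: SortedTable is a made-up in-memory stand-in for a table sorted by row key, and the names write, rows_with, rows_with_all and the "\x00" separator are my own assumptions for illustration. The index key is the compound "keyword + separator + primary key", so one range scan over a keyword prefix returns the matching primary keys.

```python
import bisect

SEP = "\x00"  # assumed separator; keywords must not contain it


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


primary = SortedTable()  # row key -> set of keywords
index = SortedTable()    # keyword + SEP + row key -> empty marker


def write(row_key, keywords):
    """Write to the primary table, then to the index table.

    The two steps are NOT atomic: a reader can briefly see one without
    the other, and stale index entries are not cleaned up here.
    """
    primary.put(row_key, set(keywords))
    for kw in keywords:
        index.put(kw + SEP + row_key, b"")


def rows_with(keyword):
    """Range-scan the index for one keyword and return primary row keys."""
    start = keyword + SEP
    stop = keyword + "\x01"  # first key past the keyword's prefix
    return {key.split(SEP, 1)[1] for key, _ in index.scan(start, stop)}


def rows_with_all(*keywords):
    """Rows containing every given keyword (at least one is required)."""
    keys = set.intersection(*(rows_with(kw) for kw in keywords))
    return {k: primary.get(k) for k in sorted(keys)}


# The example from the original question: which rows have kwd1 and kwd2?
write("row1", ["kwd1", "kwd2", "kwd3"])
write("row2", ["kwd1", "kwd2", "kwd4", "kwd5"])
print(sorted(rows_with_all("kwd1", "kwd2")))  # ['row1', 'row2']
```

Intersecting the per-keyword scans on the client side is just one way to answer the "kwd1 and kwd2" question. In a real deployment you would also have to decide what happens when the second write fails, which is exactly the consistency tradeoff described above.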
Dave

-----Original Message-----
From: Wei Shung Chung [mailto:weish...@gmail.com]
Sent: Friday, March 25, 2011 12:04 AM
To: user@hbase.apache.org
Subject: Re: Stargate+hbase

I need to use secondary indexing too, hopefully this important feature will be made available soon :)

Sent from my iPhone

On Mar 25, 2011, at 12:48 AM, Stack <st...@duboce.net> wrote:

> There is no native support for secondary indices in HBase (currently).
> You will have to manage it yourself.
> St.Ack
>
> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K. <sreejit...@nesote.com> wrote:
>> I have tried secondary indexing. It seems I miss some points. Could you
>> please explain how it is possible using secondary indexing?
>>
>> I have tried like,
>>
>>         Columnamilty1:kwd1
>>         Columnamilty1:kwd2
>> row1    Columnamilty1:kwd3
>>         Columnamilty1:kwd2
>>
>>         Columnamilty1:kwd1
>>         Columnamilty1:kwd2
>> row2    Columnamilty1:kwd4
>>         Columnamilty1:kwd5
>>
>> I need to get all rows which contain kwd1 and kwd2
>>
>> Please help.
>> Thanks
>>
>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>
>>> What you are asking for is a secondary index, and it doesn't exist at
>>> the moment in HBase (let alone REST). Googling a bit for "hbase
>>> secondary indexing" will show you how people usually do it.
>>>
>>> J-D
>>>
>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K. <sreejit...@nesote.com> wrote:
>>>> Is it possible using stargate interface to hbase, fetch all rows where more
>>>> than one column family:<qualifier> must be present?
>>>>
>>>> like :select rows which contains keyword:a and keyword:b ?
>>>>
>>>> Thanks
>>
>> --
>> Sreejith PK
>> Nesote Technologies (P) Ltd