Do you know what it means to make secondary indexing a feature?  There are two 
reasonable outcomes:
1) adding ACID semantics (and thus killing scalability)
2) allowing the secondary index to be out of date (leading to every naïve user 
claiming that there is a serious bug that must be fixed).

Secondary indexes are basically another way of storing (part of) the data.  
E.g. another table, sorted on the field(s) that you want to search on.  In 
order to ensure consistency between the primary table and the secondary table 
(index), you have to guarantee that whenever you mutate the primary table, the 
secondary table is mutated in the same atomic transaction.  Since HBase only 
has row-level locks, this can't be guaranteed across tables.

The situation is not hopeless, because in many cases you don't need to have 
perfectly consistent data and can afford to wait for cleanup tasks.  For some 
applications, you can ensure that the index is updated close enough to the 
table update (using external transactions, or something similar) that users 
would never notice.  One way to implement an eventually consistent secondary 
index would be to mimic the way cluster replication is done.
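
To make the eventually consistent idea concrete, here is a rough sketch in 
Python.  Plain dicts stand in for the two tables and a queue stands in for a 
log of pending edits that is applied later, loosely in the spirit of 
replication.  Nothing here is an HBase API: IndexedTable, put, apply_pending 
and the NUL separator are names made up for illustration.

```python
from collections import deque

SEP = "\x00"  # assumed separator; must never occur in an indexed value


class IndexedTable:
    """Toy model: primary table, index table, and a log of pending index edits."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.primary = {}   # row key -> {column: value}
        self.index = {}     # "value<SEP>row key" -> row key
        self.log = deque()  # index edits not yet applied

    def put(self, row_key, columns):
        """Write the primary row now; only *log* the index change."""
        old = self.primary.get(row_key, {}).get(self.indexed_column)
        self.primary.setdefault(row_key, {}).update(columns)
        new = self.primary[row_key].get(self.indexed_column)
        if new != old:
            self.log.append((row_key, old, new))

    def apply_pending(self):
        """Drain the log into the index (would run asynchronously in practice)."""
        applied = 0
        while self.log:
            row_key, old, new = self.log.popleft()
            if old is not None:
                self.index.pop(old + SEP + row_key, None)  # drop stale entry
            if new is not None:
                self.index[new + SEP + row_key] = row_key
            applied += 1
        return applied
```

Between put() and apply_pending() the index is out of date, which is exactly 
the window that outcome 2) above is about.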

However, what I have described is difficult to do generically -- and there are 
engineering tradeoffs that need to be made.  If you absolutely need a 
transactional and consistent secondary index, I would suggest using Oracle, 
MySQL, or another relational database, where this was designed in as a primary 
feature.  Just don't complain that they are too slow or don't scale as well as 
HBase.

</rant>

Sorry for the rant.  If you want to have a secondary index, here is what you 
need to do:
Modify your application so that every time you write to the primary table, you 
also write to a secondary table, keyed off of the values you want to search on.  
If you can't guarantee that the values form a secondary key (i.e. are unique 
across your entire table), you can make your key a compound key (see, for 
example, how "tsuna" designed OpenTSDB) with your primary key as a component.

Then, when you need to query, you can do range queries over the secondary table 
to retrieve the matching primary-table keys, and then fetch the full data rows 
from the primary table.
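
A sketch of that read path, again in Python with in-memory stand-ins rather 
than real HBase scans.  A sorted list of compound keys plays the role of the 
secondary table, and scan_prefix, lookup and rows_with_all are hypothetical 
helpers.  The last one covers the original question (rows that contain both 
kwd1 and kwd2) by intersecting one scan per keyword.

```python
from bisect import bisect_left

SEP = "\x00"  # assumed separator between indexed value and primary key


def scan_prefix(sorted_keys, prefix):
    """Yield every key that starts with prefix, like a range scan over a sorted table."""
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1


def lookup(primary, sorted_index_keys, value):
    """Range-scan the index for one value, then fetch the full rows from the primary table."""
    rows = {}
    for key in scan_prefix(sorted_index_keys, value + SEP):
        row_key = key.split(SEP, 1)[1]
        if row_key in primary:  # the index may be stale; skip dangling entries
            rows[row_key] = primary[row_key]
    return rows


def rows_with_all(primary, sorted_index_keys, values):
    """Row keys indexed under every given value: intersect the per-value scans."""
    hits = [set(lookup(primary, sorted_index_keys, v)) for v in values]
    return sorted(set.intersection(*hits)) if hits else []
```

With the two rows from the example further down the thread, 
rows_with_all(primary, keys, ["kwd1", "kwd2"]) would return both row keys.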

Dave

-----Original Message-----
From: Wei Shung Chung [mailto:weish...@gmail.com] 
Sent: Friday, March 25, 2011 12:04 AM
To: user@hbase.apache.org
Subject: Re: Stargate+hbase

I need to use secondary indexing too, hopefully this important feature  
will be made available soon :)

Sent from my iPhone

On Mar 25, 2011, at 12:48 AM, Stack <st...@duboce.net> wrote:

> There is no native support for secondary indices in HBase (currently).
> You will have to manage it yourself.
> St.Ack
>
> On Thu, Mar 24, 2011 at 10:47 PM, sreejith P. K.
> <sreejit...@nesote.com> wrote:
>> I have tried secondary indexing. It seems I miss some points. Could you
>> please explain how it is possible using secondary indexing?
>>
>>
>> I have tried like,
>>
>>
>>                ColumnFamily1:kwd1
>>                ColumnFamily1:kwd2
>> row1           ColumnFamily1:kwd3
>>                ColumnFamily1:kwd2
>>
>>                ColumnFamily1:kwd1
>>                ColumnFamily1:kwd2
>> row2           ColumnFamily1:kwd4
>>                ColumnFamily1:kwd5
>>
>>
>> I need to get all rows which contain kwd1 and kwd2
>>
>> Please help.
>> Thanks
>>
>>
>> On Thu, Mar 24, 2011 at 9:57 PM, Jean-Daniel Cryans
>> <jdcry...@apache.org> wrote:
>>
>>> What you are asking for is a secondary index, and it doesn't exist at
>>> the moment in HBase (let alone REST). Googling a bit for "hbase
>>> secondary indexing" will show you how people usually do it.
>>>
>>> J-D
>>>
>>> On Thu, Mar 24, 2011 at 6:18 AM, sreejith P. K.
>>> <sreejit...@nesote.com> wrote:
>>>> Is it possible using stargate interface to hbase, fetch all rows where
>>>> more than one column family:<qualifier> must be present?
>>>>
>>>> like :select  rows which contains keyword:a and keyword:b ?
>>>>
>>>> Thanks
>>>>
>>>
>>
>>
>>
>> --
>> Sreejith PK
>> Nesote Technologies (P) Ltd
>>
