Re: Best technique for doing lookup with Secondary Index

anil gupta Thu, 25 Oct 2012 15:11:00 -0700

Anoop: In the prePut hook, do you call HTable#put()?
Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?

Anoop: Why use network calls from the server side here, then?
Anil: I thought this was the cleaner approach since I am using BulkLoader. I
decided not to run two jobs because I generate a UniqueIdentifier at
runtime in the BulkLoader.

Anoop: Can't this be handled from the client alone?
Anil: I cannot handle it from the client since I am using BulkLoader. Is it a
good idea to create an HTable instance on "B" and do the put in my mapper? I
might try this idea.

Anoop: You can have a look at the Lily project.
Anil: It's a little late for us to evaluate Lily now, and at present we don't
need a complex secondary index since our data is immutable.

Ram: What is rowkey B here?
Anil: Suppose I am storing customer events in table A. I have two
requirements for querying the data:
1. Query customer events on the basis of customer_Id and event_ID.
2. Query customer events on the basis of event_timestamp and customer_ID.

70% of the querying is done by query #1, so I will use
<customer_Id><event_ID> as the row key of table A.
Now, to support fast results for query #2, I need to create a
secondary index on A. I store that secondary index in B; the rowkey of B is
<event_timestamp><customer_ID>. Every row stores the corresponding rowkey
of A.
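
To illustrate the two key layouts, here is a minimal Java sketch. It assumes,
purely for the example, that customer_ID, event_ID and event_timestamp are
non-negative longs encoded as 8-byte big-endian values; the thread does not
say how these fields are actually encoded, and the class and method names
(RowKeySketch, primaryKey, indexKey) are hypothetical. With that encoding,
the byte-wise unsigned ordering HBase applies to row keys matches the numeric
ordering of the leading field.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: composite row keys for table A and index table B.
final class RowKeySketch {

    // Concatenate two longs as 16 big-endian bytes (ByteBuffer's default order).
    static byte[] compose(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Table A: <customer_Id><event_ID>
    static byte[] primaryKey(long customerId, long eventId) {
        return compose(customerId, eventId);
    }

    // Table B: <event_timestamp><customer_ID>
    static byte[] indexKey(long eventTimestamp, long customerId) {
        return compose(eventTimestamp, customerId);
    }

    // Lexicographic comparison of unsigned bytes, the ordering used for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] earlier = indexKey(1351203060000L, 42L);
        byte[] later = indexKey(1351203061000L, 7L);
        // The earlier timestamp sorts first regardless of the customer ID.
        System.out.println(compareBytes(earlier, later) < 0);
    }
}
```

If the real IDs are strings or variable-length values, a fixed-width or
delimited encoding would be needed instead so that the prefix boundaries stay
unambiguous.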

Ram: How is the startRow determined for every query?
Anil: It's determined by very simple application logic.
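
The application logic itself is not described in this thread. One plausible
way to derive the scan bounds and run the two-step lookup (scan B, then batch
get on A) is sketched below in plain Java. The tables are simulated with
sorted in-memory maps rather than the HBase client API, and the 8-byte
big-endian timestamp prefix, the class name and the method names are
assumptions made for the example. In the real table B the rowkey of A is the
column qualifier; here it is simply the map value.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the index lookup; not HBase client code.
final class IndexLookupSketch {

    // Lexicographic comparison of unsigned bytes, the ordering used for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    static byte[] key(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    // Smallest row key greater than every key that starts with the prefix.
    // An empty array means "no upper bound" (the prefix was all 0xFF bytes).
    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1);
            }
        }
        return new byte[0];
    }

    static List<String> lookup(NavigableMap<byte[], byte[]> indexB,
                               NavigableMap<byte[], String> tableA,
                               long eventTimestamp) {
        // startRow is the timestamp prefix; stopRow bounds the prefix range.
        byte[] start = ByteBuffer.allocate(8).putLong(eventTimestamp).array();
        byte[] stop = stopRowForPrefix(start);

        // Step 1: range scan over the index for all keys with that prefix.
        NavigableMap<byte[], byte[]> hits = stop.length == 0
                ? indexB.tailMap(start, true)
                : indexB.subMap(start, true, stop, false);

        // Step 2: fetch the matching rows from the primary table.
        List<String> events = new ArrayList<>();
        for (byte[] primaryKey : hits.values()) {
            String event = tableA.get(primaryKey);
            if (event != null) {
                events.add(event);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        NavigableMap<byte[], String> a = new TreeMap<>(IndexLookupSketch::compareBytes);
        NavigableMap<byte[], byte[]> b = new TreeMap<>(IndexLookupSketch::compareBytes);
        a.put(key(7, 1), "event-1");
        b.put(key(1000, 7), key(7, 1));
        a.put(key(7, 3), "event-3");
        b.put(key(1001, 7), key(7, 3));
        System.out.println(lookup(b, a, 1000L));
    }
}
```

With explicit start and stop rows the scan is bounded on both sides, so the
range is limited by the keys themselves rather than by filtering each row.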

Thanks,
Anil Gupta

On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
ramkrishna.vasude...@huawei.com> wrote:

> Just out of curiosity,
> > > The secondary index is stored in table "B" as
> > > rowkey B --> family:<rowkey A>
> what is rowkey B here?
> > 1. Scan the secondary table by using prefix filter and startRow.
> How is the startRow determined for every query ?
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > Sent: Thursday, October 25, 2012 10:15 AM
> > To: user@hbase.apache.org
> > Subject: RE: Best technique for doing lookup with Secondary Index
> >
> > >I build the secondary table "B" using a prePut RegionObserver.
> >
> > Anil,
> >        In prePut hook u call HTable#put()?  Why use the network calls
> > from server side here then? can not handle it from client alone? You
> > can have a look at Lily project.   Thoughts after seeing ur idea on put
> > and scan..
> >
> > -Anoop-
> > ________________________________________
> > From: anil gupta [anilgupt...@gmail.com]
> > Sent: Thursday, October 25, 2012 3:10 AM
> > To: user@hbase.apache.org
> > Subject: Best technique for doing lookup with Secondary Index
> >
> > Hi All,
> >
> > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > Table A stores immutable data. I build the secondary table "B" using a
> > prePut RegionObserver.
> >
> > The secondary index is stored in table "B" as
> > rowkey B --> family:<rowkey A>. "<rowkey A>" is the column qualifier.
> > Every row in B will have only one column, and the name of that column is
> > the rowkey of A. So the value is blank. As per my understanding,
> > accessing a column qualifier is faster than accessing a value. Please
> > correct me if I am wrong.
> >
> >
> > HBase Querying approach:
> > 1. Scan the secondary table by using prefix filter and startRow.
> > 2. Do a batch get on primary table by using HTable.get(List<Get>)
> > method.
> >
> > The above approach for retrieval works fine, but I was wondering if
> > there is a better approach. I was planning to try out doing the
> > retrieval using coprocessors.
> > Has anyone tried using coprocessors? I would appreciate it if others can
> > share their experience with secondary indexes for HBase queries.
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>

-- 
Thanks & Regards,
Anil Gupta
