Anoop: In the prePut hook you call HTable#put()?
Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing it?
Anoop: Why use network calls from the server side here, then?
Anil: I thought this was a cleaner approach since I am using BulkLoader. I decided not to run two jobs since I am generating a UniqueIdentifier at runtime in the bulk loader.
Anoop: Can you not handle it from the client alone?
Anil: I cannot handle it from the client since I am using BulkLoader. Is it a good idea to create an HTable instance on "B" and do the put in my mapper? I might try this idea.
Anoop: You can have a look at the Lily project.
Anil: It's a little late for us to evaluate Lily now, and at present we don't need a complex secondary index since our data is immutable.
Ram: What is rowkey B here?
Anil: Suppose I am storing customer events in table A. I have two requirements for querying the data:
1. Query customer events on the basis of customer_Id and event_ID.
2. Query customer events on the basis of event_timestamp and customer_ID.
70% of the querying is done by query#1, so I will use <customer_Id><event_ID> as the rowkey of table A. Now, in order to support fast results for query#2, I need to create a secondary index on A. I store that secondary index in B; the rowkey of B is <event_timestamp><customer_ID>. Every row stores the corresponding rowkey of A.
Ram: How is the startRow determined for every query?
Anil: It's determined by very simple application logic.

Thanks,
Anil Gupta

On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:

> Just out of curiosity,
>
> The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>
> What is rowkey B here?
>
> 1. Scan the secondary table by using prefix filter and startRow.
> How is the startRow determined for every query?
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > Sent: Thursday, October 25, 2012 10:15 AM
> > To: user@hbase.apache.org
> > Subject: RE: Best technique for doing lookup with Secondary Index
> >
> > >I build the secondary table "B" using a prePut RegionObserver.
> >
> > Anil,
> > In the prePut hook you call HTable#put()? Why use network calls
> > from the server side here, then? Can you not handle it from the client
> > alone? You can have a look at the Lily project. Thoughts after seeing
> > your idea on put and scan.
> >
> > -Anoop-
> > ________________________________________
> > From: anil gupta [anilgupt...@gmail.com]
> > Sent: Thursday, October 25, 2012 3:10 AM
> > To: user@hbase.apache.org
> > Subject: Best technique for doing lookup with Secondary Index
> >
> > Hi All,
> >
> > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > Table A stores immutable data. I build the secondary table "B" using a
> > prePut RegionObserver.
> >
> > The secondary index is stored in table "B" as rowkey B -->
> > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in B
> > will have only one column, and the name of that column is the rowkey of
> > A, so the value is blank. As per my understanding, accessing the column
> > qualifier is faster than accessing the value. Please correct me if I am
> > wrong.
> >
> > HBase querying approach:
> > 1. Scan the secondary table by using prefix filter and startRow.
> > 2. Do a batch get on the primary table by using the
> >    HTable.get(List<Get>) method.
> >
> > The above approach for retrieval works fine, but I was wondering if
> > there is a better approach. I was planning to try out doing the
> > retrieval using coprocessors.
> > Has anyone tried using coprocessors? I would appreciate it if others
> > can share their experience with secondary indexes for HBase queries.
> >
> > --
> > Thanks & Regards,
> > Anil Gupta

--
Thanks & Regards,
Anil Gupta
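For readers following the thread, the two-step lookup described above (prefix scan on B, then a batch get on A) can be sketched with plain Java collections. This is only an in-memory model, not the HBase client API: sorted maps stand in for the two tables, the class and method names are hypothetical, and the fixed-width zero-padded key encoding is an assumption made so that string order matches numeric order. The thread does not say how the real rowkeys are encoded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for tables A and B. TreeMap mimics HBase's
// lexicographic rowkey ordering; this is NOT the HBase client API.
final class SecondaryIndexSketch {
    // Table A: rowkey <customer_Id><event_ID> -> event payload.
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B: rowkey <event_timestamp><customer_ID> -> rowkey of A.
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // Assumed encoding: fixed-width, zero-padded decimal fields.
    static String rowKeyA(long customerId, long eventId) {
        return String.format("%010d%010d", customerId, eventId);
    }

    static String rowKeyB(long eventTimestamp, long customerId) {
        return String.format("%013d%010d", eventTimestamp, customerId);
    }

    // Write path: store the event in A and its index entry in B.
    void put(long customerId, long eventId, long eventTimestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(eventTimestamp, customerId), keyA);
    }

    // Step 1: scan B starting at the prefix (the startRow) and stop at
    // the first rowkey that no longer carries the prefix.
    List<String> scanIndex(String prefix) {
        List<String> keysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            keysOfA.add(e.getValue());
        }
        return keysOfA;
    }

    // Step 2: batch get on A for the rowkeys returned by the index scan.
    List<String> batchGet(List<String> keysOfA) {
        List<String> payloads = new ArrayList<>();
        for (String key : keysOfA) {
            String payload = tableA.get(key);
            if (payload != null) {
                payloads.add(payload);
            }
        }
        return payloads;
    }

    // Query #2 from the thread, restricted here to one exact timestamp.
    List<String> eventsAt(long eventTimestamp) {
        return batchGet(scanIndex(String.format("%013d", eventTimestamp)));
    }

    public static void main(String[] args) {
        SecondaryIndexSketch sketch = new SecondaryIndexSketch();
        sketch.put(7L, 1L, 1351000000000L, "login");
        sketch.put(8L, 2L, 1351000000000L, "purchase");
        System.out.println(sketch.eventsAt(1351000000000L));
    }
}
```

In this model the startRow is simply the encoded timestamp prefix, which matches the "simple application logic" mentioned above only in spirit; the real derivation is not given in the thread.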