RE: Best technique for doing lookup with Secondary Index

Ramkrishna.S.Vasudevan Fri, 26 Oct 2012 01:15:47 -0700

> AFAIK, RPC cannot be avoided even if Region A and Region B are on the same
> RS, since these two regions are from different tables. Am I right?


No. Suppose your Region A and Region B, from different tables, are collocated
on the same RS. Then from the coprocessor environment you can get access to
the RS. From the RS you can get the online regions, and on those region
objects you can call puts or gets directly. This will not involve any RPC
within that RS, because we only deal with Region objects.

Regards
Ram
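A small in-memory model may help picture Ram's point. It is a sketch under stated assumptions, not HBase code: Region and RegionServer are hypothetical stand-ins, and in a real coprocessor the region server handle would come from the coprocessor environment (the exact accessors vary between HBase versions, so they are not shown). A lookup served by a region that is online on the same server is a plain method call; only a row hosted elsewhere costs a network round trip.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the "collocated regions, no RPC" idea. NOT the HBase API;
// every name here is hypothetical.
final class CollocatedLookupModel {

    // Stand-in for one region of a table: it holds rows in [startKey, endKey).
    static final class Region {
        final String table;
        final String startKey;
        final String endKey;
        final TreeMap<String, String> rows = new TreeMap<>();

        Region(String table, String startKey, String endKey) {
            this.table = table;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && row.compareTo(endKey) < 0;
        }

        // A plain in-process method call: no network hop involved.
        String get(String row) {
            return rows.get(row);
        }
    }

    // Stand-in for a region server and the regions it has online, grouped by table.
    static final class RegionServer {
        final Map<String, List<Region>> onlineRegions = new HashMap<>();
        int rpcCount = 0; // lookups that had to leave this server

        void addOnlineRegion(Region region) {
            onlineRegions.computeIfAbsent(region.table, t -> new ArrayList<>()).add(region);
        }

        // What a hook running on a region of table B might do to read a row of table A.
        String getRow(String table, String row, Map<String, String> remoteCluster) {
            for (Region region : onlineRegions.getOrDefault(table, Collections.<Region>emptyList())) {
                if (region.contains(row)) {
                    return region.get(row); // collocated: call the Region object directly
                }
            }
            rpcCount++; // not hosted here, so a real client call (an RPC) would be needed
            return remoteCluster.get(row);
        }
    }
}
```

The only design point the model captures is the branch in getRow(): whether the row's region is online locally decides between a method call and a round trip.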

> -----Original Message-----
> From: anil gupta [mailto:anilgupt...@gmail.com]
> Sent: Friday, October 26, 2012 12:17 PM
> To: user@hbase.apache.org
> Subject: Re: Best technique for doing lookup with Secondary Index
> 
> >
> > Now your main question is lookups, right?
> > There are some more hooks in the scan flow called pre/postScannerOpen and
> > pre/postScannerNext.
> > Maybe you can try using them to do a lookup on the secondary table and
> > then use those values and pass them to the main table's next().
> >
> 
> In a secondary index it's hard to avoid at least two RPC calls (one from the
> client to table B and then one from table B to table A), whether you use a
> coproc or not. But I believe using a coproc is better than doing the RPC
> calls from the client, since the client might be outside the subnet/network
> of the cluster. In this case, the RPCs will be faster when we use coprocs. In
> my case the client is certainly not in the same subnet or network zone. I need
> to return query results in around 100 milliseconds or less, so I need to be
> really frugal. Let me know your views on this.
> 
> Have you implemented queries with secondary indexes using a coproc yet?
> At present I have tried the client-side query, and I can get the query
> results in around 100 ms. I am tempted to try out the coproc implementation.
> 
> > But this may involve more RPC calls, as your regions of "A" and "B" may be
> > in different RSs.
> >
> AFAIK, RPC cannot be avoided even if Region A and Region B are on the same
> RS, since these two regions are from different tables. Am I right?
> 
> 
> Thanks,
> Anil Gupta
> 
> On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasude...@huawei.com> wrote:
> 
> > > Is it a good idea to create an HTable instance on "B" and do the put in
> > > my mapper? I might try this idea.
> > Yes, you can do this. Maybe in the same mapper you can do a put for table
> > "B". This is how we have tried loading data into another table, using the
> > main table "A" puts.
> >
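Ram's suggestion (and Anil's idea of doing the put for "B" in the mapper) can be sketched as a mapper that emits both puts for each input record. This is a hypothetical model: PutModel stands in for an HBase Put, and the UUID event ID, the "|" separator, the zero-padded timestamp and the family names "d" and "i" are assumptions for illustration. Because the event ID is generated at runtime, writing both rows in the same pass keeps them consistent without a second job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.UUID;

// Hypothetical model of a bulk-load mapper that emits the data row for table A and
// the index row for table B in one pass. PutModel is a stand-in for an HBase Put.
final class DualWriteMapperModel {

    static final class PutModel {
        final String table;
        final String row;
        final String family;
        final String qualifier;
        final String value;

        PutModel(String table, String row, String family, String qualifier, String value) {
            this.table = table;
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Called once per input record; returns the put for A followed by its index put for B.
    static List<PutModel> map(String customerId, long eventTimestamp, String payload) {
        // The event ID only exists at runtime (a UUID here, as an assumption), so the
        // index row has to be produced in the same pass that creates the data row.
        String eventId = UUID.randomUUID().toString();
        String rowKeyA = customerId + "|" + eventId; // <customer_Id><event_ID>
        // <event_timestamp><customer_ID>, zero-padded so keys sort by time
        String rowKeyB = String.format(Locale.ROOT, "%019d|%s", eventTimestamp, customerId);

        List<PutModel> puts = new ArrayList<>();
        puts.add(new PutModel("A", rowKeyA, "d", "payload", payload));
        // Index cell: the qualifier carries row key A and the value stays empty.
        puts.add(new PutModel("B", rowKeyB, "i", rowKeyA, ""));
        return puts;
    }
}
```

In a real job the two puts would go to different tables, for example the main table through the bulk-load output and "B" through an HTable opened in the mapper; that wiring is omitted here.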
> > Now your main question is lookups, right?
> > There are some more hooks in the scan flow called pre/postScannerOpen and
> > pre/postScannerNext.
> > Maybe you can try using them to do a lookup on the secondary table and
> > then use those values and pass them to the main table's next().
> > But this may involve more RPC calls, as your regions of "A" and "B" may be
> > in different RSs.
> >
> > If something is wrong in my understanding of what you said, kindly
> > excuse me. :)
> >
> > Regards
> > Ram
> >
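The scanner-hook idea amounts to joining index rows with primary rows lazily, one next() at a time. The sketch below models only that control flow with plain Java collections; IndexJoiningScanner is a hypothetical name, and the Function passed in stands for whatever lookup the hook would perform on table A (a local region call or an RPC). The index is assumed to map row key B to a single row key of A.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical model of a scanner wrapper that walks index rows (row key B -> row key A)
// and fetches the matching row of A on every next(). Not the RegionObserver API.
final class IndexJoiningScanner implements Iterator<String> {
    private final Iterator<String> indexedRowKeysA;
    private final Function<String, String> primaryGet; // local call or RPC in a real deployment
    private int lookups = 0;

    IndexJoiningScanner(TreeMap<String, String> index, String startRow, String stopRow,
                        Function<String, String> primaryGet) {
        // subMap yields [startRow, stopRow) in row-key order, like a scan range.
        this.indexedRowKeysA = index.subMap(startRow, stopRow).values().iterator();
        this.primaryGet = primaryGet;
    }

    @Override
    public boolean hasNext() {
        return indexedRowKeysA.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        lookups++; // one lookup on A per index row returned
        return primaryGet.apply(indexedRowKeysA.next());
    }

    int lookups() {
        return lookups;
    }
}
```

Each next() costs one lookup on A. When A's region is on another region server, each of those is a network round trip, which is the extra RPC cost Ram points out; collecting the keys first and issuing one batch get trades that for a single multi-get.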
> >
> > > -----Original Message-----
> > > From: anil gupta [mailto:anilgupt...@gmail.com]
> > > Sent: Friday, October 26, 2012 3:40 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Best technique for doing lookup with Secondary Index
> > >
> > > Anoop: In the prePut hook you call HTable#put()?
> > > Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing
> > > it?
> > >
> > > Anoop: Why use network calls from the server side here, then?
> > > Anil: I thought this was a cleaner approach since I am using BulkLoader.
> > > I decided not to run two jobs since I am generating a UniqueIdentifier at
> > > runtime in the bulk loader.
> > >
> > > Anoop: Can't you handle it from the client alone?
> > > Anil: I cannot handle it from the client since I am using BulkLoader. Is
> > > it a good idea to create an HTable instance on "B" and do the put in my
> > > mapper? I might try this idea.
> > >
> > > Anoop: You can have a look at the Lily project.
> > > Anil: It's a little late for us to evaluate Lily now, and at present we
> > > don't need a complex secondary index since our data is immutable.
> > >
> > > Ram: What is rowkey B here?
> > > Anil: Suppose I am storing customer events in table A. I have two
> > > requirements for querying the data:
> > > 1. Query customer events on the basis of customer_Id and event_ID.
> > > 2. Query customer events on the basis of event_timestamp and customer_ID.
> > >
> > > 70% of the querying is done by query #1, so I will use
> > > <customer_Id><event_ID> as the row key of table A.
> > > Now, in order to support fast results for query #2, I need to create a
> > > secondary index on A. I store that secondary index in B; the row key of B
> > > is <event_timestamp><customer_ID>. Every row stores the corresponding
> > > row key of A.
> > >
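The two row keys Anil describes can be sketched with fixed-width encodings. This assumes customer_Id, event_ID and event_timestamp are non-negative longs, which the thread does not state. HBase orders row keys by unsigned byte-wise comparison, so big-endian fixed-width longs sort in numeric order, and a scan on B that starts at a timestamp-only start row begins at the first event at or after that time.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of the row keys for tables A and B, ASSUMING all three fields are
// non-negative longs. Big-endian fixed-width encoding makes unsigned byte-wise
// ordering (the order HBase uses for row keys) match numeric ordering.
final class EventRowKeys {
    private EventRowKeys() {}

    // Row key of table A: <customer_Id><event_ID>, 16 bytes.
    static byte[] rowKeyA(long customerId, long eventId) {
        return ByteBuffer.allocate(16).putLong(customerId).putLong(eventId).array();
    }

    // Row key of table B: <event_timestamp><customer_ID>, 16 bytes.
    static byte[] rowKeyB(long eventTimestamp, long customerId) {
        return ByteBuffer.allocate(16).putLong(eventTimestamp).putLong(customerId).array();
    }

    // Start row for "all index rows at or after this timestamp" (a shorter key
    // with the same leading bytes sorts before every full key that extends it).
    static byte[] startRowForTimestamp(long fromTimestamp) {
        return ByteBuffer.allocate(8).putLong(fromTimestamp).array();
    }

    // Unsigned lexicographic comparison of two row keys.
    static int compareRows(byte[] left, byte[] right) {
        return Arrays.compareUnsigned(left, right);
    }
}
```

With variable-length decimal strings instead, "9" would sort after "10", which is why a fixed-width encoding (or zero-padding) matters for the timestamp-leading key of B.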
> > > Ram: How is the startRow determined for every query?
> > > Anil: It's determined by very simple application logic.
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > > ramkrishna.vasude...@huawei.com> wrote:
> > >
> > > > Just out of curiosity,
> > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > family:<rowkey A>
> > > > what is rowkey B here?
> > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > How is the startRow determined for every query?
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > > -----Original Message-----
> > > > > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > > >
> > > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > > >
> > > > > Anil,
> > > > >        In the prePut hook you call HTable#put()? Why use network
> > > > > calls from the server side here, then? Can't you handle it from the
> > > > > client alone? You can have a look at the Lily project. These are my
> > > > > thoughts after seeing your idea on put and scan.
> > > > >
> > > > > -Anoop-
> > > > > ________________________________________
> > > > > From: anil gupta [anilgupt...@gmail.com]
> > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I am using HBase 0.92.1. I have created a secondary index on table
> > > > > "A". Table A stores immutable data. I build the secondary table "B"
> > > > > using a prePut RegionObserver.
> > > > >
> > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in
> > > > > B will only have one column, and the name of that column is the row
> > > > > key of A, so the value is blank. As per my understanding, accessing a
> > > > > column qualifier is faster than accessing a value. Please correct me
> > > > > if I am wrong.
> > > > >
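The index layout in this message (one column per row of B, qualifier = row key of A, empty value) can be modeled as below. This is a hypothetical in-memory sketch: onPutToA() plays the role of the prePut hook but is not the RegionObserver signature, the column family is omitted, and the zero-padded "<timestamp>|<customer>" encoding of row key B is an assumption. It says nothing about whether reading a qualifier is faster than reading a value, which is the open question here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of index table B: one column per index row, qualifier = row key of A,
// empty value. onPutToA() stands in for the prePut hook; names are hypothetical.
final class QualifierIndexModel {
    static final byte[] EMPTY_VALUE = new byte[0];

    // row key of B -> (qualifier -> value); the column family is left out
    final NavigableMap<String, NavigableMap<String, byte[]>> tableB = new TreeMap<>();

    // Hypothetical encoding of <event_timestamp><customer_ID>, zero-padded so it sorts by time.
    static String rowKeyB(long eventTimestamp, String customerId) {
        return String.format(Locale.ROOT, "%019d|%s", eventTimestamp, customerId);
    }

    // Invoked for every put on A: derive the index row and store row key A as the qualifier.
    void onPutToA(String rowKeyA, long eventTimestamp, String customerId) {
        tableB.computeIfAbsent(rowKeyB(eventTimestamp, customerId), k -> new TreeMap<>())
              .put(rowKeyA, EMPTY_VALUE);
    }

    // Reads back the row keys of A from an index row using only the qualifiers.
    List<String> rowKeysOfA(String rowKeyB) {
        NavigableMap<String, byte[]> columns = tableB.get(rowKeyB);
        return columns == null ? new ArrayList<>() : new ArrayList<>(columns.keySet());
    }
}
```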
> > > > >
> > > > > HBase Querying approach:
> > > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > > 2. Do a batch get on the primary table by using the
> > > > > HTable.get(List<Get>) method.
> > > > >
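The two-step retrieval above can be sketched with sorted in-memory maps standing in for the tables. This is an illustration under assumptions, not HBase client code: scanIndex() mimics a scan of B from startRow restricted to a row-key prefix, and batchGet() takes the place of HTable.get(List<Get>). Both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Sketch of the two-step query with sorted in-memory maps standing in for the tables:
//   tableB: row key of B -> row keys of A stored as qualifiers of that index row
//   tableA: row key of A -> row contents
final class IndexQueryModel {
    private IndexQueryModel() {}

    // Step 1: scan B from startRow, keeping only rows whose key starts with prefix.
    static List<String> scanIndex(NavigableMap<String, List<String>> tableB, String startRow, String prefix) {
        // Every row matching the prefix sorts at or after the prefix itself, so
        // begin at whichever of startRow and prefix comes later.
        String from = startRow.compareTo(prefix) > 0 ? startRow : prefix;
        List<String> rowKeysA = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : tableB.tailMap(from, true).entrySet()) {
            if (!row.getKey().startsWith(prefix)) {
                break; // sorted order: once past the prefix range, nothing later can match
            }
            rowKeysA.addAll(row.getValue());
        }
        return rowKeysA;
    }

    // Step 2: resolve all collected keys against A in one batch (the role of
    // HTable.get(List<Get>) in the post). Missing rows are skipped in this model.
    static List<String> batchGet(Map<String, String> tableA, List<String> rowKeysA) {
        List<String> rows = new ArrayList<>();
        for (String key : rowKeysA) {
            String value = tableA.get(key);
            if (value != null) {
                rows.add(value);
            }
        }
        return rows;
    }
}
```

Because row keys are sorted, the scan can stop at the first row past the prefix instead of filtering the rest of the table; in HBase, a stop row just past the prefix is a common way to get the same early exit.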
> > > > > The above approach for retrieval works fine, but I was wondering if
> > > > > there is a better approach. I was planning to try out doing the
> > > > > retrieval using coprocessors. Has anyone tried using coprocessors? I
> > > > > would appreciate it if others could share their experience with
> > > > > secondary indexes for HBase queries.
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > > >
> > > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> >
> >
> 
> 
> --
> Thanks & Regards,
> Anil Gupta
