Re: Best technique for doing lookup with Secondary Index

anil gupta Fri, 26 Oct 2012 08:15:10 -0700

@fding hbase: thanks for the link. I'll look into it.

Interesting to know that within a region server we don't need an RPC call. If
we can collocate two (or more) regions, then that is the best solution. I am
not sure how hard it would be to write a custom load balancer (it sounds a bit
difficult to me). Does anyone know which classes are related to the load
balancer?


Thanks,
Anil

On Fri, Oct 26, 2012 at 7:33 AM, Ramkrishna.S.Vasudevan <
ramkrishna.vasude...@huawei.com> wrote:

> Yes, we can do this, but for it to happen you may need a custom load
> balancer that gives you the collocation.
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Jerry Lam [mailto:chiling...@gmail.com]
> > Sent: Friday, October 26, 2012 7:59 PM
> > To: user@hbase.apache.org
> > Subject: Re: Best technique for doing lookup with Secondary Index
> >
> > Can we force two regions to be collocated as a logical group?
> >
> > On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hb...@gmail.com>
> > wrote:
> >
> > > https://github.com/danix800/hbase-indexed
> > >
> > > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
> > > ramkrishna.vasude...@huawei.com> wrote:
> > >
> > > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
> > > > > > same RS, since these two regions are from different tables. Am I right?
> > > >
> > > > > No. Suppose Region A and Region B, belonging to different tables, are
> > > > > collocated on the same RS. Then from the coprocessor environment you
> > > > > can get access to the RS. From the RS you can get the online regions,
> > > > > and on that region object you can call puts or gets. This will not
> > > > > involve any RPC within that RS because we only deal with Region objects.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > > -----Original Message-----
> > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
> > > > > Sent: Friday, October 26, 2012 12:17 PM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > >
> > > > > >
> > > > > > > Now your main question is lookups, right?
> > > > > > > There are some more hooks in the scan flow called
> > > > > > > pre/postScannerOpen and pre/postScannerNext.
> > > > > > > Maybe you can try using them to do a lookup on the secondary table
> > > > > > > and then pass those values to the main table's next().
> > > > > >
> > > > >
> > > > > > With a secondary index it is hard to avoid at least two RPC calls (one
> > > > > > from the client to table B, and then one from table B to table A),
> > > > > > whether you use a coprocessor or not. But I believe using a coprocessor
> > > > > > is better than doing the RPC calls from the client, since the client
> > > > > > might be outside the subnet/network of the cluster. In that case, the
> > > > > > RPC will be faster when we use coprocessors. In my case the client is
> > > > > > certainly not in the same subnet or network zone. I need to return
> > > > > > query results in around 100 milliseconds or less, so I need to be
> > > > > > really frugal. Let me know your views on this.
> > > > >
> > > > > > Have you implemented queries with secondary indexes using coprocessors
> > > > > > yet? At present I have tried the client-side query and I can get the
> > > > > > query results in around 100 ms. I am tempted to try out the
> > > > > > coprocessor implementation.
> > > > >
> > > > > > > But this may involve more RPC calls, as your regions of "A" and "B"
> > > > > > > may be on different RSs.
> > > > > >
> > > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
> > > > > > same RS, since these two regions are from different tables. Am I right?
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Anil Gupta
> > > > >
> > > > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
> > > > > ramkrishna.vasude...@huawei.com> wrote:
> > > > >
> > > > > > > > Is it a good idea to create an HTable instance on "B" and do the
> > > > > > > > put in my mapper? I might try this idea.
> > > > > > > Yes, you can do this. Maybe in the same mapper you can do a put for
> > > > > > > table "B". This is how we have tried loading data into another table
> > > > > > > using the main table "A" Puts.
> > > > > >
> > > > > > > Now your main question is lookups, right?
> > > > > > > There are some more hooks in the scan flow called
> > > > > > > pre/postScannerOpen and pre/postScannerNext.
> > > > > > > Maybe you can try using them to do a lookup on the secondary table
> > > > > > > and then pass those values to the main table's next().
> > > > > > > But this may involve more RPC calls, as your regions of "A" and "B"
> > > > > > > may be on different RSs.
> > > > > >
> > > > > > > If something is wrong in my understanding of what you said, kindly
> > > > > > > spare me. :)
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
> > > > > > > Sent: Friday, October 26, 2012 3:40 AM
> > > > > > > To: user@hbase.apache.org
> > > > > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > > > >
> > > > > > > > Anoop: In the prePut hook you call HTable#put()?
> > > > > > > > Anil: Yes, I call HTable#put() in prePut. Is there a better way of
> > > > > > > > doing it?
> > > > > > >
> > > > > > > > Anoop: Why use the network calls from server side here then?
> > > > > > > > Anil: I thought this was a cleaner approach since I am using
> > > > > > > > BulkLoader. I decided not to run two jobs since I am generating a
> > > > > > > > UniqueIdentifier at runtime in the bulk loader.
> > > > > > >
> > > > > > > > Anoop: Can it not be handled from the client alone?
> > > > > > > > Anil: I cannot handle it from the client since I am using BulkLoader.
> > > > > > > > Is it a good idea to create an HTable instance on "B" and do the put
> > > > > > > > in my mapper? I might try this idea.
> > > > > > >
> > > > > > > > Anoop: You can have a look at the Lily project.
> > > > > > > > Anil: It's a little late for us to evaluate Lily now, and at present
> > > > > > > > we don't need a complex secondary index since our data is immutable.
> > > > > > >
> > > > > > > > Ram: What is rowkey B here?
> > > > > > > > Anil: Suppose I am storing customer events in table A. I have two
> > > > > > > > requirements for data queries:
> > > > > > > > 1. Query customer events on the basis of customer_Id and event_ID.
> > > > > > > > 2. Query customer events on the basis of event_timestamp and
> > > > > > > >    customer_ID.
> > > > > > > >
> > > > > > > > 70% of querying is done by query#1, so I will create
> > > > > > > > <customer_Id><event_ID> as the row key of Table A.
> > > > > > > > Now, in order to support fast results for query#2, I need to create a
> > > > > > > > secondary index on A. I store that secondary index in B; the rowkey
> > > > > > > > of B is <event_timestamp><customer_ID>. Every row stores the
> > > > > > > > corresponding rowkey of A.
> > > > > > >
> > > > > > > > Ram: How is the startRow determined for every query?
> > > > > > > > Anil: It's determined by very simple application logic.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Anil Gupta
> > > > > > >
> > > > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
> > > > > > > ramkrishna.vasude...@huawei.com> wrote:
> > > > > > >
> > > > > > > > Just out of curiosity,
> > > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > > > > family:<rowkey A>
> > > > > > > > > what is rowkey B here?
> > > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > > > > > How is the startRow determined for every query?
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ram
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > > > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > > > > > > >
> > > > > > > > > > > I build the secondary table "B" using a prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > Anil,
> > > > > > > > >        In prePut hook u call HTable#put()?  Why use the
> > network
> > > > > > > calls
> > > > > > > > > from server side here then? can not handle it from client
> > > > > alone?
> > > > > > > You
> > > > > > > > > can have a look at Lily project.   Thoughts after seeing
> > ur
> > > > > idea on
> > > > > > > put
> > > > > > > > > and scan..
> > > > > > > > >
> > > > > > > > > -Anoop-
> > > > > > > > > ________________________________________
> > > > > > > > > From: anil gupta [anilgupt...@gmail.com]
> > > > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > > I am using HBase 0.92.1. I have created a secondary index on table
> > > > > > > > > > "A". Table A stores immutable data. I build the secondary table "B"
> > > > > > > > > > using a prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
> > > > > > > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row
> > > > > > > > > > in B will have only one column, and the name of that column is the
> > > > > > > > > > rowkey of A, so the value is blank. As per my understanding,
> > > > > > > > > > accessing the column qualifier is faster than accessing the value.
> > > > > > > > > > Please correct me if I am wrong.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > HBase querying approach:
> > > > > > > > > > 1. Scan the secondary table by using a prefix filter and startRow.
> > > > > > > > > > 2. Do a batch get on the primary table by using the
> > > > > > > > > >    HTable.get(List<Get>) method.
> > > > > > > > >
> > > > > > > > > > The above approach for retrieval works fine, but I was wondering if
> > > > > > > > > > there is a better approach. I was planning to try doing the
> > > > > > > > > > retrieval using coprocessors.
> > > > > > > > > > Has anyone tried using coprocessors? I would appreciate it if
> > > > > > > > > > others could share their experience with secondary indexes for
> > > > > > > > > > HBase queries.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks & Regards,
> > > > > > > > > > Anil Gupta
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Thanks & Regards,
> > > > > > > Anil Gupta
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Best Regards!
> > >
> > > Fei Ding
> > > fding.chu...@gmail.com
> > >
>
>
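
For anyone skimming the thread, here is a rough sketch of the layout and the
two-step lookup described in the original post (scan the index table by prefix,
then batch get from the primary table). It is only a model: it uses plain
java.util.TreeMap as a stand-in for a sorted table and does not call the HBase
client API at all. The class name IndexLookupSketch, the method names, and the
"|" separator between rowkey fields are all made up for illustration. In the
real table B the rowkey of A is kept as the column qualifier with an empty
value; here it is simply the map value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone model of tables "A" and "B" from this thread.
// TreeMap keeps keys sorted, which is the only table property the sketch needs.
class IndexLookupSketch {
    // Table A: rowkey <customer_Id>|<event_ID> -> event payload.
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B (index): rowkey <event_timestamp>|<customer_ID> -> rowkey of A.
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // Models the write path: every put to A also writes one index row to B.
    public void put(String customerId, String eventId,
                    String eventTimestamp, String payload) {
        String rowKeyA = customerId + "|" + eventId;
        tableA.put(rowKeyA, payload);
        tableB.put(eventTimestamp + "|" + customerId, rowKeyA);
    }

    // Step 1: scan B starting at the prefix and stop at the first
    // key that no longer carries it.
    public List<String> scanIndex(String prefix) {
        List<String> rowKeysOfA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            rowKeysOfA.add(e.getValue());
        }
        return rowKeysOfA;
    }

    // Step 2: one batched fetch of all collected rowkeys from A.
    public List<String> batchGet(List<String> rowKeysOfA) {
        List<String> results = new ArrayList<>();
        for (String rowKey : rowKeysOfA) {
            String payload = tableA.get(rowKey);
            if (payload != null) {
                results.add(payload);
            }
        }
        return results;
    }

    public List<String> queryByTimestampPrefix(String prefix) {
        return batchGet(scanIndex(prefix));
    }

    public static void main(String[] args) {
        IndexLookupSketch sketch = new IndexLookupSketch();
        sketch.put("cust1", "evt1", "20121025", "login");
        sketch.put("cust2", "evt7", "20121025", "purchase");
        sketch.put("cust1", "evt2", "20121026", "logout");
        System.out.println(sketch.queryByTimestampPrefix("20121025"));
    }
}
```

Running main prints [login, purchase]. Against a real cluster each of the two
steps is a separate round trip, which is the cost the coprocessor and
collocation ideas in this thread are trying to cut.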


-- 
Thanks & Regards,
Anil Gupta
