Re: Best technique for doing lookup with Secondary Index

anil gupta Fri, 26 Oct 2012 09:44:23 -0700

Hi Danis,

I downloaded the zip file and copied the source code into my HBase 0.92.1
project. It compiled successfully, and I am going through the source code right
now. Could you provide an architecture diagram for your implementation, or
comments in the code? That would make it easier for users to understand your
implementation quickly.


Thanks,
Anil Gupta


On Fri, Oct 26, 2012 at 8:14 AM, anil gupta <anilgupt...@gmail.com> wrote:

> @fding hbase: thanks for the link. I'll look into it.
>
> Interesting to know that within a region server we don't need an RPC call.
> If we can collocate two (or more) regions, then that is the best solution. I
> am not sure how hard it will be to write a custom load balancer (it sounds
> a bit difficult to me). Does anyone know the classes related to the load
> balancer?
>
> Thanks,
> Anil
>
>
> On Fri, Oct 26, 2012 at 7:33 AM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasude...@huawei.com> wrote:
>
>> Yes, we can do this, but to make it happen you may need a custom load
>> balancer that helps you achieve the collocation.
>>
>> Regards
>> Ram
>>
>> > -----Original Message-----
>> > From: Jerry Lam [mailto:chiling...@gmail.com]
>> > Sent: Friday, October 26, 2012 7:59 PM
>> > To: user@hbase.apache.org
>> > Subject: Re: Best technique for doing lookup with Secondary Index
>> >
>> > Can we enforce 2 regions to collocate together as a logical group?
>> >
>> > On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hb...@gmail.com>
>> > wrote:
>> >
>> > > https://github.com/danix800/hbase-indexed
>> > >
>> > > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <
>> > > ramkrishna.vasude...@huawei.com> wrote:
>> > >
>> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
>> > > > > same RS, since these two regions are from different tables. Am I
>> > > > > right?
>> > > >
>> > > > No... suppose your Region A and Region B, from different tables, are
>> > > > collocated on the same RS. Then from the coprocessor environment you
>> > > > can get access to the RS. From the RS you can get the online regions,
>> > > > and on that region object you can call puts or gets. This will not
>> > > > involve any RPC within that RS, because we only deal with Region
>> > > > objects.
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
>> > > > > Sent: Friday, October 26, 2012 12:17 PM
>> > > > > To: user@hbase.apache.org
>> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
>> > > > >
>> > > > > >
>> > > > > > Now your main question is lookups, right?
>> > > > > > There are some more hooks in the scan flow called
>> > > > > > pre/postScannerOpen and pre/postScannerNext.
>> > > > > > Maybe you can try using them to do a lookup on the secondary
>> > > > > > table and then pass those values to the main table's next().
>> > > > > >
>> > > > >
>> > > > > In a secondary index it's hard to avoid at least two RPC calls (one
>> > > > > from the client to table B and then one from table B to table A),
>> > > > > whether you use coprocs or not. But I believe using coprocs is better
>> > > > > than doing RPC calls from the client, since the client might be
>> > > > > outside the subnet/network of the cluster. In that case, the RPCs will
>> > > > > be faster when we use coprocs. In my case the client is certainly not
>> > > > > in the same subnet or network zone. I need to provide query results in
>> > > > > around 100 milliseconds or less, so I need to be really frugal. Let me
>> > > > > know your views on this.
>> > > > >
>> > > > > Have you implemented queries with secondary indexes using coprocs
>> > > > > yet? At present I have tried the client-side query and I can get the
>> > > > > query results in around 100 ms. I am tempted to try out the coproc
>> > > > > implementation.
>> > > > >
>> > > > > > But this may involve more RPC calls, as your regions of "A" and
>> > > > > > "B" may be in different RS.
>> > > > > >
>> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on the
>> > > > > same RS, since these two regions are from different tables. Am I
>> > > > > right?
>> > > > >
>> > > > >
>> > > > > Thanks,
>> > > > > Anil Gupta
>> > > > >
>> > > > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <
>> > > > > ramkrishna.vasude...@huawei.com> wrote:
>> > > > >
>> > > > > > > Is it a good idea to create an HTable instance on "B" and do
>> > > > > > > the put in my mapper? I might try this idea.
>> > > > > > Yes, you can do this. Maybe in the same mapper you can do a put
>> > > > > > for table "B". This is how we tried loading data into another
>> > > > > > table by using the main table "A" Puts.
>> > > > > >
>> > > > > > Now your main question is lookups, right?
>> > > > > > There are some more hooks in the scan flow called
>> > > > > > pre/postScannerOpen and pre/postScannerNext.
>> > > > > > Maybe you can try using them to do a lookup on the secondary
>> > > > > > table and then pass those values to the main table's next().
>> > > > > > But this may involve more RPC calls, as your regions of "A" and
>> > > > > > "B" may be in different RS.
>> > > > > >
>> > > > > > If something is wrong in my understanding of what you said,
>> > > > > > kindly spare me. :)
>> > > > > >
>> > > > > > Regards
>> > > > > > Ram
>> > > > > >
>> > > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
>> > > > > > > Sent: Friday, October 26, 2012 3:40 AM
>> > > > > > > To: user@hbase.apache.org
>> > > > > > > Subject: Re: Best technique for doing lookup with Secondary
>> > Index
>> > > > > > >
>> > > > > > > Anoop: In the prePut hook you call HTable#put()?
>> > > > > > > Anil: Yes, I call HTable#put() in prePut. Is there a better way
>> > > > > > > of doing it?
>> > > > > > >
>> > > > > > > Anoop: Why use network calls from the server side here, then?
>> > > > > > > Anil: I thought this was a cleaner approach since I am using the
>> > > > > > > BulkLoader. I decided not to run two jobs since I am generating
>> > > > > > > a UniqueIdentifier at runtime in the bulk loader.
>> > > > > > >
>> > > > > > > Anoop: Can you not handle it from the client alone?
>> > > > > > > Anil: I cannot handle it from the client since I am using the
>> > > > > > > BulkLoader. Is it a good idea to create an HTable instance on
>> > > > > > > "B" and do the put in my mapper? I might try this idea.
>> > > > > > >
>> > > > > > > Anoop: You can have a look at the Lily project.
>> > > > > > > Anil: It's a little late for us to evaluate Lily now, and at
>> > > > > > > present we don't need a complex secondary index since our data
>> > > > > > > is immutable.
>> > > > > > >
>> > > > > > > Ram: What is rowkey B here?
>> > > > > > > Anil: Suppose I am storing customer events in table A. I have
>> > > > > > > two requirements for querying the data:
>> > > > > > > 1. Query customer events on the basis of customer_Id and
>> > > > > > >    event_ID.
>> > > > > > > 2. Query customer events on the basis of event_timestamp and
>> > > > > > >    customer_ID.
>> > > > > > >
>> > > > > > > 70% of querying is done by query #1, so I will use
>> > > > > > > <customer_Id><event_ID> as the row key of table A.
>> > > > > > > Now, in order to support fast results for query #2, I need to
>> > > > > > > create a secondary index on A. I store that secondary index in
>> > > > > > > B; the rowkey of B is <event_timestamp><customer_ID>. Every row
>> > > > > > > stores the corresponding rowkey of A.
>> > > > > > >
>> > > > > > > Ram: How is the startRow determined for every query?
>> > > > > > > Anil: It's determined by very simple application logic.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Anil Gupta
>> > > > > > >
>> > > > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
>> > > > > > > ramkrishna.vasude...@huawei.com> wrote:
>> > > > > > >
>> > > > > > > > Just out of curiosity,
>> > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
>> > > > > > > > > family:<rowkey A>
>> > > > > > > > what is rowkey B here?
>> > > > > > > > > 1. Scan the secondary table by using prefix filter and
>> > > > > > > > > startRow.
>> > > > > > > > How is the startRow determined for every query?
>> > > > > > > >
>> > > > > > > > Regards
>> > > > > > > > Ram
>> > > > > > > >
>> > > > > > > > > -----Original Message-----
>> > > > > > > > > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> > > > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
>> > > > > > > > > To: user@hbase.apache.org
>> > > > > > > > > Subject: RE: Best technique for doing lookup with
>> > Secondary
>> > > > > Index
>> > > > > > > > >
>> > > > > > > > > >I build the secondary table "B" using a prePut RegionObserver.
>> > > > > > > > >
>> > > > > > > > > Anil,
>> > > > > > > > >        In the prePut hook you call HTable#put()? Why use
>> > > > > > > > > network calls from the server side here, then? Can you not
>> > > > > > > > > handle it from the client alone? You can have a look at the
>> > > > > > > > > Lily project. These are my thoughts after seeing your idea on
>> > > > > > > > > put and scan.
>> > > > > > > > >
>> > > > > > > > > -Anoop-
>> > > > > > > > > ________________________________________
>> > > > > > > > > From: anil gupta [anilgupt...@gmail.com]
>> > > > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
>> > > > > > > > > To: user@hbase.apache.org
>> > > > > > > > > Subject: Best technique for doing lookup with Secondary
>> > Index
>> > > > > > > > >
>> > > > > > > > > Hi All,
>> > > > > > > > >
>> > > > > > > > > I am using HBase 0.92.1. I have created a secondary index on
>> > > > > > > > > table "A". Table A stores immutable data. I build the
>> > > > > > > > > secondary table "B" using a prePut RegionObserver.
>> > > > > > > > >
>> > > > > > > > > The secondary index is stored in table "B" as rowkey B -->
>> > > > > > > > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every
>> > > > > > > > > row in B will have only one column, and the name of that
>> > > > > > > > > column is the rowkey of A, so the value is blank. As per my
>> > > > > > > > > understanding, accessing the column qualifier is faster than
>> > > > > > > > > accessing the value. Please correct me if I am wrong.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > HBase querying approach:
>> > > > > > > > > 1. Scan the secondary table by using a prefix filter and
>> > > > > > > > >    startRow.
>> > > > > > > > > 2. Do a batch get on the primary table by using the
>> > > > > > > > >    HTable.get(List<Get>) method.
>> > > > > > > > >
>> > > > > > > > > The above approach for retrieval works fine, but I was
>> > > > > > > > > wondering if there is a better approach. I was planning to try
>> > > > > > > > > doing the retrieval using coprocessors.
>> > > > > > > > > Has anyone tried using coprocessors? I would appreciate it if
>> > > > > > > > > others could share their experience with secondary indexes for
>> > > > > > > > > HBase queries.
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Thanks & Regards,
>> > > > > > > > > Anil Gupta
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards,
>> > > > > > > Anil Gupta
>> > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Thanks & Regards,
>> > > > > Anil Gupta
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best Regards!
>> > >
>> > > Fei Ding
>> > > fding.chu...@gmail.com
>> > >
>>
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Thanks & Regards,
Anil Gupta
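
For readers following the thread, the two-table layout and the two-step lookup
described in the original message (prefix scan on B, then a batch get on A) can
be sketched roughly as below. This is only an in-memory illustration, not HBase
client code: the TreeMaps stand in for the sorted tables A and B, and the class
name, the '|' separator, the 13-digit zero padding, and the payload strings are
all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory stand-in for the two tables; TreeMap mimics rows sorted by rowkey. */
final class SecondaryIndexSketch {
    // "Table A": rowkey <customer_Id><event_ID> -> event payload (hypothetical payload).
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // "Table B": rowkey <event_timestamp><customer_ID> -> rowkey of A.
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // The '|' separator is an assumption of this sketch.
    static String rowKeyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    static String rowKeyB(long timestampMillis, String customerId) {
        // Zero padding keeps lexicographic order equal to chronological order.
        return String.format("%013d|%s", timestampMillis, customerId);
    }

    /** Writes the event to A and its index entry to B (the job of the prePut hook in the thread). */
    void put(String customerId, String eventId, long timestampMillis, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(timestampMillis, customerId), keyA);
    }

    /** Step 1: prefix scan on B. Step 2: batch get on A with the collected rowkeys. */
    List<String> lookup(String rowKeyBPrefix) {
        // Every key starting with the prefix sorts between prefix and prefix + '\uffff'.
        List<String> keysOfA = new ArrayList<>(
            tableB.subMap(rowKeyBPrefix, true, rowKeyBPrefix + '\uffff', false).values());
        List<String> results = new ArrayList<>();
        for (String keyA : keysOfA) {
            String payload = tableA.get(keyA);
            if (payload != null) {
                results.add(payload);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch store = new SecondaryIndexSketch();
        store.put("cust1", "evt1", 1351200000000L, "login");
        store.put("cust2", "evt7", 1351200000500L, "purchase");
        store.put("cust1", "evt2", 1351300000000L, "logout");
        // The prefix "13512" covers the first two timestamps only.
        System.out.println(store.lookup("13512")); // prints [login, purchase]
    }
}
```

In real HBase the second step is a separate round trip to table A, which is
the RPC cost discussed above; the sketch only shows the data flow. It also
keeps a single entry per B rowkey, matching the one-column-per-row layout
described in the original message.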
