@fding hbase: Thanks for the link. I'll look into it. It's interesting to know that within a region server we don't need an RPC call. If we can collocate two (or more) regions, then that is the best solution. I am not sure how hard it will be to write a custom load balancer (it sounds a bit difficult to me). Does anyone know the classes related to a load balancer?
Thanks,
Anil

On Fri, Oct 26, 2012 at 7:33 AM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:

> Yes we can do this, but for it to happen you may have to have your custom
> load balancer which will help you in getting the collocation.
>
> Regards
> Ram
>
> > -----Original Message-----
> > From: Jerry Lam [mailto:chiling...@gmail.com]
> > Sent: Friday, October 26, 2012 7:59 PM
> > To: user@hbase.apache.org
> > Subject: Re: Best technique for doing lookup with Secondary Index
> >
> > Can we enforce 2 regions to collocate together as a logical group?
> >
> > On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hb...@gmail.com> wrote:
> >
> > > https://github.com/danix800/hbase-indexed
> > >
> > > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
> > >
> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS
> > > > > since these two regions are from different table. Am i right?
> > > >
> > > > No... suppose your Region A and Region B of different tables are collocated
> > > > on same RS then from the coprocessor environment variable you can get access
> > > > to the RS.
> > > > From RS you can get the online regions and from that region object you can
> > > > call puts or gets. This will not involve any RPC with in that RS because we
> > > > only deal with Region objects.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > > > -----Original Message-----
> > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
> > > > > Sent: Friday, October 26, 2012 12:17 PM
> > > > > To: user@hbase.apache.org
> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > >
> > > > > > Now your main question is lookups right
> > > > > > Now there are some more hooks in the scan flow called pre/postScannerOpen,
> > > > > > pre/postScannerNext.
> > > > > > May be you can try using them to do a look up on the secondary table and
> > > > > > then use those values and pass it to the main table next().
> > > > >
> > > > > In secondary index its hard to avoid at-least two RPC calls(1 from client
> > > > > to table B and then from table B to Table A) whether you use coproc or not.
> > > > > But, i believe using coproc is better than doing RPC calls from client
> > > > > since it might be outside the subnet/network of cluster. In this case, the
> > > > > RPC will be faster when we use coprocs. In my case the client is certainly
> > > > > not in the same subnet or network zone. I need to provide results of query
> > > > > in around 100 milliseconds or less so i need to be really frugal. Let me
> > > > > know your views on this.
> > > > >
> > > > > Have you implemented queries with Secondary indexes using coproc yet?
> > > > > At present i have tried the client side query and i can get the results of
> > > > > query in around 100 ms. I am enticed to try out the coproc implementation.
> > > > >
> > > > > > But this may involve more RPC calls as your regions of "A" and "B" may be in
> > > > > > different RS.
> > > > >
> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS
> > > > > since these two regions are from different table. Am i right?
> > > > >
> > > > > Thanks,
> > > > > Anil Gupta
> > > > >
> > > > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
> > > > >
> > > > > > > Is it a
> > > > > > > good idea to create Htable instance on "B" and do put in my mapper? I might
> > > > > > > try this idea.
> > > > > > Yes you can do this.. May be the same mapper you can do a put for table
> > > > > > "B".
> > > > > > This was how we have tried loading data to another table by using the
> > > > > > main table "A" Puts.
> > > > > >
> > > > > > Now your main question is lookups right
> > > > > > Now there are some more hooks in the scan flow called pre/postScannerOpen,
> > > > > > pre/postScannerNext.
> > > > > > May be you can try using them to do a look up on the secondary table and
> > > > > > then use those values and pass it to the main table next().
> > > > > > But this may involve more RPC calls as your regions of "A" and "B" may be in
> > > > > > different RS.
> > > > > >
> > > > > > If something is wrong in my understanding of what you said, kindly spare me.
> > > > > > :)
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
> > > > > > > Sent: Friday, October 26, 2012 3:40 AM
> > > > > > > To: user@hbase.apache.org
> > > > > > > Subject: Re: Best technique for doing lookup with Secondary Index
> > > > > > >
> > > > > > > Anoop: In prePut hook u call HTable#put()?
> > > > > > > Anil: Yes i call HTable#put() in prePut. Is there better way of doing it?
> > > > > > >
> > > > > > > Anoop: Why use the network calls from server side here then?
> > > > > > > Anil: I thought this is a cleaner approach since i am using BulkLoader. I
> > > > > > > decided not to run two jobs since i am generating a UniqueIdentifier at
> > > > > > > runtime in bulkloader.
> > > > > > >
> > > > > > > Anoop: can not handle it from client alone?
> > > > > > > Anil: I cannot handle it from client since i am using BulkLoader. Is it a
> > > > > > > good idea to create Htable instance on "B" and do put in my mapper? I might
> > > > > > > try this idea.
> > > > > > >
> > > > > > > Anoop: You can have a look at Lily project.
> > > > > > > Anil: It's little late for us to evaluate Lily now and at present we dont
> > > > > > > need complex secondary index since our data is immutable.
> > > > > > >
> > > > > > > Ram: what is rowkey B here?
> > > > > > > Anil: Suppose i am storing customer events in table A. I have two
> > > > > > > requirement for data query:
> > > > > > > 1. Query customer events on basis of customer_Id and event_ID.
> > > > > > > 2. Query customer events on basis of event_timestamp and customer_ID.
> > > > > > >
> > > > > > > 70% of querying is done by query#1, so i will create
> > > > > > > <customer_Id><event_ID> as row key of Table A.
> > > > > > > Now, in order to support fast results for query#2, i need to create a
> > > > > > > secondary index on A. I store that secondary index in B, rowkey of B is
> > > > > > > <event_timestamp><customer_ID> .Every row stores the corresponding rowkey
> > > > > > > of A.
> > > > > > >
> > > > > > > Ram:How is the startRow determined for every query?
> > > > > > > Anil: Its determined by a very simple application logic.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Anil Gupta
> > > > > > >
> > > > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
> > > > > > >
> > > > > > > > Just out of curiosity,
> > > > > > > > > The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>
> > > > > > > > what is rowkey B here?
> > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > > > > How is the startRow determined for every query ?
> > > > > > > >
> > > > > > > > Regards
> > > > > > > > Ram
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Anoop Sam John [mailto:anoo...@huawei.com]
> > > > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
> > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > Subject: RE: Best technique for doing lookup with Secondary Index
> > > > > > > > >
> > > > > > > > > >I build the secondary table "B" using a prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > Anil,
> > > > > > > > > In prePut hook u call HTable#put()? Why use the network calls
> > > > > > > > > from server side here then? can not handle it from client alone? You
> > > > > > > > > can have a look at Lily project. Thoughts after seeing ur idea on put
> > > > > > > > > and scan..
> > > > > > > > >
> > > > > > > > > -Anoop-
> > > > > > > > > ________________________________________
> > > > > > > > > From: anil gupta [anilgupt...@gmail.com]
> > > > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
> > > > > > > > > To: user@hbase.apache.org
> > > > > > > > > Subject: Best technique for doing lookup with Secondary Index
> > > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I am using HBase 0.92.1. I have created a secondary index on table "A".
> > > > > > > > > Table A stores immutable data. I build the secondary table "B" using a
> > > > > > > > > prePut RegionObserver.
> > > > > > > > >
> > > > > > > > > The secondary index is stored in table "B" as rowkey B --> family:<rowkey A> .
> > > > > > > > > "<rowkey A>" is the column qualifier. Every row in B will only on
> > > > > > > > > have one column and the name of that column is the rowkey of A. So the
> > > > > > > > > value is blank.
> > > > > > > > > As per my understanding, accessing column qualifier is
> > > > > > > > > faster than accessing value. Please correct me if i am wrong.
> > > > > > > > >
> > > > > > > > > HBase Querying approach:
> > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
> > > > > > > > > 2. Do a batch get on primary table by using HTable.get(List<Get>)
> > > > > > > > > method.
> > > > > > > > >
> > > > > > > > > The above approach for retrieval works fine but i was wondering it there is
> > > > > > > > > a better approach. I was planning to try out doing the retrieval using
> > > > > > > > > coprocessors.
> > > > > > > > > Have anyone tried using coprocessors? I would appreciate if others can
> > > > > > > > > share their experience with secondary index for HBase queries.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Thanks & Regards,
> > > > > > > > > Anil Gupta
> > > > > > >
> > > > > > > --
> > > > > > > Thanks & Regards,
> > > > > > > Anil Gupta
> > > > >
> > > > > --
> > > > > Thanks & Regards,
> > > > > Anil Gupta
> > >
> > > --
> > >
> > > Best Regards!
> > >
> > > Fei Ding
> > > fding.chu...@gmail.com

--
Thanks & Regards,
Anil Gupta
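
For readers following the thread, the row key layout and the two-step lookup from the original post (prefix scan on "B", then a batch get on "A") can be sketched roughly as below. This is only an in-memory illustration, not HBase code: the class `SecondaryIndexSketch`, its methods, the "|" separator between key parts, and the sample values are all assumptions made for the example, and two `TreeMap`s stand in for the sorted tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for tables "A" and "B"; hypothetical names, not HBase API. */
public class SecondaryIndexSketch {
    // Table A: rowkey <customer_id><event_id> -> event payload.
    private final TreeMap<String, String> tableA = new TreeMap<>();
    // Table B: rowkey <event_timestamp><customer_id> -> rowkey of A.
    // (In the thread the rowkey of A is the single column qualifier and the value is blank.)
    private final TreeMap<String, String> tableB = new TreeMap<>();

    // The "|" separator is an assumption; the thread does not say how parts are joined.
    static String rowKeyA(String customerId, String eventId) {
        return customerId + "|" + eventId;
    }

    static String rowKeyB(String eventTimestamp, String customerId) {
        return eventTimestamp + "|" + customerId;
    }

    /** Writes the event to A and its index entry to B (the role of the prePut hook or mapper). */
    public void put(String customerId, String eventId, String eventTimestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(eventTimestamp, customerId), keyA);
    }

    /** Step 1: prefix scan on B from a start row. Step 2: batch get on A. */
    public List<String> lookupByTimestampPrefix(String prefix) {
        List<String> keysOfA = new ArrayList<>();
        // tailMap(prefix, true) plays the role of startRow; the startsWith check
        // plays the role of the prefix filter and stops the scan early.
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            keysOfA.add(e.getValue());
        }
        List<String> results = new ArrayList<>();
        for (String keyA : keysOfA) {
            String payload = tableA.get(keyA);
            if (payload != null) {
                results.add(payload);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch idx = new SecondaryIndexSketch();
        idx.put("cust1", "evt1", "20121024T1000", "login");
        idx.put("cust2", "evt7", "20121024T1130", "purchase");
        idx.put("cust1", "evt2", "20121025T0900", "logout");
        System.out.println(idx.lookupByTimestampPrefix("20121024")); // prints [login, purchase]
    }
}
```

In the real setup each of the two steps is at least one RPC from the client, which is the cost the coprocessor and collocation ideas in this thread are trying to reduce.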