Hi Danis,

I downloaded the zip file and copied the source code into my HBase 0.92.1 project. It compiled successfully, and I am going through the source code right now. Could you provide an architecture diagram for your implementation, or add comments in the code? That would make it easier for users to understand your implementation quickly.

Thanks,
Anil Gupta

On Fri, Oct 26, 2012 at 8:14 AM, anil gupta <anilgupt...@gmail.com> wrote:

> @fding hbase: thanks for the link. I'll look into it.
>
> Interesting to know that within a region server we dont need a RPC call. If we can collocate two regions(or more) then that is the best solution. I am not sure how hard it'll be to write a custom load balancer(sounds a bit difficult to me). Does anyone knows the classes related to a load balancer?
>
> Thanks,
> Anil
>
> On Fri, Oct 26, 2012 at 7:33 AM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
>
>> Yes we can do this, but for it to happen you may have to have your custom load balancer which will help you in getting the collocation.
>>
>> Regards
>> Ram
>>
>> > -----Original Message-----
>> > From: Jerry Lam [mailto:chiling...@gmail.com]
>> > Sent: Friday, October 26, 2012 7:59 PM
>> > To: user@hbase.apache.org
>> > Subject: Re: Best technique for doing lookup with Secondary Index
>> >
>> > Can we enforce 2 regions to collocate together as a logical group?
>> >
>> > On Fri, Oct 26, 2012 at 6:14 AM, fding hbase <fding.hb...@gmail.com> wrote:
>> >
>> > > https://github.com/danix800/hbase-indexed
>> > >
>> > > On Fri, Oct 26, 2012 at 4:13 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
>> > >
>> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS since these two regions are from different table. Am i right?
>> > > >
>> > > > No... suppose your Region A and Region B of different tables are collocated on same RS then from the coprocessor environment variable you can get access to the RS.
>> > > > From RS you can get the online regions and from that region object you can call puts or gets. This will not involve any RPC with in that RS because we only deal with Region objects.
>> > > >
>> > > > Regards
>> > > > Ram
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
>> > > > > Sent: Friday, October 26, 2012 12:17 PM
>> > > > > To: user@hbase.apache.org
>> > > > > Subject: Re: Best technique for doing lookup with Secondary Index
>> > > > >
>> > > > > > Now your main question is lookups right
>> > > > > > Now there are some more hooks in the scan flow called pre/postScannerOpen, pre/postScannerNext.
>> > > > > > May be you can try using them to do a look up on the secondary table and then use those values and pass it to the main table next().
>> > > > >
>> > > > > In secondary index its hard to avoid at-least two RPC calls(1 from client to table B and then from table B to Table A) whether you use coproc or not. But, i believe using coproc is better than doing RPC calls from client since it might be outside the subnet/network of cluster. In this case, the RPC will be faster when we use coprocs. In my case the client is certainly not in the same subnet or network zone. I need to provide results of query in around 100 milliseconds or less so i need to be really frugal. Let me know your views on this.
>> > > > >
>> > > > > Have you implemented queries with Secondary indexes using coproc yet? At present i have tried the client side query and i can get the results of query in around 100 ms. I am enticed to try out the coproc implementation.
>> > > > >
>> > > > > > But this may involve more RPC calls as your regions of "A" and "B" may be in different RS.
>> > > > >
>> > > > > AFAIK, RPC cannot be avoided even if Region A and Region B are on same RS since these two regions are from different table. Am i right?
>> > > > >
>> > > > > Thanks,
>> > > > > Anil Gupta
>> > > > >
>> > > > > On Thu, Oct 25, 2012 at 9:20 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
>> > > > >
>> > > > > > > Is it a good idea to create Htable instance on "B" and do put in my mapper? I might try this idea.
>> > > > > > Yes you can do this.. May be the same mapper you can do a put for table "B". This was how we have tried loading data to another table by using the main table "A" Puts.
>> > > > > >
>> > > > > > Now your main question is lookups right
>> > > > > > Now there are some more hooks in the scan flow called pre/postScannerOpen, pre/postScannerNext.
>> > > > > > May be you can try using them to do a look up on the secondary table and then use those values and pass it to the main table next().
>> > > > > > But this may involve more RPC calls as your regions of "A" and "B" may be in different RS.
>> > > > > >
>> > > > > > If something is wrong in my understanding of what you said, kindly spare me. :)
>> > > > > >
>> > > > > > Regards
>> > > > > > Ram
>> > > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: anil gupta [mailto:anilgupt...@gmail.com]
>> > > > > > > Sent: Friday, October 26, 2012 3:40 AM
>> > > > > > > To: user@hbase.apache.org
>> > > > > > > Subject: Re: Best technique for doing lookup with Secondary Index
>> > > > > > >
>> > > > > > > Anoop: In prePut hook u call HTable#put()?
>> > > > > > > Anil: Yes i call HTable#put() in prePut.
>> > > > > > > Is there better way of doing it?
>> > > > > > >
>> > > > > > > Anoop: Why use the network calls from server side here then?
>> > > > > > > Anil: I thought this is a cleaner approach since i am using BulkLoader. I decided not to run two jobs since i am generating a UniqueIdentifier at runtime in bulkloader.
>> > > > > > >
>> > > > > > > Anoop: can not handle it from client alone?
>> > > > > > > Anil: I cannot handle it from client since i am using BulkLoader. Is it a good idea to create Htable instance on "B" and do put in my mapper? I might try this idea.
>> > > > > > >
>> > > > > > > Anoop: You can have a look at Lily project.
>> > > > > > > Anil: It's little late for us to evaluate Lily now and at present we dont need complex secondary index since our data is immutable.
>> > > > > > >
>> > > > > > > Ram: what is rowkey B here?
>> > > > > > > Anil: Suppose i am storing customer events in table A. I have two requirement for data query:
>> > > > > > > 1. Query customer events on basis of customer_Id and event_ID.
>> > > > > > > 2. Query customer events on basis of event_timestamp and customer_ID.
>> > > > > > >
>> > > > > > > 70% of querying is done by query#1, so i will create <customer_Id><event_ID> as row key of Table A.
>> > > > > > > Now, in order to support fast results for query#2, i need to create a secondary index on A. I store that secondary index in B, rowkey of B is <event_timestamp><customer_ID> .Every row stores the corresponding rowkey of A.
>> > > > > > >
>> > > > > > > Ram:How is the startRow determined for every query?
>> > > > > > > Anil: Its determined by a very simple application logic.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Anil Gupta
>> > > > > > >
>> > > > > > > On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <ramkrishna.vasude...@huawei.com> wrote:
>> > > > > > >
>> > > > > > > > Just out of curiosity,
>> > > > > > > > > The secondary index is stored in table "B" as rowkey B --> family:<rowkey A>
>> > > > > > > > what is rowkey B here?
>> > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
>> > > > > > > > How is the startRow determined for every query ?
>> > > > > > > >
>> > > > > > > > Regards
>> > > > > > > > Ram
>> > > > > > > >
>> > > > > > > > > -----Original Message-----
>> > > > > > > > > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> > > > > > > > > Sent: Thursday, October 25, 2012 10:15 AM
>> > > > > > > > > To: user@hbase.apache.org
>> > > > > > > > > Subject: RE: Best technique for doing lookup with Secondary Index
>> > > > > > > > >
>> > > > > > > > > >I build the secondary table "B" using a prePut RegionObserver.
>> > > > > > > > >
>> > > > > > > > > Anil,
>> > > > > > > > > In prePut hook u call HTable#put()? Why use the network calls from server side here then? can not handle it from client alone? You can have a look at Lily project. Thoughts after seeing ur idea on put and scan..
>> > > > > > > > >
>> > > > > > > > > -Anoop-
>> > > > > > > > > ________________________________________
>> > > > > > > > > From: anil gupta [anilgupt...@gmail.com]
>> > > > > > > > > Sent: Thursday, October 25, 2012 3:10 AM
>> > > > > > > > > To: user@hbase.apache.org
>> > > > > > > > > Subject: Best technique for doing lookup with Secondary Index
>> > > > > > > > >
>> > > > > > > > > Hi All,
>> > > > > > > > >
>> > > > > > > > > I am using HBase 0.92.1. I have created a secondary index on table "A". Table A stores immutable data. I build the secondary table "B" using a prePut RegionObserver.
>> > > > > > > > >
>> > > > > > > > > The secondary index is stored in table "B" as rowkey B --> family:<rowkey A> . "<rowkey A>" is the column qualifier. Every row in B will only on have one column and the name of that column is the rowkey of A. So the value is blank. As per my understanding, accessing column qualifier is faster than accessing value. Please correct me if i am wrong.
>> > > > > > > > >
>> > > > > > > > > HBase Querying approach:
>> > > > > > > > > 1. Scan the secondary table by using prefix filter and startRow.
>> > > > > > > > > 2. Do a batch get on primary table by using HTable.get(List<Get>) method.
>> > > > > > > > >
>> > > > > > > > > The above approach for retrieval works fine but i was wondering it there is a better approach. I was planning to try out doing the retrieval using coprocessors.
>> > > > > > > > > Have anyone tried using coprocessors?
>> > > > > > > > > I would appreciate if others can share their experience with secondary index for HBase queries.
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Thanks & Regards,
>> > > > > > > > > Anil Gupta
>> > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards,
>> > > > > > > Anil Gupta
>> > > > >
>> > > > > --
>> > > > > Thanks & Regards,
>> > > > > Anil Gupta
>> > >
>> > > --
>> > > Best Regards!
>> > >
>> > > Fei Ding
>> > > fding.chu...@gmail.com
>
> --
> Thanks & Regards,
> Anil Gupta

--
Thanks & Regards,
Anil Gupta
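
P.S. For anyone skimming the thread, the two-step lookup described in the original post (prefix scan on index table "B", then a batch get on table "A") can be sketched as a small in-memory model. This is only an illustration under stated assumptions: it uses java.util.TreeMap as a stand-in for HBase's sorted row keys and does not call the HBase client API. The class name, the helper methods, and the sample key values (such as "C001" and "20121025") are hypothetical, and the index row maps straight to the rowkey of A rather than storing it as a column qualifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the two tables from the thread. NOT the HBase API:
// a TreeMap merely mimics HBase's lexicographically sorted row keys.
class SecondaryIndexSketch {

    // Table A rowkey: <customer_id><event_id> (serves query #1).
    static String rowKeyA(String customerId, String eventId) {
        return customerId + eventId;
    }

    // Table B rowkey: <event_timestamp><customer_id> (serves query #2).
    static String rowKeyB(String timestamp, String customerId) {
        return timestamp + customerId;
    }

    // Write one event to A and its index entry to B, as the prePut hook
    // (or the bulk-load mapper) in the thread would.
    static void put(TreeMap<String, String> tableA, TreeMap<String, String> tableB,
                    String customerId, String eventId, String timestamp, String payload) {
        String keyA = rowKeyA(customerId, eventId);
        tableA.put(keyA, payload);
        tableB.put(rowKeyB(timestamp, customerId), keyA);
    }

    // Step 1: start at the prefix (the "startRow") and stop at the first
    // key that no longer matches, collecting the rowkeys of A.
    static List<String> prefixScan(TreeMap<String, String> tableB, String prefix) {
        List<String> keysA = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            keysA.add(e.getValue());
        }
        return keysA;
    }

    // Step 2: batch get on the primary table, in the order of the index scan.
    static List<String> batchGet(TreeMap<String, String> tableA, List<String> keysA) {
        List<String> rows = new ArrayList<>();
        for (String key : keysA) {
            String value = tableA.get(key);
            if (value != null) {
                rows.add(value);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> a = new TreeMap<>();
        TreeMap<String, String> b = new TreeMap<>();
        put(a, b, "C001", "E0001", "20121024", "login");
        put(a, b, "C002", "E0001", "20121025", "purchase");
        put(a, b, "C001", "E0002", "20121025", "logout");

        // Query #2: all events with timestamp prefix 20121025.
        List<String> keys = prefixScan(b, "20121025");
        System.out.println(batchGet(a, keys)); // prints [logout, purchase]
    }
}
```

In a real deployment step 1 would be a Scan with a start row and prefix filter on B, and step 2 the HTable.get(List<Get>) call mentioned above; the sketch only shows why the key layout of B makes the prefix scan a contiguous range.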