Hey folks, for the record there are samples of using importtsv for preparing HFiles in here...
http://hbase.apache.org/book.html#importtsv

On 10/26/12 12:44 AM, "anil gupta" <anilgupt...@gmail.com> wrote:

>Hi Anoop,
>
>Yes, i use bulk loading for loading table A. I wrote my own mapper as
>Importtsv won't suffice for my requirements. :) No, i don't call HTable#put()
>from my mapper. I was thinking about trying out calling HTable#put() from
>my mapper and seeing the outcome.
>
>I meant to say that when we use an MR job (ex. importtsv) then the WAL is not
>used. Sorry if i misunderstood someone.
>
>Thanks,
>Anil
>
>On Thu, Oct 25, 2012 at 9:06 PM, Anoop Sam John <anoo...@huawei.com> wrote:
>
>> Hi Anil,
>> Some confusion after seeing your reply.
>> You use bulk loading? You created your own mapper? You call HTable#put()
>> from mappers?
>>
>> I think there is confusion in another thread also.. I was referring to the
>> HFileOutputReducer.. There is a TableOutputFormat also... In
>> TableOutputFormat it will try to put to the HTable... Here write to WAL is
>> applicable...
>>
>> [HFileOutputReducer]: As we discussed in another thread, in the case of bulk
>> loading the approach is that the MR job creates KVs and writes them to files,
>> and each file is written as an HFile. Yes, this will contain all meta
>> information, trailer etc... Finally only the HBase cluster needs to be
>> contacted, just to load these HFile(s) into the HBase cluster, under the
>> corresponding regions. This will be the fastest way for bulk loading of
>> huge data...
>>
>> -Anoop-
>> ________________________________________
>> From: anil gupta [anilgupt...@gmail.com]
>> Sent: Friday, October 26, 2012 3:40 AM
>> To: user@hbase.apache.org
>> Subject: Re: Best technique for doing lookup with Secondary Index
>>
>> Anoop: In prePut hook u call HTable#put()?
>> Anil: Yes, i call HTable#put() in prePut. Is there a better way of doing it?
>>
>> Anoop: Why use the network calls from server side here then?
>> Anil: I thought this is a cleaner approach since i am using BulkLoader.
>> I decided not to run two jobs since i am generating a UniqueIdentifier at
>> runtime in the bulkloader.
>>
>> Anoop: can not handle it from client alone?
>> Anil: I cannot handle it from the client since i am using BulkLoader. Is it
>> a good idea to create an HTable instance of "B" and do the put in my mapper?
>> I might try this idea.
>>
>> Anoop: You can have a look at Lily project.
>> Anil: It's a little late for us to evaluate Lily now, and at present we
>> don't need a complex secondary index since our data is immutable.
>>
>> Ram: what is rowkey B here?
>> Anil: Suppose i am storing customer events in table A. I have two
>> requirements for data query:
>> 1. Query customer events on the basis of customer_Id and event_ID.
>> 2. Query customer events on the basis of event_timestamp and customer_ID.
>>
>> 70% of querying is done by query#1, so i will create
>> <customer_Id><event_ID> as the row key of table A.
>> Now, in order to support fast results for query#2, i need to create a
>> secondary index on A. I store that secondary index in B; the rowkey of B is
>> <event_timestamp><customer_ID>. Every row stores the corresponding rowkey
>> of A.
>>
>> Ram: How is the startRow determined for every query?
>> Anil: It's determined by very simple application logic.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
>> ramkrishna.vasude...@huawei.com> wrote:
>>
>> > Just out of curiosity,
>> > > The secondary index is stored in table "B" as rowkey B -->
>> > > family:<rowkey A>
>> > what is rowkey B here?
>> > > 1. Scan the secondary table by using prefix filter and startRow.
>> > How is the startRow determined for every query?
>> >
>> > Regards
>> > Ram
>> >
>> > > -----Original Message-----
>> > > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> > > Sent: Thursday, October 25, 2012 10:15 AM
>> > > To: user@hbase.apache.org
>> > > Subject: RE: Best technique for doing lookup with Secondary Index
>> > >
>> > > >I build the secondary table "B" using a prePut RegionObserver.
>> > >
>> > > Anil,
>> > > In prePut hook u call HTable#put()? Why use the network calls
>> > > from server side here then? Can it not be handled from the client alone?
>> > > You can have a look at Lily project. Thoughts after seeing ur idea on put
>> > > and scan..
>> > >
>> > > -Anoop-
>> > > ________________________________________
>> > > From: anil gupta [anilgupt...@gmail.com]
>> > > Sent: Thursday, October 25, 2012 3:10 AM
>> > > To: user@hbase.apache.org
>> > > Subject: Best technique for doing lookup with Secondary Index
>> > >
>> > > Hi All,
>> > >
>> > > I am using HBase 0.92.1. I have created a secondary index on table "A".
>> > > Table A stores immutable data. I build the secondary table "B" using a
>> > > prePut RegionObserver.
>> > >
>> > > The secondary index is stored in table "B" as rowkey B -->
>> > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in B
>> > > will only have one column, and the name of that column is the rowkey of A.
>> > > So the value is blank. As per my understanding, accessing the column
>> > > qualifier is faster than accessing the value. Please correct me if i am
>> > > wrong.
>> > >
>> > > HBase querying approach:
>> > > 1. Scan the secondary table by using a prefix filter and startRow.
>> > > 2. Do a batch get on the primary table by using the HTable.get(List<Get>)
>> > > method.
>> > >
>> > > The above approach for retrieval works fine, but i was wondering if
>> > > there is a better approach. I was planning to try out doing the retrieval
>> > > using coprocessors.
>> > > Has anyone tried using coprocessors?
>> > > I would appreciate it if others can
>> > > share their experience with secondary indexes for HBase queries.
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Anil Gupta
>> >
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>
>--
>Thanks & Regards,
>Anil Gupta
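[Editor's note] For readers finding this thread later: the pattern Anil describes (composite <event_timestamp><customer_ID> rowkeys in index table B that point at rowkeys of A, then a prefix scan on B followed by a batch get on A) can be sketched without a cluster. The sketch below is hypothetical; it uses plain TreeMaps as stand-ins for the two tables just to illustrate the key layout and the two-step lookup, not the real HBase client API, and all key/value literals are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two-step secondary-index lookup from this
// thread. TreeMaps stand in for HBase tables A and B (both keep keys in
// sorted order, which is what the prefix scan relies on).
public class SecondaryIndexSketch {

    // Index rowkey for table B: fixed-width, zero-padded epoch millis so
    // keys sort chronologically, followed by the customer id.
    static String indexKey(long eventTimestampMillis, String customerId) {
        return String.format("%013d%s", eventTimestampMillis, customerId);
    }

    public static void main(String[] args) {
        // Table A: <customer_Id><event_ID> -> event payload (made-up data)
        TreeMap<String, String> tableA = new TreeMap<>();
        tableA.put("CUST42EV001", "login");
        tableA.put("CUST42EV002", "purchase");
        tableA.put("CUST99EV003", "logout");

        // Table B: <event_timestamp><customer_ID> -> rowkey of A
        // (in the real schema the A-rowkey is the column qualifier and the
        // value is blank; a plain value is used here for brevity)
        TreeMap<String, String> tableB = new TreeMap<>();
        tableB.put(indexKey(1351209600000L, "CUST42"), "CUST42EV001");
        tableB.put(indexKey(1351209660000L, "CUST42"), "CUST42EV002");
        tableB.put(indexKey(1351296000000L, "CUST99"), "CUST99EV003");

        // Step 1: prefix scan on B -- the timestamp prefix plays the role
        // of startRow, and we stop once a key no longer matches the prefix.
        String prefix = "13512096"; // one made-up time bucket
        List<String> aRowKeys = new ArrayList<>();
        for (Map.Entry<String, String> e : tableB.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            aRowKeys.add(e.getValue());
        }

        // Step 2: batch get on A (HTable.get(List<Get>) in the real client)
        for (String rowKey : aRowKeys) {
            System.out.println(rowKey + " -> " + tableA.get(rowKey));
        }
    }
}
```

Against a real 0.92 cluster, step 1 would be a Scan with a PrefixFilter (or an explicit stop row) on B and step 2 a single HTable.get(List<Get>) round of gets on A, exactly as described in the thread.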