Hey folks, for the record there are samples of using importtsv for
preparing HFiles here...

http://hbase.apache.org/book.html#importtsv






On 10/26/12 12:44 AM, "anil gupta" <anilgupt...@gmail.com> wrote:

>Hi Anoop,
>
>Yes, I use bulk loading for loading table A. I wrote my own mapper since
>ImportTsv won't suffice for my requirements. :) No, I don't call HTable#put()
>from my mapper. I was thinking about trying out calling HTable#put() from
>my mapper to see the outcome.
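>
>Roughly what I had in mind, as an untested sketch (the column layout and
>table name are made up):
>
>import java.io.IOException;
>import org.apache.hadoop.conf.Configuration;
>import org.apache.hadoop.hbase.HBaseConfiguration;
>import org.apache.hadoop.hbase.client.HTable;
>import org.apache.hadoop.hbase.client.Put;
>import org.apache.hadoop.hbase.util.Bytes;
>import org.apache.hadoop.io.LongWritable;
>import org.apache.hadoop.io.NullWritable;
>import org.apache.hadoop.io.Text;
>import org.apache.hadoop.mapreduce.Mapper;
>
>public class EventPutMapper
>    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {
>
>  private HTable table;
>
>  @Override
>  protected void setup(Context context) throws IOException {
>    Configuration conf = HBaseConfiguration.create(context.getConfiguration());
>    table = new HTable(conf, "A");
>    table.setAutoFlush(false);   // batch the puts on the client side
>  }
>
>  @Override
>  protected void map(LongWritable key, Text line, Context context)
>      throws IOException {
>    String[] fields = line.toString().split("\t");
>    // made-up layout: customerId, eventId, payload
>    Put put = new Put(Bytes.toBytes(fields[0] + fields[1]));
>    put.add(Bytes.toBytes("f"), Bytes.toBytes("payload"),
>        Bytes.toBytes(fields[2]));
>    table.put(put);              // a normal put, so it goes through the WAL
>  }
>
>  @Override
>  protected void cleanup(Context context) throws IOException {
>    table.flushCommits();
>    table.close();
>  }
>}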
>
> I meant to say that when we use an MR job (e.g. ImportTsv) the WAL is not
>used. Sorry if I misunderstood someone.
>
>Thanks,
>Anil
>
>On Thu, Oct 25, 2012 at 9:06 PM, Anoop Sam John <anoo...@huawei.com>
>wrote:
>
>> Hi Anil,
>>               I'm a bit confused after seeing your reply.
>> You use bulk loading?  You created your own mapper?  You call
>> HTable#put() from mappers?
>>
>> I think there was some confusion in another thread also. I was referring
>> to the HFileOutputReducer. There is a TableOutputFormat also; in
>> TableOutputFormat it will do puts to the HTable, so there the write to
>> the WAL is applicable.
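>>
>> Just to illustrate that path, a rough sketch (the mapper is assumed to
>> emit <ImmutableBytesWritable, Put> pairs and is not shown; table name "A"
>> is only an example):
>>
>> import java.io.IOException;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.client.Put;
>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>> import org.apache.hadoop.hbase.mapreduce.IdentityTableReducer;
>> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>
>> public class PutBasedJob {
>>   // The identity reducer sends the mapper's Puts to table "A" as normal
>>   // puts, so each of them goes through the WAL like any client write.
>>   public static Job create(Class<? extends Mapper> mapperClass, String input)
>>       throws IOException {
>>     Job job = new Job(HBaseConfiguration.create(), "load-via-puts");
>>     job.setJarByClass(PutBasedJob.class);
>>     job.setMapperClass(mapperClass);
>>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>     job.setMapOutputValueClass(Put.class);
>>     FileInputFormat.addInputPath(job, new Path(input));
>>     TableMapReduceUtil.initTableReducerJob("A", IdentityTableReducer.class,
>>         job);
>>     return job;
>>   }
>> }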
>>
>>
>> [HFileOutputReducer]: As we discussed in another thread, in the case of
>> bulk loading the approach is that the MR job creates KVs, writes them to
>> files, and each file is written as an HFile. Yes, this will contain all
>> the meta information, trailer etc. Finally, the HBase cluster only needs
>> to be contacted to load these HFile(s) into the cluster, under the
>> corresponding regions. This will be the fastest way to bulk load huge
>> amounts of data.
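>>
>> A rough end-to-end sketch of that flow (table name, paths and the column
>> layout are just placeholders):
>>
>> import java.io.IOException;
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.Path;
>> import org.apache.hadoop.hbase.HBaseConfiguration;
>> import org.apache.hadoop.hbase.client.HTable;
>> import org.apache.hadoop.hbase.client.Put;
>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
>> import org.apache.hadoop.hbase.util.Bytes;
>> import org.apache.hadoop.io.LongWritable;
>> import org.apache.hadoop.io.Text;
>> import org.apache.hadoop.mapreduce.Job;
>> import org.apache.hadoop.mapreduce.Mapper;
>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>> public class BulkLoadDriver {
>>
>>   // The mapper only turns input lines into Puts; sorting them into
>>   // per-region HFiles is wired up by configureIncrementalLoad() below.
>>   public static class TsvToPutMapper
>>       extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
>>     @Override
>>     protected void map(LongWritable key, Text line, Context ctx)
>>         throws IOException, InterruptedException {
>>       String[] f = line.toString().split("\t");  // made-up layout: rowkey, value
>>       Put put = new Put(Bytes.toBytes(f[0]));
>>       put.add(Bytes.toBytes("f"), Bytes.toBytes("v"), Bytes.toBytes(f[1]));
>>       ctx.write(new ImmutableBytesWritable(put.getRow()), put);
>>     }
>>   }
>>
>>   public static void main(String[] args) throws Exception {
>>     Configuration conf = HBaseConfiguration.create();
>>     Job job = new Job(conf, "bulk-load-A");
>>     job.setJarByClass(BulkLoadDriver.class);
>>     job.setMapperClass(TsvToPutMapper.class);
>>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>     job.setMapOutputValueClass(Put.class);
>>     FileInputFormat.addInputPath(job, new Path(args[0]));
>>     Path hfileDir = new Path(args[1]);
>>     FileOutputFormat.setOutputPath(job, hfileDir);
>>
>>     HTable table = new HTable(conf, "A");
>>     // Sets the partitioner, sort reducer and HFileOutputFormat so the job
>>     // writes one set of HFiles per existing region of table A.
>>     HFileOutputFormat.configureIncrementalLoad(job, table);
>>
>>     if (job.waitForCompletion(true)) {
>>       // Only now is the cluster contacted: the finished HFiles are handed
>>       // to the region servers hosting the corresponding regions.
>>       new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, table);
>>     }
>>   }
>> }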
>>
>>
>> -Anoop-
>> ________________________________________
>> From: anil gupta [anilgupt...@gmail.com]
>> Sent: Friday, October 26, 2012 3:40 AM
>> To: user@hbase.apache.org
>> Subject: Re: Best technique for doing lookup with Secondary Index
>>
>> Anoop: In the prePut hook you call HTable#put()?
>> Anil: Yes, I call HTable#put() in prePut. Is there a better way of doing
>> it?
>>
>> Anoop: Why use the network calls from the server side here then?
>> Anil: I thought this was a cleaner approach since I am using the
>> BulkLoader. I decided not to run two jobs since I am generating a
>> UniqueIdentifier at runtime in the bulk loader.
>>
>> Anoop: Can you not handle it from the client alone?
>> Anil: I cannot handle it from the client since I am using the BulkLoader.
>> Is it a good idea to create an HTable instance on "B" and do the put in
>> my mapper? I might try this idea.
>>
>> Anoop: You can have a look at the Lily project.
>> Anil: It's a little late for us to evaluate Lily now, and at present we
>> don't need a complex secondary index since our data is immutable.
>>
>> Ram: what is rowkey B here?
>> Anil: Suppose I am storing customer events in table A. I have two
>> requirements for querying the data:
>> 1. Query customer events on the basis of customer_Id and event_ID.
>> 2. Query customer events on the basis of event_timestamp and customer_ID.
>>
>> 70% of the querying is done by query #1, so I will create
>> <customer_Id><event_ID> as the row key of table A.
>> Now, in order to support fast results for query #2, I need to create a
>> secondary index on A. I store that secondary index in B; the rowkey of B
>> is <event_timestamp><customer_ID>. Every row stores the corresponding
>> rowkey of A.
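>>
>> In code, the two keys would be composed roughly like this (the field
>> types and widths are only illustrative; fixed-width encodings keep the
>> keys sortable):
>>
>> import org.apache.hadoop.hbase.util.Bytes;
>>
>> public class IndexKeys {
>>   // Row key of the main table A: <customer_Id><event_ID>, serving query #1.
>>   static byte[] rowA(long customerId, long eventId) {
>>     return Bytes.add(Bytes.toBytes(customerId), Bytes.toBytes(eventId));
>>   }
>>
>>   // Row key of the index table B: <event_timestamp><customer_ID>, serving
>>   // query #2. The single cell in that row carries rowA as its qualifier.
>>   static byte[] rowB(long eventTimestamp, long customerId) {
>>     return Bytes.add(Bytes.toBytes(eventTimestamp),
>>         Bytes.toBytes(customerId));
>>   }
>> }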
>>
>> Ram: How is the startRow determined for every query?
>> Anil: It's determined by very simple application logic.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Wed, Oct 24, 2012 at 10:16 PM, Ramkrishna.S.Vasudevan <
>> ramkrishna.vasude...@huawei.com> wrote:
>>
>> > Just out of curiosity,
>> > > The secondary index is stored in table "B" as rowkey B -->
>> > > family:<rowkey
>> > > A>
>> > what is rowkey B here?
>> > > 1. Scan the secondary table by using prefix filter and startRow.
>> > How is the startRow determined for every query?
>> >
>> > Regards
>> > Ram
>> >
>> > > -----Original Message-----
>> > > From: Anoop Sam John [mailto:anoo...@huawei.com]
>> > > Sent: Thursday, October 25, 2012 10:15 AM
>> > > To: user@hbase.apache.org
>> > > Subject: RE: Best technique for doing lookup with Secondary Index
>> > >
>> > > >I build the secondary table "B" using a prePut RegionObserver.
>> > >
>> > > Anil,
>> > >        In the prePut hook you call HTable#put()?  Why use the network
>> > > calls from the server side here then? Can you not handle it from the
>> > > client alone? You can have a look at the Lily project.  Thoughts after
>> > > seeing your idea on put and scan..
>> > >
>> > > -Anoop-
>> > > ________________________________________
>> > > From: anil gupta [anilgupt...@gmail.com]
>> > > Sent: Thursday, October 25, 2012 3:10 AM
>> > > To: user@hbase.apache.org
>> > > Subject: Best technique for doing lookup with Secondary Index
>> > >
>> > > Hi All,
>> > >
>> > > I am using HBase 0.92.1. I have created a secondary index on table
>>"A".
>> > > Table A stores immutable data. I build the secondary table "B"
>>using a
>> > > prePut RegionObserver.
>> > >
>> > > The secondary index is stored in table "B" as rowkey B -->
>> > > family:<rowkey A>. "<rowkey A>" is the column qualifier. Every row in
>> > > B will only have one column, and the name of that column is the rowkey
>> > > of A, so the value is blank. As per my understanding, accessing the
>> > > column qualifier is faster than accessing the value. Please correct me
>> > > if I am wrong.
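>> > >
>> > > In simplified form the observer does something like the sketch below.
>> > > How the timestamp and customer id are pulled out of the Put is made up
>> > > here; the real extraction depends on our schema:
>> > >
>> > > import java.io.IOException;
>> > > import org.apache.hadoop.hbase.HConstants;
>> > > import org.apache.hadoop.hbase.client.HTableInterface;
>> > > import org.apache.hadoop.hbase.client.Put;
>> > > import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
>> > > import org.apache.hadoop.hbase.coprocessor.ObserverContext;
>> > > import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
>> > > import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
>> > > import org.apache.hadoop.hbase.util.Bytes;
>> > >
>> > > public class SecondaryIndexObserver extends BaseRegionObserver {
>> > >
>> > >   private static final byte[] INDEX_TABLE = Bytes.toBytes("B");
>> > >   private static final byte[] FAMILY = Bytes.toBytes("f");
>> > >
>> > >   @Override
>> > >   public void prePut(ObserverContext<RegionCoprocessorEnvironment> e,
>> > >       Put put, WALEdit edit, boolean writeToWAL) throws IOException {
>> > >     byte[] rowA = put.getRow();            // <customer_Id><event_ID>
>> > >     Put indexPut = new Put(buildIndexRow(put));
>> > >     // qualifier = row key of A, value stays empty
>> > >     indexPut.add(FAMILY, rowA, HConstants.EMPTY_BYTE_ARRAY);
>> > >
>> > >     HTableInterface indexTable = e.getEnvironment().getTable(INDEX_TABLE);
>> > >     try {
>> > >       indexTable.put(indexPut);            // extra hop to B's region server
>> > >     } finally {
>> > >       indexTable.close();
>> > >     }
>> > >   }
>> > >
>> > >   // Made-up extraction: assumes rowA starts with an 8-byte customer id
>> > >   // and the event timestamp is written as a long in f:ts on the same Put.
>> > >   private byte[] buildIndexRow(Put put) {
>> > >     byte[] customerId = Bytes.head(put.getRow(), 8);
>> > >     byte[] ts = put.get(FAMILY, Bytes.toBytes("ts")).get(0).getValue();
>> > >     return Bytes.add(ts, customerId);      // <event_timestamp><customer_ID>
>> > >   }
>> > > }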
>> > >
>> > >
>> > > HBase querying approach:
>> > > 1. Scan the secondary table using a prefix filter and a startRow.
>> > > 2. Do a batch get on the primary table using the HTable.get(List<Get>)
>> > > method (sketched below).
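>> > >
>> > > A trimmed sketch of those two steps (table handles passed in, error
>> > > handling left out):
>> > >
>> > > import java.io.IOException;
>> > > import java.util.ArrayList;
>> > > import java.util.List;
>> > > import org.apache.hadoop.hbase.KeyValue;
>> > > import org.apache.hadoop.hbase.client.Get;
>> > > import org.apache.hadoop.hbase.client.HTable;
>> > > import org.apache.hadoop.hbase.client.Result;
>> > > import org.apache.hadoop.hbase.client.ResultScanner;
>> > > import org.apache.hadoop.hbase.client.Scan;
>> > > import org.apache.hadoop.hbase.filter.PrefixFilter;
>> > >
>> > > public class SecondaryIndexLookup {
>> > >
>> > >   public static Result[] lookup(HTable indexTable, HTable mainTable,
>> > >       byte[] startRow, byte[] prefix) throws IOException {
>> > >     // Step 1: scan B from startRow, keeping only rows matching the prefix.
>> > >     Scan scan = new Scan(startRow);
>> > >     scan.setFilter(new PrefixFilter(prefix));
>> > >
>> > >     List<Get> gets = new ArrayList<Get>();
>> > >     ResultScanner scanner = indexTable.getScanner(scan);
>> > >     try {
>> > >       for (Result indexRow : scanner) {
>> > >         for (KeyValue kv : indexRow.raw()) {
>> > >           gets.add(new Get(kv.getQualifier()));  // qualifier = row key of A
>> > >         }
>> > >       }
>> > >     } finally {
>> > >       scanner.close();
>> > >     }
>> > >     // Step 2: batched gets against the primary table A.
>> > >     return mainTable.get(gets);
>> > >   }
>> > > }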
>> > >
>> > > The above approach for retrieval works fine, but I was wondering if
>> > > there is a better approach. I was planning to try doing the retrieval
>> > > using coprocessors.
>> > > Has anyone tried using coprocessors? I would appreciate it if others
>> > > could share their experience with secondary indexes for HBase queries.
>> > >
>> > > --
>> > > Thanks & Regards,
>> > > Anil Gupta
>> >
>> >
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>
>
>-- 
>Thanks & Regards,
>Anil Gupta

