Well it also goes back to the question of how the RO is writing to the second 
table. 

If the M/R job uses Mapper.setup() to instantiate the HTable for the index 
write and then writes to the index table in Mapper.map(), why would the 
co-processor take much more time?
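
To make that comparison concrete, here is a minimal sketch in plain Java of the 
two patterns a review would look for. FakeIndexTable, ReusingMapper and 
PerCallMapper are hypothetical stand-ins, not HBase classes; in the real job the 
handle would be an HTable and the lifecycle methods would be Mapper.setup(), 
map() and cleanup(). The sketch only counts how many handles get opened. It does 
not measure HBase latency, and we still don't know which pattern the OP's code 
uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a table handle; counts how many were opened. */
class FakeIndexTable implements AutoCloseable {
    static int opened = 0;                       // handles created so far
    final List<String> rows = new ArrayList<>(); // rows written via this handle

    FakeIndexTable() { opened++; }               // in real code, the costly step

    void put(String rowKey) { rows.add(rowKey); }

    @Override
    public void close() { /* nothing to release in this sketch */ }
}

/** Mirrors the Mapper lifecycle: setup() once, map() per record, cleanup() once. */
class ReusingMapper {
    private FakeIndexTable index;

    void setup() { index = new FakeIndexTable(); }
    void map(String docId) { index.put("idx-" + docId); }
    void cleanup() { index.close(); }
}

/** The pattern to watch for: a fresh handle on every call. */
class PerCallMapper {
    void map(String docId) {
        try (FakeIndexTable index = new FakeIndexTable()) {
            index.put("idx-" + docId);
        }
    }
}

class IndexWriteSketch {
    /** Pushes n records through the reusing mapper; returns handles opened. */
    static int handlesOpenedWithReuse(int n) {
        int before = FakeIndexTable.opened;
        ReusingMapper mapper = new ReusingMapper();
        mapper.setup();
        for (int i = 0; i < n; i++) {
            mapper.map("doc" + i);
        }
        mapper.cleanup();
        return FakeIndexTable.opened - before;
    }

    /** Pushes n records through the per-call mapper; returns handles opened. */
    static int handlesOpenedPerCall(int n) {
        int before = FakeIndexTable.opened;
        PerCallMapper mapper = new PerCallMapper();
        for (int i = 0; i < n; i++) {
            mapper.map("doc" + i);
        }
        return FakeIndexTable.opened - before;
    }

    public static void main(String[] args) {
        System.out.println("reuse:    " + handlesOpenedWithReuse(1000) + " handle(s)");
        System.out.println("per call: " + handlesOpenedPerCall(1000) + " handle(s)");
    }
}
```

The same question applies on the server side: a RegionObserver that opens its 
index table once in start() behaves like ReusingMapper, while one that opens it 
inside every postCheckAndPut() behaves like PerCallMapper.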

I think a code review would be in order.


On Feb 18, 2013, at 6:22 AM, yonghu <yongyong...@gmail.com> wrote:

> Ok. Now, I got your point. I didn't notice the "checkAndPut".
> 
> regards!
> 
> Yong
> 
> On Mon, Feb 18, 2013 at 1:11 PM, Michael Segel
> <michael_se...@hotmail.com> wrote:
>> 
>> The issue I was talking about was the use of a check and put.
>> The OP wrote:
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows
>>>>>> to an index table.
>> 
>> My question is why does the OP use a checkAndPut, and the RegionObserver's 
>> postCheckAndPut?
>> 
>> 
>> Here's a good example... 
>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>> 
>> The OP doesn't really get into the use case, so we don't know why the Check 
>> and Put is used in the M/R job.
>> He should just be using put() and then a postPut().
>> 
>> Another issue... since he's writing to a different HTable... how? Does he 
>> create an HTable instance in the start() method of his RO object and then 
>> reference it later? Or does he create the instance of the HTable on the fly 
>> in each postCheckAndPut() ?
>> Without seeing his code, we don't know.
>> 
>> Note that this is a synchronous set of writes. Your overall return from the 
>> M/R call to put will wait until the second row is inserted.
>> 
>> Interestingly enough, you may want to consider disabling the WAL on the 
>> write to the index.  You can always run a M/R job that rebuilds the index 
>> should something occur to the system where you might lose the data.  Indexes 
>> *ARE* expendable. ;-)
>> 
>> Does that explain it?
>> 
>> -Mike
>> 
>> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong...@gmail.com> wrote:
>> 
>>> Hi, Michael
>>> 
>>> I don't quite understand what you mean by "round trip back to the
>>> client". In my understanding, as the RegionServer and TaskTracker can
>>> be the same node, MR doesn't have to pull data into the client and then
>>> process it.  You also mention the "unnecessary overhead"; can you
>>> explain a little which operations or data processing can be seen as
>>> "unnecessary overhead"?
>>> 
>>> Thanks
>>> 
>>> yong
>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>> <michael_se...@hotmail.com> wrote:
>>>> Why?
>>>> 
>>>> This seems like an unnecessary overhead.
>>>> 
>>>> You are writing code within the coprocessor on the server.  Pessimistic 
>>>> code really isn't recommended if you are worried about performance.
>>>> 
>>>> I have to ask... by the time you have executed the code in your 
>>>> co-processor, what would cause the initial write to fail?
>>>> 
>>>> 
>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.ka...@gmail.com> wrote:
>>>> 
>>>>> It's a local read. I just check the last param of postCheckAndPut, 
>>>>> which indicates whether the Put succeeded. If the put succeeded, I insert 
>>>>> a row in another table.
>>>>> 
>>>>> Sincerely,
>>>>> Prakash Kadel
>>>>> 
>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <w...@us.ibm.com> wrote:
>>>>> 
>>>>>> Is your CheckAndPut involving a local or remote READ? Due to the nature
>>>>>> of LSM, read is much slower compared to a write...
>>>>>> 
>>>>>> 
>>>>>> Best Regards,
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From:   Prakash Kadel <prakash.ka...@gmail.com>
>>>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>>>> Date:   02/17/2013 07:49 PM
>>>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> hi,
>>>>>> i am trying to insert a few million documents to hbase with mapreduce. To
>>>>>> enable quick search of docs i want to have some indexes, so i tried to
>>>>>> use the coprocessors, but they are slowing down my inserts. Aren't
>>>>>> coprocessors supposed to not increase the latency?
>>>>>> my settings:
>>>>>> 3 region servers
>>>>>> 60 maps
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows
>>>>>> to an index table.
>>>>>> 
>>>>>> 
>>>>>> Sincerely,
>>>>>> Prakash
>>>>>> 
>>>>> 
>>>> 
>>>> Michael Segel  | (m) 312.755.9623
>>>> 
>>>> Segel and Associates
>>>> 
>>>> 
>>> 
>> 
> 
