I don't agree with Lars on the second half of his statement. 

Yes, there will be a performance hit when you go across regions because you're 
now going across the network to a second machine. 
However, I disagree that it defeats the performance purpose. 

In a Hadoop cluster, we tend to launch our jobs from an edge server. However, 
with HBase, you can connect to the cluster from a remote client and still run 
queries against the data outside of traditional M/R. 

So doing something inside the cluster would be less expensive than doing a 
round trip back to the client. 

In addition, there is no concept of a transaction. Each put() is atomic. So 
you write to your base table, one atomic write. Then you write to your index 
table(s), and each index update is itself atomic. (Assuming you may have 
multiple indexes on your base table.)
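
To make that concrete, here is a rough sketch in plain Java. It does not use 
the HBase API at all: Table, writeDoc and failAfterBase are hypothetical names 
I made up, and the in-memory map only stands in for a table whose single-row 
writes are atomic. The point is just that the base write and each index write 
succeed or fail independently, with nothing rolling the base write back.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IndexWriteSketch {
    // Hypothetical stand-in for a table: each put() replaces one row atomically.
    static final class Table {
        private final Map<String, String> rows = new ConcurrentHashMap<>();
        void put(String rowKey, String value) { rows.put(rowKey, value); }
        String get(String rowKey) { return rows.get(rowKey); }
    }

    // One atomic write to the base table, then one atomic write per index table.
    // Nothing ties the writes together, so a failure in between is not rolled back.
    static void writeDoc(Table base, List<Table> indexes, String docId,
                         String title, boolean failAfterBase) {
        base.put(docId, title);
        if (failAfterBase) {
            throw new IllegalStateException("simulated failure before index update");
        }
        for (Table index : indexes) {
            index.put(title, docId); // index row key is the indexed value
        }
    }

    public static void main(String[] args) {
        Table base = new Table();
        Table byTitle = new Table();
        writeDoc(base, Arrays.asList(byTitle), "doc1", "hbase", false);
        System.out.println(byTitle.get("hbase")); // index row points back at the doc

        try {
            writeDoc(base, Arrays.asList(byTitle), "doc2", "hadoop", true);
        } catch (IllegalStateException e) {
            // The base row was written, but its index row never was.
            System.out.println(base.get("doc2") + " / " + byTitle.get("hadoop"));
        }
    }
}
```

In the second call the base row survives without its index entry, so whatever 
maintains the index (client, M/R job, or coprocessor) has to be prepared to 
retry or repair it.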

It's important to remember that coprocessors are really, really new. As Andrew 
points out... they're not recommended for the novice. 


On Feb 17, 2013, at 8:31 PM, lars hofhansl <la...@apache.org> wrote:

> The main advantage of coprocessors is that they keep the logic local to the 
> region server. Putting data into other region servers is supported, but 
> defeats the performance purpose.
> 
> 
> 
> ________________________________
> From: Prakash Kadel <prakash.ka...@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org> 
> Sent: Sunday, February 17, 2013 5:26 PM
> Subject: Re: coprocessor enabled put very slow, help please~~~
> 
> thanks again,
>   i did try making indexes with the MR. dont have exact evaluation data, but 
> inserting indexes directly with mapreduce does seem to be much much faster 
> than making the indexes with the coprocessors. guess i am missing the point 
> about the coprosessors. 
> my reason for trying out the coprocessor was to make the insertion code 
> cleaner and efficient index creation.
> 
> Sincerely,
> Prakash Kadel
> 
> On Feb 18, 2013, at 10:17 AM, lars hofhansl <la...@apache.org> wrote:
> 
>> Index maintenance will always be slower. An interesting comparison would be 
>> to also update your indexes from the M/R and see whether that performs 
>> better.
>> 
>> 
>> 
>> ________________________________
>> From: Prakash Kadel <prakash.ka...@gmail.com>
>> To: "user@hbase.apache.org" <user@hbase.apache.org> 
>> Sent: Sunday, February 17, 2013 5:13 PM
>> Subject: Re: coprocessor enabled put very slow, help please~~~
>> 
>> thank you lars,
>> That is my guess too. I am confused, isnt that something that cannot be 
>> controlled. Is this approach of creating some kind of index wrong?
>> 
>> Sincerely,
>> Prakash Kadel
>> 
>> On Feb 18, 2013, at 10:07 AM, lars hofhansl <la...@apache.org> wrote:
>> 
>>> Presumably the coprocessor issues Puts to another region server in most 
>>> cases, that could explain it being (much) slower.
>>> 
>>> 
>>> 
>>> ________________________________
>>> From: Prakash Kadel <prakash.ka...@gmail.com>
>>> To: "user@hbase.apache.org" <user@hbase.apache.org> 
>>> Sent: Sunday, February 17, 2013 4:52 PM
>>> Subject: Re: coprocessor enabled put very slow, help please~~~
>>> 
>>> Forgot to mention. I am using 0.92.
>>> 
>>> Sincerely,
>>> Prakash
>>> 
>>> On Feb 18, 2013, at 9:48 AM, Prakash Kadel <prakash.ka...@gmail.com> wrote:
>>> 
>>>> hi,
>>>>      i am trying to insert few million documents to hbase with mapreduce. 
>>>> To enable quick search of docs i want to have some indexes, so i tried to 
>>>> use the coprocessors, but they are slowing down my inserts. Arent the 
>>>> coprocessors not supposed to increase the latency? 
>>>> my settings:
>>>>       3 region servers
>>>>      60 maps
>>>> each map inserts to doc table.(checkAndPut)
>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to 
>>>> a index table.
>>>> 
>>>> 
>>>> Sincerely,
>>>> Prakash

Michael Segel  | (m) 312.755.9623

Segel and Associates

