Re: Coprocessor end point vs MapReduce?

Doug Meil Thu, 18 Oct 2012 05:36:49 -0700

To echo what Mike said about KISS, would you use triggers for a large
time-sensitive batch job in an RDBMS?  It's possible, but probably not.
Then you might want to think twice about using co-processors for such a
purpose with HBase.


On 10/17/12 9:50 PM, "Michael Segel" <michael_se...@hotmail.com> wrote:

>Run your weekly job in a low priority fair scheduler/capacity scheduler
>queue. 
>
>Maybe it's just me, but I look at Coprocessors as a similar structure to
>RDBMS triggers and stored procedures.
>You need to show restraint and use them sparingly; otherwise you end up
>creating performance issues.
>
>Just IMHO.
>
>-Mike
>
>On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
><jean-m...@spaggiari.org> wrote:
>
>> I don't have any concern about the time it's taking. It's more about
>> the load it's putting on the cluster. I have other jobs that I need to
>> run (secondary index, data processing, etc.). So the more time this
>> new job is taking, the less CPU the others will have.
>> 
>> I tried the M/R and I really liked the way it's done. So my only
>> concern will really be the performance of the delete part.
>> 
>> That's why I'm wondering what's the best practice to move a row to
>> another table.
>> 
>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>> If you're going to be running this weekly, I would suggest that you
>>> stick with the M/R job.
>>> 
>>> Is there any reason why you need to be worried about the time it takes
>>> to do the deletes?
>>> 
>>> 
>>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
>>> <jean-m...@spaggiari.org> wrote:
>>> 
>>>> Hi Mike,
>>>> 
>>>> I'm expecting to run the job weekly. I initially thought about using
>>>> end points because I found HBASE-6942 which was a good example for my
>>>> needs.
>>>> 
>>>> I'm fine with the Put part for the Map/Reduce, but I'm not sure about
>>>> the delete. That's why I looked at coprocessors. Then I figured that I
>>>> could also do the Put on the coprocessor side.
>>>> 
>>>> In an M/R job, can I delete the row I'm dealing with based on some
>>>> criteria like timestamp? If I do that, I will not do bulk deletes, but
>>>> I will delete the rows one by one, right? Which might be very slow.
>>>> 
>>>> If in the future I want to run the job daily, might that be an issue?
>>>> 
>>>> Or should I go with the initial idea of doing the Put with the M/R job
>>>> and the delete with HBASE-6942?
>>>> 
>>>> Thanks,
>>>> 
>>>> JM
>>>> 
>>>> 
>>>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>>>> Hi,
>>>>> 
>>>>> I'm a firm believer in KISS (Keep It Simple, Stupid)
>>>>> 
>>>>> The Map/Reduce (map job only) is the simplest and least prone to
>>>>> failure.
>>>>> 
>>>>> Not sure why you would want to do this using coprocessors.
>>>>> 
>>>>> How often are you running this job? It sounds like it's going to be
>>>>> sporadic.
>>>>> 
>>>>> -Mike
>>>>> 
>>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
>>>>> <jean-m...@spaggiari.org> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Can someone please help me to understand the pros and cons between
>>>>>> those 2 options for the following use case?
>>>>>> 
>>>>>> I need to transfer all the rows between 2 timestamps to another
>>>>>> table.
>>>>>> 
>>>>>> My first idea was to run a MapReduce to map the rows and store them
>>>>>> on another table, and then delete them using an end point coprocessor.
>>>>>> But the more I look into it, the more I think the MapReduce is not a
>>>>>> good idea and I should use a coprocessor instead.
>>>>>> 
>>>>>> BUT... The MapReduce framework guarantees that it will run against
>>>>>> all the regions. I tried to stop a regionserver while the job was
>>>>>> running. The region moved, and the MapReduce restarted the job from
>>>>>> the new location. Will the coprocessor do the same thing?
>>>>>> 
>>>>>> Also, I found the web console for the MapReduce with the number of
>>>>>> jobs, the status, etc. Is there the same thing with the
>>>>>> coprocessors?
>>>>>> 
>>>>>> Are all coprocessors running at the same time on all regions, which
>>>>>> means we can have 100 of them running on a regionserver at a time? Or
>>>>>> are they running like the MapReduce jobs based on some configured
>>>>>> values?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> JM
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>
>
