I don't have any concern about the time it's taking. It's more about the load it's putting on the cluster. I have other jobs that I need to run (secondary index, data processing, etc.), so the more time this new job takes, the less CPU the others will have.
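For what it's worth, I can at least cap the job at the scheduler level. A minimal sketch, assuming a Hadoop 1.x cluster with the fair scheduler enabled (the pool name "batch" is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();

// Run the job in a low-priority pool so the other jobs keep their CPU.
// ("batch" is a made-up pool name, not something Hadoop or HBase defines.)
conf.set("mapred.fairscheduler.pool", "batch");

// No speculative mappers: duplicate tasks would re-scan (and re-delete)
// the same regions and just burn extra CPU.
conf.setBoolean("mapred.map.tasks.speculative.execution", false);
```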
I tried the M/R and I really liked the way it's done. So my only concern is really the performance of the delete part. That's why I'm wondering about the best practice for moving a row to another table.

2012/10/17, Michael Segel <michael_se...@hotmail.com>:
> If you're going to be running this weekly, I would suggest that you stick
> with the M/R job.
>
> Is there any reason why you need to be worried about the time it takes to
> do the deletes?
>
>
> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> wrote:
>
>> Hi Mike,
>>
>> I'm expecting to run the job weekly. I initially thought about using
>> endpoints because I found HBASE-6942, which was a good example for my
>> needs.
>>
>> I'm fine with the Put part of the MapReduce, but I'm not sure about the
>> delete. That's why I looked at coprocessors. Then I figured that I could
>> also do the Put on the coprocessor side.
>>
>> In a M/R job, can I delete the row I'm dealing with based on some
>> criteria like the timestamp? If I do that, I won't be doing bulk
>> deletes; I'll be deleting the rows one by one, right? That might be
>> very slow.
>>
>> If in the future I want to run the job daily, might that be an issue?
>>
>> Or should I go with the initial idea of doing the Put with the M/R job
>> and the delete with HBASE-6942?
>>
>> Thanks,
>>
>> JM
>>
>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>> Hi,
>>>
>>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>>
>>> The Map/Reduce (map job only) is the simplest and the least prone to
>>> failure.
>>>
>>> I'm not sure why you would want to do this using coprocessors.
>>>
>>> How often are you running this job? It sounds like it's going to be
>>> sporadic.
>>>
>>> -Mike
>>>
>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
>>> <jean-m...@spaggiari.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> Can someone please help me understand the pros and cons of these two
>>>> options for the following use case?
>>>>
>>>> I need to transfer all the rows between two timestamps to another
>>>> table.
>>>>
>>>> My first idea was to run a MapReduce job to map the rows and store
>>>> them in another table, and then delete them using an endpoint
>>>> coprocessor. But the more I look into it, the more I think the
>>>> MapReduce is not a good idea and I should use a coprocessor instead.
>>>>
>>>> BUT... the MapReduce framework guarantees that it will run against
>>>> all the regions. I tried stopping a regionserver while the job was
>>>> running: the region moved, and the MapReduce restarted the task from
>>>> the new location. Will a coprocessor do the same thing?
>>>>
>>>> Also, I found the web console for MapReduce with the number of jobs,
>>>> their status, etc. Is there the same thing for coprocessors?
>>>>
>>>> Do all coprocessors run at the same time on all regions, which means
>>>> we could have 100 of them running on a regionserver at once? Or do
>>>> they run like MapReduce jobs, based on some configured values?
>>>>
>>>> Thanks,
>>>>
>>>> JM
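P.S. For reference, a minimal sketch of the map-only approach, assuming a 0.94-era HBase: MultiTableOutputFormat lets a single pass over the time range emit the Puts to the destination table and the Deletes to the source table. The table names, the "move.*" configuration keys, and the command-line arguments are all made up for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {

  static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
    private ImmutableBytesWritable source;
    private ImmutableBytesWritable target;

    @Override
    protected void setup(Context context) {
      Configuration conf = context.getConfiguration();
      source = new ImmutableBytesWritable(Bytes.toBytes(conf.get("move.source.table")));
      target = new ImmutableBytesWritable(Bytes.toBytes(conf.get("move.target.table")));
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      // Re-create the row in the target table, keeping the original KeyValues
      // (and therefore the original timestamps).
      Put put = new Put(row.get());
      for (KeyValue kv : result.raw()) {
        put.add(kv);
      }
      context.write(target, put);

      // Delete the whole row from the source. One Delete per row, as discussed
      // above, but the output format buffers them client-side.
      context.write(source, new Delete(row.get()));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("move.source.table", args[0]);
    conf.set("move.target.table", args[1]);

    Scan scan = new Scan();
    scan.setTimeRange(Long.parseLong(args[2]), Long.parseLong(args[3])); // [min, max)
    scan.setCaching(500);        // fewer RPC round trips per mapper
    scan.setCacheBlocks(false);  // don't churn the block cache with a full scan

    Job job = new Job(conf, "move-rows-" + args[0] + "-to-" + args[1]);
    job.setJarByClass(MoveRowsJob.class);
    TableMapReduceUtil.initTableMapperJob(args[0], scan, MoveMapper.class,
        ImmutableBytesWritable.class, Writable.class, job);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0); // map-only, as suggested above
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Two caveats: Scan.setTimeRange() filters cells, not rows, so a row with cells on both sides of the range would have only its in-range cells copied while the Delete removes the whole row. And although the Deletes do go out one row at a time, MultiTableOutputFormat keeps auto-flush off on its tables, so they should be batched through the client write buffer rather than one RPC per row.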