To echo what Mike said about KISS: would you use triggers for a large, time-sensitive batch job in an RDBMS? You could, but probably shouldn't. Then you might want to think twice about using coprocessors for such a purpose with HBase.
On 10/17/12 9:50 PM, "Michael Segel" <michael_se...@hotmail.com> wrote:

>Run your weekly job in a low-priority fair scheduler/capacity scheduler
>queue.
>
>Maybe it's just me, but I look at coprocessors as a similar structure to
>RDBMS triggers and stored procedures.
>You need to restrain yourself and use them sparingly, otherwise you end
>up creating performance issues.
>
>Just IMHO.
>
>-Mike
>
>On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari
><jean-m...@spaggiari.org> wrote:
>
>> I don't have any concern about the time it's taking. It's more about
>> the load it's putting on the cluster. I have other jobs that I need to
>> run (secondary index, data processing, etc.). So the more time this
>> new job takes, the less CPU the others will have.
>>
>> I tried the M/R and I really liked the way it's done. So my only
>> concern will really be the performance of the delete part.
>>
>> That's why I'm wondering what the best practice is to move a row to
>> another table.
>>
>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>> If you're going to be running this weekly, I would suggest that you
>>> stick with the M/R job.
>>>
>>> Is there any reason why you need to be worried about the time it
>>> takes to do the deletes?
>>>
>>>
>>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari
>>> <jean-m...@spaggiari.org> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> I'm expecting to run the job weekly. I initially thought about using
>>>> end points because I found HBASE-6942, which was a good example for
>>>> my needs.
>>>>
>>>> I'm fine with the Put part of the Map/Reduce, but I'm not sure about
>>>> the delete. That's why I looked at coprocessors. Then I figured I
>>>> could also do the Put on the coprocessor side.
>>>>
>>>> In a M/R job, can I delete the row I'm dealing with based on some
>>>> criteria like the timestamp? If I do that, I will not do bulk
>>>> deletes; I will delete the rows one by one, right? Which might be
>>>> very slow.
>>>>
>>>> If in the future I want to run the job daily, might that be an
>>>> issue?
>>>>
>>>> Or should I go with the initial idea of doing the Put with the M/R
>>>> job and the delete with HBASE-6942?
>>>>
>>>> Thanks,
>>>>
>>>> JM
>>>>
>>>>
>>>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>>>> Hi,
>>>>>
>>>>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>>>>
>>>>> The Map/Reduce (map-only job) is the simplest and least prone to
>>>>> failure.
>>>>>
>>>>> Not sure why you would want to do this using coprocessors.
>>>>>
>>>>> How often are you running this job? It sounds like it's going to be
>>>>> sporadic.
>>>>>
>>>>> -Mike
>>>>>
>>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari
>>>>> <jean-m...@spaggiari.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can someone please help me understand the pros and cons of these
>>>>>> 2 options for the following use case?
>>>>>>
>>>>>> I need to transfer all the rows between 2 timestamps to another
>>>>>> table.
>>>>>>
>>>>>> My first idea was to run a MapReduce to map the rows and store
>>>>>> them in another table, and then delete them using an endpoint
>>>>>> coprocessor. But the more I look into it, the more I think the
>>>>>> MapReduce is not a good idea and I should use a coprocessor
>>>>>> instead.
>>>>>>
>>>>>> BUT... the MapReduce framework guarantees me that it will run
>>>>>> against all the regions. I tried stopping a regionserver while
>>>>>> the job was running. The region moved, and the MapReduce
>>>>>> restarted the task from the new location. Will a coprocessor do
>>>>>> the same thing?
>>>>>>
>>>>>> Also, I found the web console for MapReduce with the number of
>>>>>> jobs, the status, etc. Is there the same thing for coprocessors?
>>>>>>
>>>>>> Do all coprocessors run at the same time on all regions, which
>>>>>> means we can have 100 of them running on a regionserver at a
>>>>>> time? Or do they run like MapReduce jobs, based on some
>>>>>> configured values?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
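The row-at-a-time move JM describes (Put into the destination table, then a Delete of the same row from the source) can be sketched as a map-only job that emits both mutations through MultiTableOutputFormat. This is a minimal sketch against the 0.94-era HBase API discussed in the thread; the class name and the table names are illustrative, not from the thread. It requires a running cluster, so it can't be exercised standalone.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Writable;

// Map-only sketch: copy each scanned row to a destination table and
// delete it from the source. Emitting (tableName, mutation) pairs lets
// MultiTableOutputFormat route Puts and Deletes to different tables.
public class MoveRowsMapper extends TableMapper<ImmutableBytesWritable, Writable> {

  // Hypothetical table names -- substitute your own.
  private static final ImmutableBytesWritable DEST =
      new ImmutableBytesWritable(Bytes.toBytes("archive_table"));
  private static final ImmutableBytesWritable SOURCE =
      new ImmutableBytesWritable(Bytes.toBytes("source_table"));

  @Override
  protected void map(ImmutableBytesWritable row, Result result, Context context)
      throws IOException, InterruptedException {
    Put put = new Put(row.get());
    for (KeyValue kv : result.raw()) {
      put.add(kv); // carry cells over with their original timestamps
    }
    context.write(DEST, put);
    // One Delete per row, as discussed in the thread: simple, but
    // row-at-a-time rather than a bulk delete.
    context.write(SOURCE, new Delete(row.get()));
  }
}
```

The Deletes go through the output format's buffered writers rather than one RPC per row, which softens (though doesn't eliminate) the "one by one" cost JM is worried about.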
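The "all rows between 2 timestamps" selection in the original question maps directly onto a Scan with a time range, so the mapper never sees rows outside the window. A sketch of the job setup, again assuming the 0.94-era HBase mapreduce utilities; `MoveRowsMapper` is a placeholder for your own mapper class, and the table name is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;

public class MoveRowsJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "move-rows-between-timestamps");
    job.setJarByClass(MoveRowsJob.class);

    // Only cells with timestamps in [start, end) are returned by the scan.
    Scan scan = new Scan();
    scan.setTimeRange(Long.parseLong(args[0]), Long.parseLong(args[1]));
    scan.setCaching(500);        // fewer RPC round trips per map task
    scan.setCacheBlocks(false);  // don't pollute the block cache from M/R

    TableMapReduceUtil.initTableMapperJob(
        "source_table",          // hypothetical source table name
        scan,
        MoveRowsMapper.class,    // placeholder: your Put+Delete mapper
        ImmutableBytesWritable.class,
        Writable.class,
        job);
    job.setOutputFormatClass(MultiTableOutputFormat.class);
    job.setNumReduceTasks(0);    // map-only, per Mike's suggestion

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Run under a low-priority fair/capacity scheduler queue, as Mike suggests, this keeps the weekly job from starving the secondary-index and processing jobs.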