I know that it is not possible to suspend and resume a MapReduce job, but I really need to find a workaround. I have looked at ChainedJobs and the CapacityScheduler, but I am really clueless about what to do.
The main goal was to suspend a job once the map tasks finish and before the reduce tasks start. I know that this is not possible, so I have created two jobs: one that executes all the map tasks (Job 1), and another that executes all the reduce tasks (Job 2). Since I can't start a job that runs only reduce tasks, I had to add an identity mapper before the reducers. So in the end, Job 1 just executes all the map tasks, and Job 2 executes the identity mappers and the reduce tasks.

But this really kills performance, and I wish I could find a way to do better. I have thought about piping the output of Job 1 into Job 2, but in the end I really need to stop the execution between these two jobs.

I have looked at the ChainedJobs and CapacityScheduler classes to see if I could implement a way to suspend and resume a job, but I have not had any success.

Any ideas on how to emulate suspending a job? Sorry to say this, but I am really desperate to find a solution.

Thanks,

On Wed, Feb 18, 2015 at 6:53 PM, Steve Loughran <ste...@hortonworks.com> wrote:

> Afraid not.
>
> When we suspend/resume a slider application, what we are doing is shutting
> down the entire application, releasing all its YARN resources and killing
> the "Application Master". The MapReduce engine runs its AM for the
> duration of the job; building up lots of state in that AM as to what is
> happening. Tez runs for longer, but it can dynamically change cluster size
> based on load.
>
> "Hadoop pre-emption" is a mechanism by which your cluster can be set up so
> that higher priority workloads can cause containers of lower-priority jobs
> to get killed, "pre-empted". Maybe that could be useful.
>
> -Steve
>
> On 18 February 2015 at 17:22:57, xeonmailinglist (
> xeonmailingl...@gmail.com) wrote:
>
> Hi,
>
> I noticed that YARN does not suspend or resume a mapreduce job that it
> is executing. Then, I have found Apache Slider.
> Is it possible to submit a mapreduce job with slider, and suspend and
> resume the job while executing?
>
> Thanks,
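
To make the two-job workaround from the top of this message concrete, here is a minimal sketch in plain Python. It is not Hadoop API code; the function names, the JSON-lines checkpoint file, and the word-count workload are all assumptions made up for illustration. It only shows the shape of the idea: Job 1 persists the map output, execution can stop for as long as needed, and Job 2 later reads the stored pairs unchanged (the identity mapper) before grouping and reducing.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_map_phase(lines, checkpoint_path):
    """Hypothetical 'Job 1': run every map task and persist the (key, value) pairs."""
    with open(checkpoint_path, "w", encoding="utf-8") as out:
        for line in lines:
            for word in line.split():
                # Word count is only an example workload: emit (word, 1).
                out.write(json.dumps([word, 1]) + "\n")


def run_reduce_phase(checkpoint_path):
    """Hypothetical 'Job 2': identity-read the stored pairs, group by key, reduce."""
    groups = defaultdict(list)
    with open(checkpoint_path, encoding="utf-8") as src:
        for record in src:
            # Identity mapper: each pair is passed through unchanged.
            key, value = json.loads(record)
            groups[key].append(value)
    # Reduce step: sum the values collected for each key.
    return {key: sum(values) for key, values in groups.items()}


def demo():
    """Run both phases with an explicit stopping point between them."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    os.close(fd)
    try:
        run_map_phase(["a b a", "b c"], path)
        # The "suspend" point: nothing is running here, only the file exists.
        return run_reduce_phase(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The extra cost complained about above shows up here as the second full pass over the stored pairs in `run_reduce_phase`: the intermediate data is written once and read back once purely to cross the job boundary.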