Re: How to set the max mappers per node on a per-job basis?

2014-06-11 Thread jeremy p
way to control the parallel execution on a per-node basis. Scheduler configurations will let you control overall parallelism (# of simultaneous tasks) of specific jobs on a cluster-level basis, but not on a per-node level. On Sat, May 31, 2014 at 4:08 AM, jeremy p athomewithagroove
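A minimal sketch of the cluster-level control the reply refers to, assuming the Capacity Scheduler and a hypothetical queue name defined by the cluster admin: submitting a job to a capped queue limits how many of its tasks run at once across the cluster, but says nothing about any one node.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitToQueue {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route this job to a scheduler queue whose capacity is capped
        // cluster-wide ("cpu-heavy" is an illustrative queue name).
        conf.set("mapreduce.job.queuename", "cpu-heavy");
        Job job = Job.getInstance(conf, "job-b");
        // ... set the mapper class, input/output paths, etc., then submit.
        // job.waitForCompletion(true);
      }
    }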

How to set the max mappers per node on a per-job basis?

2014-05-30 Thread jeremy p
Hello all, I have two jobs, Job A and Job B. Job A is not very CPU-intensive, and so we would like to run it with 50 mappers per node. Job B is very CPU-intensive, and so we would like to run it with 25 mappers per node. How can we request a different number of mappers per node for each job?
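There is no per-job "mappers per node" setting in stock MapReduce; a common workaround on YARN (an assumption added here, not something stated in the thread) is to size each job's map containers so only the desired number fit on a node. The numbers below are illustrative and depend on the NodeManager's configured memory and vcores.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class PerJobContainerSizing {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job B (CPU-intensive): request bigger containers per map task,
        // so fewer map tasks run concurrently on each node.
        conf.setInt("mapreduce.map.memory.mb", 4096);
        conf.setInt("mapreduce.map.cpu.vcores", 2);
        Job jobB = Job.getInstance(conf, "job-b-cpu-heavy");
        // ... configure mapper/reducer and paths, then submit as usual.
      }
    }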

Re: Are mapper classes re-instantiated for each record?

2014-05-07 Thread jeremy p
files with 1 record each, it can call setup/cleanup 5 times. But if your records are in a single file, I think setup/cleanup should be called once. -- Thanks, Sergey On 06/05/14 02:49, jeremy p wrote: Let's say I have a TaskTracker that receives 5 records to process for a single job. When
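For reference, the lifecycle the reply describes, as a minimal sketch using the org.apache.hadoop.mapreduce API: one Mapper instance is created per map task (i.e. per input split), setup() and cleanup() run once per task, and map() runs once per record in that split.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LifecycleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

      @Override
      protected void setup(Context context) {
        // Runs once per map task, i.e. once per input split --
        // 5 single-record files => 5 splits => 5 setup() calls.
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Runs once per record in this task's split.
        context.write(value, new LongWritable(1));
      }

      @Override
      protected void cleanup(Context context) {
        // Runs once per map task, after the last record.
      }
    }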

Are mapper classes re-instantiated for each record?

2014-05-05 Thread jeremy p
Let's say I have a TaskTracker that receives 5 records to process for a single job. When the TaskTracker processes the first record, it will instantiate my Mapper class and execute my setup() function. It will then run the map() method on that record. My question is this: what happens when the

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Apologies -- I don't understand this advice: If the evenness is the goal, you can also write your own input format that returns empty locations for each split and read the small files in the map task directly. How would manually reading the files into the map task help me? Hadoop would still spawn
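A minimal sketch of what the quoted advice seems to mean by "empty locations", assuming the new mapreduce API (it does not cover the "read the files in the map task" part): splits that report no preferred hosts carry no locality hint, so the scheduler is free to spread the tasks across nodes instead of stacking them on the few nodes that hold the small files.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class NoLocalityTextInputFormat extends TextInputFormat {
      @Override
      public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> rewritten = new ArrayList<>();
        for (InputSplit split : super.getSplits(job)) {
          FileSplit fs = (FileSplit) split;
          // Empty host list: no data-locality preference for this split.
          rewritten.add(new FileSplit(fs.getPath(), fs.getStart(),
              fs.getLength(), new String[0]));
        }
        return rewritten;
      }
    }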

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Is there a way to force an even spread of data? On Fri, Mar 22, 2013 at 2:14 PM, jeremy p athomewithagroove...@gmail.com wrote: Apologies -- I don't understand this advice: If the evenness is the goal, you can also write your own input format that returns empty locations for each split and read

Re: Capacity Scheduler question

2013-03-22 Thread jeremy p
at 3:06 PM, Serge Blazhievsky hadoop...@gmail.com wrote: Take a look at the fair scheduler; it will do what you ask for. Sent from my iPhone On Mar 22, 2013, at 2:48 PM, jeremy p athomewithagroove...@gmail.com wrote: I have two jobs, Job A and Job B. Job A needs to run with 18 mappers per machine
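If the fair scheduler route is taken, the per-job side is just picking a pool; the actual caps (for example, maxMaps for that pool) live in the scheduler's allocation file on the cluster side. A hedged MR1-era sketch with a made-up pool name:

    import org.apache.hadoop.mapred.JobConf;

    public class FairSchedulerPoolExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // "job-a-pool" is a hypothetical pool defined by the admin in the
        // fair scheduler allocation file, where its maxMaps/maxReduces live.
        conf.set("mapred.fairscheduler.pool", "job-a-pool");
        // ... configure and submit the job as usual.
      }
    }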

Re: Will hadoop always spread the work evenly between nodes?

2013-03-19 Thread jeremy p
At least in MR1, you can usually force evenness by adjusting the number of map and reduce slots per node. In MR2 the slots are combined, so achieving evenness will be more difficult. Jeff -- From: jeremy p athomewithagroove...@gmail.com To: user
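The MR1 knob the reply refers to is the per-TaskTracker slot count, which is a cluster-side setting read from mapred-site.xml at TaskTracker startup, not a per-job one. A small sketch that just reads the value back for inspection:

    import org.apache.hadoop.conf.Configuration;

    public class ShowMapSlots {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // MR1 per-node map slot count; the shipped default is 2.
        int mapSlots = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
        System.out.println("Map slots per TaskTracker: " + mapSlots);
      }
    }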

Re: What happens when you have fewer input files than mapper slots?

2013-03-19 Thread jeremy p
you can still run 9 more map tasks on that machine. Or maybe your node's core count is way less than 10, in which case you might be better off setting the mapper slots to a lower value anyway. On Tue, Mar 19, 2013 at 5:18 PM, jeremy p athomewithagroove...@gmail.com wrote: Thank you