running multiple jobs, please help

2012-01-01 Thread Shaojun Zhao
Dear all, I have many jobs (900k) to run on many machines (4k). All jobs are independent; in particular, they use the same algorithm, but the input is different. If I could build a single cluster with 4k machines, I could simply submit all my jobs using a shell script. Critically, the jobs will exec

Re: instantiation of classes in MR

2012-01-01 Thread Anirudh
Is there any specific reason why setup is called for every task attempt? From an optimization point of view, wouldn't it be good if setup were called only once in case of JVM reuse? I have not yet looked at the implementation; in case of JVM reuse, is the application Mapper instance reused or a new instance is

Re: instantiation of classes in MR

2012-01-01 Thread Harsh J
You are guaranteed one setup call for every single task attempt. This is regardless of JVM reuse being on or off. JVM reuse will cause no issues with what Eyal is attempting to do. On Sun, Jan 1, 2012 at 5:49 PM, Anirudh wrote: > No problems Eyal. > > On a second thought, for the JVM re-use the
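
The guarantee described in this reply (one setup call per task attempt, whether or not the JVM is reused) can be illustrated with a toy simulation. This is only a sketch in plain Python and does not use the Hadoop API: `ToyMapper`, `run_task_attempt`, and `run_in_one_process` are hypothetical names, and the choice to create a fresh mapper instance for each attempt is an assumption that the thread itself does not confirm.

```python
class ToyMapper:
    """Hypothetical stand-in for a user Mapper; not the Hadoop API."""

    # Class-level counter: it lives as long as the process does, much like
    # static state would live across tasks in a reused JVM.
    setup_calls = 0

    def __init__(self):
        self.lookup = None

    def setup(self):
        # Per-attempt initialization, counted so the behavior can be checked.
        ToyMapper.setup_calls += 1
        self.lookup = {"a": 1, "b": 2}

    def map(self, record):
        # Emit the record with its looked-up value (0 if unknown).
        return (record, self.lookup.get(record, 0))


def run_task_attempt(records):
    """Run one simulated task attempt: one setup call, then one map per record."""
    mapper = ToyMapper()  # assumption: a new instance per attempt
    mapper.setup()
    return [mapper.map(r) for r in records]


def run_in_one_process(task_inputs):
    """Simulate JVM reuse: several task attempts run back to back in one process."""
    return [run_task_attempt(records) for records in task_inputs]


results = run_in_one_process([["a", "b"], ["b", "c"], ["a"]])
```

In this sketch, three simulated attempts in the same process produce three setup calls, so any per-attempt initialization done in setup still runs for each attempt even though the process is shared.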

output files written by reducers

2012-01-01 Thread aliyeh saeedi
Hi, I have some questions and would be really grateful to know the answers. As I read in the Hadoop tutorial, "the output files written by the Reducers are then left in HDFS for user use, either by another MapReduce job, a separate program, for human inspection." 1- Does Hadoop automatically use t

Re: instantiation of classes in MR

2012-01-01 Thread Anirudh
No problems Eyal. On a second thought, for the JVM re-use the Mapper/Reducer instances should be re-used, and the setup should be called only once. This makes sense too as the JVM reuse is for the same job. You should be good with class instantiation even if the JVM reuse is enabled. On Sat, Dec