Dear all,
I have many jobs (900k) to run on many machines (4k). All the jobs are
independent; in particular, they use the same algorithm, but each has a
different input. If I could build a single cluster with 4k machines, I
could simply submit all my jobs using a shell script. Critically, the
jobs will exec
Is there any specific reason why setup is called for every task attempt?
From an optimization point of view, wouldn't it be better if setup were
called only once in the case of JVM reuse?
I have not yet looked at the implementation; in the case of JVM reuse, is
the application's Mapper instance reused or a new instance is
You are guaranteed one setup call for every single task attempt. This
is regardless of JVM reuse being on or off. JVM reuse will cause no
issues with what Eyal is attempting to do.
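For context, here is a minimal sketch of where setup() sits in the new
org.apache.hadoop.mapreduce API. The class name, the "min.word.length"
property, and the filtering logic are illustrative only, not something
from this thread:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordLengthMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Per-attempt state, initialized once in setup().
    private int minLength;

    @Override
    protected void setup(Context context) {
        // Runs once per task attempt, before the first map() call,
        // whether or not JVM reuse is enabled.
        minLength = context.getConfiguration().getInt("min.word.length", 1);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split("\\s+")) {
            if (word.length() >= minLength) {
                context.write(new Text(word), new IntWritable(word.length()));
            }
        }
    }
}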
On Sun, Jan 1, 2012 at 5:49 PM, Anirudh wrote:
> No problems Eyal.
>
> On second thought, for the JVM re-use the
Hi,
I have some questions and I would be really grateful to know the answers.
As I read in the Hadoop tutorial, "the output files written by the
Reducers are then left in HDFS for user use, either by another MapReduce
job, a separate program, or for human inspection."
1- Does Hadoop automatically use t
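To make the "another MapReduce job" case concrete, a driver can simply
point a second job's input path at the first job's output directory,
since that output stays in HDFS. A hedged sketch, with hypothetical
paths and job names, using the default identity mapper/reducer for
brevity:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Intermediate directory the first job leaves behind in HDFS.
        Path intermediate = new Path("/tmp/chain-intermediate");

        Job first = new Job(conf, "first-pass");
        // ... setMapperClass/setReducerClass for the real first job ...
        FileInputFormat.addInputPath(first, new Path(args[0]));
        FileOutputFormat.setOutputPath(first, intermediate);
        first.waitForCompletion(true);

        Job second = new Job(conf, "second-pass");
        // ... setMapperClass/setReducerClass for the real second job ...
        // The second job reads exactly what the first job left in HDFS.
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, new Path(args[1]));
        second.waitForCompletion(true);
    }
}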
No problems Eyal.
On second thought, with JVM re-use the Mapper/Reducer instances should be
re-used, and setup should be called only once. This makes sense too, as
JVM reuse applies only within the same job.
You should be good with class instantiation even if JVM reuse is
enabled.
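For reference, in the MR1 API of this era JVM reuse is controlled by
mapred.job.reuse.jvm.num.tasks (-1 means unlimited tasks per JVM). A
minimal sketch; the helper class and method names are mine, not from
the thread:

import org.apache.hadoop.mapred.JobConf;

public class ReuseConfig {
    public static JobConf withJvmReuse(JobConf conf) {
        // Same effect as conf.setInt("mapred.job.reuse.jvm.num.tasks", -1):
        // a task JVM may run any number of tasks of the same job.
        conf.setNumTasksToExecutePerJvm(-1);
        return conf;
    }
}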
On Sat, Dec