> existing data to /path/b/timestamp,
> and writes new data to /path/a.
> MR Job #2 (the consumer) uses the latest /path/b/timestamp (or the
> whole set of timestamps available under /path/b at that point) as its
> input, and deletes it afterwards. Hence #2 can monitor this
> directory.
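A rough sketch of that handoff using the Hadoop FileSystem API (the paths match the example above; the class and method names are just for illustration). The key point is that an HDFS rename is atomic, so the consumer never sees a half-published directory:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotHandoff {
  // Producer (job #1): after writing /path/a, publish it under
  // /path/b/<timestamp> with a single atomic rename.
  static void publish(Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path staging = new Path("/path/a");
    Path published = new Path("/path/b/" + System.currentTimeMillis());
    fs.mkdirs(published.getParent());
    if (!fs.rename(staging, published)) {
      throw new IOException("publish failed: " + staging + " -> " + published);
    }
  }

  // Consumer (job #2): take whatever timestamps are under /path/b as
  // input, and delete each one after a successful run.
  static Path[] listSnapshots(Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    FileStatus[] stats = fs.listStatus(new Path("/path/b"));
    Path[] inputs = new Path[stats.length];
    for (int i = 0; i < stats.length; i++) {
      inputs[i] = stats[i].getPath();
    }
    return inputs;
  }
}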
Hi,
I have an HDFS folder and an M/R job that periodically updates it by
replacing the data with newly generated data.
I have a different M/R job that periodically, or ad hoc, processes the data
in the folder.
The second job, naturally, sometimes fails when the data is replaced by
newly generated data.
It is possible
What kind of nodes are you using, and what other processes are running? What
are the Xmx settings for the NN and JT?
Date: Tue, 10 Jan 2012 10:35:43 -0500
Subject: minimum time to kick off MapReduce job
From: scott.a.lind...@gmail.com
To: mapreduce-user@hadoop.apache.org
Hi,
I am experiencing sl
Is it possible even though the server runs with vm.swappiness = 5?
> Subject: Re: Dead data nodes during job execution and failed tasks.
> From: a...@apache.org
> Date: Thu, 30 Jun 2011 11:46:25 -0700
> To: mapreduce-user@hadoop.apache.org
>
>
> On Jun 30, 2011, at 10:01 AM,
rt.mb", "350");("io.sort.factor", "100");("io.file.buffer.size",
"131072");
("mapred.child.java.opts", "-Xms1024m
-Xmx1024m");("mapred.reduce.parallel.copies",
"8");("mapred.tasktracker.map.tasks.max
Hi,
My cluster contains 22 DataNodes and TaskTrackers, each with 8 map slots and
4 reduce slots, each with a 1.5G max heap size.
I use Cloudera CDH 2.
I have a specific job that is constantly failing in the reduce phase. I use
64 reducers and a 64M block size, and compress the map output with LZO.
The
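(For reference, enabling 64 reducers and LZO map-output compression with the old JobConf API would look roughly like this; the LzoCodec package name assumes the hadoop-lzo packaging commonly used with CDH:)

import org.apache.hadoop.mapred.JobConf;

public class LzoReduceJob {
  public static JobConf configure() {
    JobConf conf = new JobConf();
    conf.setNumReduceTasks(64);      // 64 reducers, as described above
    conf.setCompressMapOutput(true); // compress intermediate map output
    // Assumes hadoop-lzo; some older Hadoop builds shipped
    // org.apache.hadoop.io.compress.LzoCodec instead.
    conf.setMapOutputCompressorClass(com.hadoop.compression.lzo.LzoCodec.class);
    return conf;
  }
}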