Hi Jyothish,
I had exactly the same problem and I solved it. To answer your question:
as far as I can tell, HDFS and NFS are totally incompatible ;) However, you
can configure MapReduce to run on NFS only, without HDFS. See the last but
one post here:
http://old.nabble.com/Hadoop-over-Lustre--td19092864.
Jyothish,
as far as I know, it is not recommended to run Hadoop on NFS; you are supposed
to use local volumes for all mapred and dfs directories.
Alex
On Mon, May 10, 2010 at 2:00 PM, Jyothish Soman wrote:
> I have a distributed system on NFS, and wanted to use MapReduce on it, but
> the system ke
I have a distributed system on NFS, and wanted to use MapReduce on it, but
the system keeps spawning errors related to inability to allocate temporary
space.
Sufficient space is available, though, hence my question:
Are HDFS and NFS compatible?
Thanks a lot. I was able to use MultipleOutputs to get CLOSER to what I want.
i.e. using the changes mentioned below I'm able to generate multiple output
files like:
/test/out/2010-04-19_morning.txt-r-0
/test/out/2010-04-19_afternoon.txt-r-0
/test/out/2010-04-20
Hi Alan,
On Mon, May 10, 2010 at 5:08 AM, Some Body wrote:
> Hi,
>
> I'm trying to understand how to generate multiple outputs in my reducer
> (using 0.20.2+228).
> Do I need MultipleOutput or should I partition my output in the mapper?
>
>
The question is scalability. If you are OK with runnin
Hi Karl,
Even though approach 1 is possible, it's not scalable. As far as I know, a
Hadoop reducer will run out of memory if you merge big files (I am not sure
whether it's a 'bug' or a 'limitation', but it was designed this way). In practice,
you are likely to run into other problems like accessibility an
Hi all,
I'm doing some evaluation using a vanilla 0.20.2 release on a small
cluster to sort large data sets. I've looked at the terasort work, but
in my particular case I'm more interested in outputting a single file
than I am in performance. For testing I'm sorting about 200G worth of
data, and
Hi Alan,
You can use MultipleOutputFormat. You can override the
generateFileName...methods to get the functionality you want.
A partitioner controls how data moves from the mapper to the reducer, so if
you take that approach, you will have to specify the number of reducers as
the number of files
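
To illustrate the partitioner approach described above, here is a minimal sketch in plain Java with no Hadoop dependency. The class OutputRouting and its methods are hypothetical helpers, not part of the Hadoop API, and the file names are only modelled on the examples elsewhere in this thread. It assumes the full set of output files is known up front, so each file can be tied to exactly one reducer index.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Hadoop API.
class OutputRouting {

    // Builds an output file name such as "2010-04-19_morning.txt"
    // from a date string and a part-of-day label (assumed key parts).
    static String fileNameFor(String date, String partOfDay) {
        return date + "_" + partOfDay + ".txt";
    }

    // Maps an output file name to a reducer index. The list of known files
    // must have one entry per reducer, so the number of reducers equals
    // the number of files.
    static int partitionFor(String fileName, List<String> knownFiles) {
        int index = knownFiles.indexOf(fileName);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown output file: " + fileName);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                fileNameFor("2010-04-19", "morning"),
                fileNameFor("2010-04-19", "afternoon"));
        // Print which reducer each file would be routed to.
        for (String f : files) {
            System.out.println(f + " -> reducer " + partitionFor(f, files));
        }
    }
}
```

In a real job the same logic would presumably sit inside a custom Partitioner, with the number of reduce tasks set to the size of the file list. A plain hash of the key would not guarantee one file per reducer, which is why the sketch uses a fixed list instead.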
Hi,
I'm trying to understand how to generate multiple outputs in my reducer (using
0.20.2+228).
Do I need MultipleOutput or should I partition my output in the mapper?
My reducer currently gets key/val input pairs like this which all end up in my
part_r_ file.
hostA_VarX_2010-05-01_mor