Re: Is MapReduce NFS friendly

2010-05-10 Thread Marcin Sieniek
Hi Jyothish, I had exactly the same problem and I solved it. To answer your question: in my experience, HDFS and NFS are totally incompatible ;) However, you can configure MapReduce to run on NFS only, without HDFS. See the second-to-last post here: http://old.nabble.com/Hadoop-over-Lustre--td19092864.

Re: Is MapReduce NFS friendly

2010-05-10 Thread alex kamil
Jyothish, as far as I know it is not recommended to run Hadoop on NFS; you are supposed to use local volumes for all mapred and dfs directories. Alex On Mon, May 10, 2010 at 2:00 PM, Jyothish Soman wrote: > I have a distributed system on NFS, and wanted to use MapReduce on it, but > the system ke

Is MapReduce NFS friendly

2010-05-10 Thread Jyothish Soman
I have a distributed system on NFS and wanted to use MapReduce on it, but the system keeps raising errors about being unable to allocate temporary space, even though sufficient space is available. Hence my question: are HDFS and NFS compatible?

Re: MultipleOutputs or Partitioner

2010-05-10 Thread Some Body
Thanks a lot. I was able to use MultipleOutputs to get CLOSER to what I want. i.e. using the changes mentioned below I'm able to generate multiple output files like: /test/out/2010-04-19_morning.txt-r-0 /test/out/2010-04-19_afternoon.txt-r-0 /test/out/2010-04-20

Re: MultipleOutputs or Partitioner

2010-05-10 Thread Alex Kozlov
Hi Alan, On Mon, May 10, 2010 at 5:08 AM, Some Body wrote: > Hi, > > I'm trying to understand how to generate multiple outputs in my reducer > (using 0.20.2+228). > Do I need MultipleOutput or should I partition my output in the mapper? > > The question is scalability. If you are OK with runnin

Re: sorting to a single output

2010-05-10 Thread Alex Kozlov
Hi Karl, Even though approach 1 is possible, it's not scalable. As far as I know, a Hadoop reducer will run out of memory if you merge big files (I am not sure whether it's a 'bug' or a 'limitation', but it was designed this way). In practice, you are likely to run into other problems like accessibility an
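For readers following the scalability point, here is a minimal sketch, in plain Python rather than the Hadoop API, of the range-partitioning idea behind terasort-style sorting: keys are split by boundary values, each partition is sorted independently, and concatenating the partitions in order gives one globally sorted output. The function names and the boundary values are hypothetical and chosen only for illustration; this is not necessarily what the thread goes on to recommend.

```python
import bisect


def range_partition(values, boundaries):
    """Place each value into the partition whose key range contains it.

    `boundaries` must be sorted; partition i holds values v with
    boundaries[i-1] <= v < boundaries[i].
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        parts[bisect.bisect_right(boundaries, v)].append(v)
    return parts


def sort_to_single_output(values, boundaries):
    """Sort each partition on its own, then concatenate in partition order."""
    merged = []
    for part in range_partition(values, boundaries):
        # Each partition plays the role of one reducer's sorted output.
        merged.extend(sorted(part))
    return merged
```

Because every key in partition i is smaller than every key in partition i+1, no single worker ever has to merge the whole data set; the final step is a plain concatenation.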

sorting to a single output

2010-05-10 Thread Karl Kuntz
Hi all, I'm doing some evaluation using a vanilla 0.20.2 release on a small cluster to sort large data sets. I've looked at the terasort work, but in my particular case I'm more interested in outputting a single file than I am in performance. For testing I'm sorting about 200G worth of data, and

Re: MultipleOutputs or Partitioner

2010-05-10 Thread Sonal Goyal
Hi Alan, You can use MultipleOutputFormat. You can override the generateFileName...methods to get the functionality you want. A partitioner controls how data moves from the mapper to the reducer, so if you take that approach, you will have to specify the number of reducers as the number of files
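To illustrate the partitioner approach described above, here is a minimal sketch in plain Python, not the Hadoop API. It assumes output names of the form seen elsewhere in this thread (for example 2010-04-19_morning); the function names are hypothetical. Each distinct name is given its own partition, which is why the number of reducers has to match the number of files you want.

```python
def build_partition_table(keys):
    """Assign each distinct output name its own partition index."""
    return {name: idx for idx, name in enumerate(sorted(set(keys)))}


def route(records, table):
    """Send every (key, value) record to the partition chosen for its key."""
    outputs = {idx: [] for idx in table.values()}
    for key, value in records:
        outputs[table[key]].append(value)
    return outputs


records = [
    ("2010-04-19_morning", 1),
    ("2010-04-19_afternoon", 2),
    ("2010-04-19_morning", 3),
]
table = build_partition_table(key for key, _ in records)
outputs = route(records, table)
```

Here `outputs` has one entry per distinct name, so two names need two partitions. With an approach that writes several named files from one reducer, the reducer count and the file count are no longer tied together.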

MultipleOutputs or Partitioner

2010-05-10 Thread Some Body
Hi, I'm trying to understand how to generate multiple outputs in my reducer (using 0.20.2+228). Do I need MultipleOutput or should I partition my output in the mapper? My reducer currently gets key/val input pairs like this which all end up in my part_r_ file. hostA_VarX_2010-05-01_mor