Re: How to access contents of a Map Reduce job's working directory

2011-08-01 Thread Harsh J
Smriti, By working directory, do you mean the task attempt's working directory or the global job staging directory? On Tue, Aug 2, 2011 at 6:22 AM, smriti singh wrote: > I want to run a MapReduce job in hadoop which needs to create a "setup" > folder in working directory. During the execution th
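
For reference, the distinction matters because files written under the task attempt's working directory are promoted to the job output directory only when the attempt commits. A minimal sketch of writing into the attempt-scoped work directory via the new (org.apache.hadoop.mapreduce) API — the mapper class name and the "setup" directory name are taken from the question, everything else is illustrative:

    // Sketch: creating a side directory under the task attempt's work
    // directory so the OutputCommitter promotes it on successful commit.
    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SetupFolderMapper extends Mapper<Object, Object, Object, Object> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // Attempt-scoped work dir, e.g. ${output}/_temporary/_attempt_.../
        Path workDir = FileOutputFormat.getWorkOutputPath(context);
        Path setupDir = new Path(workDir, "setup"); // "setup" as in the original post
        FileSystem fs = setupDir.getFileSystem(context.getConfiguration());
        fs.mkdirs(setupDir);
        // ... generate the additional text files under setupDir here ...
      }
    }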

Re: MultipleOutputs support

2011-08-01 Thread Harsh J
Hello Vanja, The CDH report is best submitted to cdh-u...@cloudera.org where an action could then be taken. Would help if you can describe your new API MO issue as well! Regarding your general multiple outputs in output directory issue, check this FAQ to get a full understanding of how task commi

Re: MapReduce jobs hanging or failing near completion

2011-08-01 Thread Arun C Murthy
On Aug 1, 2011, at 9:47 PM, Kai Ju Liu wrote: > Hi Arun. Since migrating HDFS off EBS-mounted volumes and onto ephemeral > disks, the problem has actually persisted. Now, however, there is no evidence > of errors on any of the mappers. The job tracker lists one less map completed > than the map

Re: MapReduce jobs hanging or failing near completion

2011-08-01 Thread Kai Ju Liu
Hi Arun. Since migrating HDFS off EBS-mounted volumes and onto ephemeral disks, the problem has actually persisted. Now, however, there is no evidence of errors on any of the mappers. The job tracker lists one less map completed than the map total, while the job details show all mappers as having c

Re: High load, low CPU on hard-to-reach instances

2011-08-01 Thread Kai Ju Liu
Since migrating HDFS off EBS-mounted drives and onto ephemeral drives, this issue has not resurfaced. If anyone else experiences these issues in the AWS stack, it's definitely worth considering migrating onto physical disks. Kai Ju On Wed, Jul 6, 2011 at 10:34 AM, Kai Ju Liu wrote: > I'll ha

Re: Configuring Hadoop daemon heap sizes

2011-08-01 Thread Kai Ju Liu
Hi. The new settings work perfectly for all five Hadoop daemons. Thanks so much for the example and the link! Kai Ju On Mon, Aug 1, 2011 at 11:16 AM, Harsh J wrote: > For specific max-heap sizes, you have to pass the value as a java vm > argument. See > http://avricot.com/blog/index.php?post/20

RE: fail to compile hadoop branch-0.22/mapreduce

2011-08-01 Thread Rottinghuis, Joep
This is due to an erroneous patch applied for: https://issues.apache.org/jira/browse/HDFS-2189 That patch has since been rolled back (and a new, fixed one attached). However, because no build servers are available, the new code has not been built and deployed to the Maven repo. The problem is

fail to compile hadoop branch-0.22/mapreduce

2011-08-01 Thread 周俊清
Hello everyone, I checked out the branch-0.22 source code from Apache. I can compile the common and hdfs code successfully, but I get an exception when compiling branch-0.22/mapreduce, as follows. I don't know why this happens. Thank you for your help. ivy-resolve-common: [ivy:resolve] [ivy:reso

How to access contents of a Map Reduce job's working directory

2011-08-01 Thread smriti singh
I want to run a MapReduce job in Hadoop which needs to create a "setup" folder in its working directory. During execution the job will generate some additional text files within this "setup" folder. The problem is that I don't know how to access or move this setup folder's content to my local file system a

How to access contents of a Map Reduce job's working directory

2011-08-01 Thread Shrish Bajpai
I have just started to explore Hadoop, but I am stuck in a situation now. I want to run a MapReduce job in Hadoop which needs to create a "setup" folder in its working directory. During execution the job will generate some additional text files within this "setup" folder. The problem is I don't know
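
Once the job has committed, the "setup" directory lives in the job's HDFS output and can be pulled down either from the shell (hadoop fs -get <hdfs-path> <local-path>) or programmatically. A hedged sketch — the paths here are hypothetical placeholders, not from the original posts:

    // Sketch: copying a directory from HDFS to the local file system
    // after the job finishes. Paths are illustrative placeholders.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FetchSetupDir {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Recursively copies the HDFS directory to a local destination.
        fs.copyToLocalFile(new Path("/user/hadoop/jobout/setup"),
                           new Path("/tmp/setup"));
      }
    }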

MultipleOutputs support

2011-08-01 Thread Vanja Komadinovic
Hi all, I'm trying to create M/R tasks that will output more than one "type" of data. The ideal thing would be the MultipleOutputs feature of MapReduce, but in our current production version, CDH3 (0.20.2), this support is broken. So I tried to simulate MultipleOutputs. In the Reducer setup I'm openin
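
One way such a simulation can look (a rough sketch of the general technique, not the exact code from the post; class and file names are illustrative) is to open a side-file writer under the task's work output path in setup(), so the committer promotes it only if the attempt succeeds, and close it in cleanup():

    // Sketch: simulating MultipleOutputs by hand-opening a side file
    // in the task attempt's work directory (names are illustrative).
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SideFileReducer extends Reducer<Text, Text, Text, Text> {
      private FSDataOutputStream sideOut;

      @Override
      protected void setup(Context ctx) throws IOException, InterruptedException {
        Path workDir = FileOutputFormat.getWorkOutputPath(ctx);
        // One side file per attempt; the attempt ID keeps names unique.
        Path side = new Path(workDir, "typeB-" + ctx.getTaskAttemptID());
        sideOut = side.getFileSystem(ctx.getConfiguration()).create(side, false);
      }

      @Override
      protected void reduce(Text key, Iterable<Text> vals, Context ctx)
          throws IOException, InterruptedException {
        for (Text v : vals) {
          ctx.write(key, v);                          // primary output
          sideOut.writeBytes(key + "\t" + v + "\n");  // second "type" of data
        }
      }

      @Override
      protected void cleanup(Context ctx) throws IOException {
        sideOut.close();
      }
    }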

Unusual large number of map tasks for a SequenceFile

2011-08-01 Thread Adam Shook
Hi All, I am writing a sequence file to HDFS from an application as a pre-process to a MapReduce job. (It isn't being written from an MR job; just open, write, close.) The file is around 32 MB in size. When the MapReduce job starts up, it starts with 256 map tasks. I am writing SequenceFiles
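
256 maps over a 32 MB file works out to roughly 128 KB per split, which usually suggests the file was recorded with an unusually small block size when it was written, since FileInputFormat creates one split per block by default. One hedged workaround (a sketch assuming the 0.20-era property name mapred.min.split.size; job details are placeholders) is to raise the minimum split size so tiny blocks are coalesced:

    // Sketch: coalescing tiny splits by raising the minimum split size.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithBiggerSplits {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Never make a split smaller than 64 MB, regardless of block size.
        conf.setLong("mapred.min.split.size", 64L * 1024 * 1024);
        Job job = new Job(conf, "sequence-file-job");
        // ... set mapper/reducer, input/output formats and paths ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }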

Re: Configuring Hadoop daemon heap sizes

2011-08-01 Thread Harsh J
For specific max-heap sizes, you have to pass the value as a java vm argument. See http://avricot.com/blog/index.php?post/2010/05/03/Get-started-with-java-JVM-memory-(heap%2C-stack%2C-xss-xms-xmx-xmn...) for a good view on things with JVM and memory. An example for specific heap size options to J
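
A sketch of what such hadoop-env.sh entries can look like — the per-daemon *_OPTS variable names appear in the stock 0.20 hadoop-env.sh, but the -Xmx values here are just example figures; since the daemon OPTS are passed after the global heap flag, their -Xmx should take effect:

    # Example hadoop-env.sh entries; adjust -Xmx values to your cluster.
    export HADOOP_NAMENODE_OPTS="-Xmx2048m $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Xmx1024m $HADOOP_DATANODE_OPTS"
    export HADOOP_JOBTRACKER_OPTS="-Xmx2048m $HADOOP_JOBTRACKER_OPTS"
    export HADOOP_TASKTRACKER_OPTS="-Xmx1024m $HADOOP_TASKTRACKER_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx2048m $HADOOP_SECONDARYNAMENODE_OPTS"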

Configuring Hadoop daemon heap sizes

2011-08-01 Thread Kai Ju Liu
Hi. I'm trying to tweak heap sizes for the Hadoop daemons, i.e. namenode/datanode and jobtracker/tasktracker. I've tried setting HADOOP_NAMENODE_HEAPSIZE, HADOOP_DATANODE_HEAPSIZE, and so on in hadoop-env.sh, but the heap size remains at the default of 1,000MB. In the cluster setup documentation,