Re: Child JVM memory allocation / Usage

2013-03-26 Thread Hemanth Yamijala
If your task is running out of memory, you could add the option -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with the heap memory). However, I am not sure where it stores the dump. You might need to experiment a little with it. Will try and send out the info if I get
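
A minimal sketch of wiring that flag in, assuming the old mapred API; the -Xmx value is a placeholder:

    import org.apache.hadoop.mapred.JobConf;

    public class HeapDumpOpts {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Each child task JVM gets the heap size plus the dump-on-OOM flag.
            conf.set("mapred.child.java.opts",
                     "-Xmx512m -XX:+HeapDumpOnOutOfMemoryError");
            System.out.println(conf.get("mapred.child.java.opts"));
        }
    }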

Re: Is the input values for reduce method sorted in any order?

2013-03-26 Thread Harsh J
MR will partition inputs by key and sort them with the key comparator, then group them together when reading back via a grouping comparator (which is usually the same as the key comparator). It will not re-sort the values, nor look at any of the value's fields during this process. If you want your
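
A minimal sketch of the grouping side, assuming Text keys of the form naturalKey#value; the PrefixGrouper class is hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Job;

    public class GroupingSketch {
        // Groups on the part of the key before '#', so all values sharing that
        // prefix reach one reduce() call while the full key controls sort order.
        public static class PrefixGrouper extends WritableComparator {
            protected PrefixGrouper() { super(Text.class, true); }
            public int compare(WritableComparable a, WritableComparable b) {
                String x = a.toString().split("#", 2)[0];
                String y = b.toString().split("#", 2)[0];
                return x.compareTo(y);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "grouping-sketch");
            job.setGroupingComparatorClass(PrefixGrouper.class);
        }
    }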

Re: Yarn Writing ApplicationMaster, Client Exception URISyntaxException: Expected scheme name at index 0

2013-03-26 Thread blah blah
Problem solved. Thank you. 2013/3/26 Harsh J ha...@cloudera.com YARN does not seem to check for a fully qualified path when you pass it yours, and ends up breaking. The problem is easily reproducible with the two transforming calls from ConverterUtils. Transform the jarPath to a fully
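
A sketch of that fix, assuming the jar path arrives as args[0]; makeQualified adds the scheme and authority that ConverterUtils expects:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class QualifyJarPath {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // A bare "/user/me/app.jar" has no scheme at index 0, which is
            // what triggers URISyntaxException; qualify it first so it becomes
            // e.g. hdfs://namenode:8020/user/me/app.jar.
            Path jarPath = fs.makeQualified(new Path(args[0]));
            System.out.println(ConverterUtils.getYarnUrlFromPath(jarPath));
        }
    }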

How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Agarwal, Nikhil
Hi, I have a Hadoop cluster up and running. I want to submit an MR job to it, but the input data is kept on an external server (outside the Hadoop cluster). Can anyone please suggest how I can tell my Hadoop cluster to load the input data from the external servers and then run MR on it?

Re: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Nitin Pawar
You are looking at a two-step workflow here. The first unit of your workflow will download the file from the external server, write it to DFS, and return the file path. The second unit of your workflow will read the input path and process the data according to your business logic in MR. You can look at
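
A minimal sketch of the first unit, assuming an earlier step already fetched the file to a local path; both paths are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StageInput {
        public static void main(String[] args) throws Exception {
            FileSystem hdfs = FileSystem.get(new Configuration());
            // Unit 1: push the downloaded file into DFS and report the path
            // that unit 2 will use as the MR job's input.
            Path staged = new Path("/staging/input/data.txt");
            hdfs.copyFromLocalFile(new Path("/tmp/downloaded/data.txt"), staged);
            System.out.println("input path: " + hdfs.makeQualified(staged));
        }
    }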

Re: Is the input values for reduce method sorted in any order?

2013-03-26 Thread jingguo yao
Harsh, thanks. On Tue, Mar 26, 2013 at 2:28 PM, Harsh J ha...@cloudera.com wrote: MR will partition and sort inputs by keys by the key comparator, and then group them together when reading back via a grouping comparator (which is usually the same as the key comparator). It will not re-sort

Differences hadoop-2.0.0-alpha Vs hadoop-2.0.3-alpha

2013-03-26 Thread Krishna Kishore Bonagiri
Hi, I have a YARN application written and running properly against hadoop-2.0.0-alpha, but when I recently downloaded and started using hadoop-2.0.3-alpha, it doesn't work. I think the original one I wrote was modeled on Client.java and ApplicationMaster.java in the DistributedShell example.

RE: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Agarwal, Nikhil
Hi, Thanks for your reply. I do not know about Cascading. Should I Google it as "cascading in hadoop"? Also, what I was thinking is to implement a file system which overrides the functions provided by the fs.FileSystem class in Hadoop. I tried to write some portions of the filesystem (for my
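
A hypothetical skeleton of that idea, assuming the Hadoop 1.x FileSystem contract; every stub would delegate to the external server's protocol in a real implementation, and the class would be registered under its (placeholder) scheme via fs.ext.impl in core-site.xml:

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.util.Progressable;

    public class ExternalFileSystem extends FileSystem {
        private URI uri;
        private Path workingDir = new Path("/");

        public void initialize(URI name, Configuration conf) throws IOException {
            super.initialize(name, conf);
            this.uri = name; // e.g. ext://server:port/
        }
        public URI getUri() { return uri; }
        public FSDataInputStream open(Path f, int bufferSize) throws IOException {
            // Real implementation: stream bytes from the external server.
            throw new IOException("sketch only");
        }
        public FSDataOutputStream create(Path f, FsPermission permission,
                boolean overwrite, int bufferSize, short replication,
                long blockSize, Progressable progress) throws IOException {
            throw new IOException("sketch only");
        }
        public FSDataOutputStream append(Path f, int bufferSize,
                Progressable progress) throws IOException {
            throw new IOException("sketch only");
        }
        public boolean rename(Path src, Path dst) throws IOException { return false; }
        public boolean delete(Path f) throws IOException { return false; }
        public boolean delete(Path f, boolean recursive) throws IOException { return false; }
        public FileStatus[] listStatus(Path f) throws IOException {
            return new FileStatus[0];
        }
        public void setWorkingDirectory(Path dir) { workingDir = dir; }
        public Path getWorkingDirectory() { return workingDir; }
        public boolean mkdirs(Path f, FsPermission permission) throws IOException {
            return false;
        }
        public FileStatus getFileStatus(Path f) throws IOException {
            throw new IOException("sketch only");
        }
    }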

Re: Rack Awareness

2013-03-26 Thread Chris Embree
Make sure you have the topology script available on the JobTracker server as well. This also requires a jobtracker stop/start to take effect. Also, make sure $HADOOP_CONF resolves properly as the mapred user. On Tue, Mar 26, 2013 at 1:19 AM, preethi ganeshan preethiganesha...@gmail.com wrote:
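
For reference, a script is not the only option: a hypothetical Java mapping, assuming the Hadoop 1.x DNSToSwitchMapping interface, can be plugged in via topology.node.switch.mapping.impl in core-site.xml (the rack rule below is a toy):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.net.DNSToSwitchMapping;

    public class SimpleRackMapping implements DNSToSwitchMapping {
        public List<String> resolve(List<String> names) {
            List<String> racks = new ArrayList<String>();
            for (String host : names) {
                // Toy rule: hosts named node1* go to /rack1, the rest to /rack0.
                racks.add(host.startsWith("node1") ? "/rack1" : "/rack0");
            }
            return racks;
        }
    }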

RE: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Azuryy Yu
Can you addInputPath(hdfs://……)? Don't change fs.default.name; it cannot solve your problem. On Mar 26, 2013 7:03 PM, Agarwal, Nikhil nikhil.agar...@netapp.com wrote: Hi, Thanks for your reply. I do not know about cascading. Should I google it as “cascading in hadoop”? Also, what I was
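
A sketch of that suggestion; the namenode host, port, and input path are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class QualifiedInput {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "qualified-input");
            // Point the job at a fully qualified URI instead of changing
            // fs.default.name for the whole cluster.
            FileInputFormat.addInputPath(job,
                new Path("hdfs://namenode:8020/data/input"));
        }
    }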

unsubscribe

2013-03-26 Thread Levin ding
unsubscribe

Re: Regarding NameNode Problem

2013-03-26 Thread Azuryy Yu
and your hadoop version. On Mar 26, 2013 1:28 PM, Mohammad Tariq donta...@gmail.com wrote: Hello Sagar, It would be helpful if you could share your logs with us. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Tue, Mar 26, 2013 at 10:47 AM, Sagar

Re: Child JVM memory allocation / Usage

2013-03-26 Thread Hemanth Yamijala
Hi, I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like I suspected, the dump goes to the current work directory of the task attempt as it executes on the cluster. This directory is cleaned up once the task is done. There are options to keep failed task files or task files
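
One of those options, sketched with the old mapred API: this keeps a failed task attempt's working directory, including any .hprof dump, on the TaskTracker instead of letting it be cleaned up.

    import org.apache.hadoop.mapred.JobConf;

    public class KeepFailedTaskFiles {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Equivalent to setting keep.failed.task.files=true for the job.
            conf.setKeepFailedTaskFiles(true);
            System.out.println(conf.getKeepFailedTaskFiles());
        }
    }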

Re: Child JVM memory allocation / Usage

2013-03-26 Thread Koji Noguchi
Create a dump.sh on HDFS:

  $ hadoop dfs -cat /user/knoguchi/dump.sh
  #!/bin/sh
  hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof

Run your job with:

  -Dmapred.create.symlink=yes
  -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh

Pydoop: hotfix and Homebrew formula for Mac OS X

2013-03-26 Thread Simone Leo
Hello Mac users! We are happy to announce that Pydoop (http://pydoop.sourceforge.net) has been included in the Homebrew Python Tap. You should now be able to install it on Mac OS X Mountain Lion as follows: 1. Manually install the Oracle JDK. 2. Set JAVA_HOME according to your JDK installation,

Re: Auto clean DistCache?

2013-03-26 Thread Abdelrhman Shettia
Hi JM, Actually these dirs need to be purged by a script that keeps the last 2 days' worth of files; otherwise you may run into a "# of open files exceeded" error. Thanks On Mar 25, 2013, at 5:16 PM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi, Each time my MR job is run, a

Re: Auto clean DistCache?

2013-03-26 Thread Vinod Kumar Vavilapalli
You can control the limit of these cache files, the default is 10GB (value of 10737418240L): Try changing local.cache.size or mapreduce.tasktracker.cache.local.size in mapred-site.xml Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 25, 2013, at 5:16 PM,
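
A minimal sketch of that value, assuming the Hadoop 1.x property name; note the limit is read by the TaskTracker from mapred-site.xml, so setting it in a job's Configuration is only illustrative:

    import org.apache.hadoop.conf.Configuration;

    public class CacheSizeLimit {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 10737418240L = 10 * 1024^3 bytes, the 10GB default; halve it to 5GB.
            conf.setLong("local.cache.size", 5L * 1024 * 1024 * 1024);
            System.out.println(conf.getLong("local.cache.size", 10737418240L));
        }
    }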

Re: Auto clean DistCache?

2013-03-26 Thread Vinod Kumar Vavilapalli
The files are never all opened at the same time, so you shouldn't see any "# of open files exceeded" error. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote: Hi JM , Actually these dirs need to be purged by a

Re: For a new installation: use the BackupNode or the CheckPointNode?

2013-03-26 Thread Konstantin Shvachko
There is no BackupNode in Hadoop 1. That was a bug in documentation. Here is the updated link: http://hadoop.apache.org/docs/r1.1.2/hdfs_user_guide.html Thanks, --Konstantin On Sat, Mar 23, 2013 at 12:04 AM, varun kumar varun@gmail.com wrote: Hope below link will be useful..

Re: Auto clean DistCache?

2013-03-26 Thread Abdelrahman Shettia
Let me clarify: if there are lots of files or directories, up to 32K (depending on the OS's file system configuration), in those distributed cache dirs, the OS will not be able to create any more files/dirs, and thus M-R jobs won't get initiated on those TaskTracker machines. Hope this helps. Thanks

Re: Auto clean DistCache?

2013-03-26 Thread Jean-Marc Spaggiari
In the situation I faced, it was really a disk space issue, not related to the number of files. It was writing to a small partition. I will try local.cache.size or mapreduce.tasktracker.cache.local.size to see if I can keep the final total size under 5GB... Else, I will go for a custom

RE: For a new installation: use the BackupNode or the CheckPointNode?

2013-03-26 Thread David Parks
Thanks for the update, I understand now that I'll be installing a secondary name node which performs checkpoints on the primary name node and keeps a working backup copy of the fsimage file. The primary name node should write its fsimage file to at least 2 different physical mediums for improved
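
A sketch of the two-mediums part, assuming the Hadoop 1.x property name; the paths are placeholders, and the real setting belongs in hdfs-site.xml on the NameNode:

    import org.apache.hadoop.conf.Configuration;

    public class NameDirRedundancy {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // The fsimage/edits are written to every listed dir: one local
            // disk plus one NFS mount gives two physical mediums.
            conf.set("dfs.name.dir", "/data/1/dfs/nn,/mnt/nfs/dfs/nn");
            System.out.println(conf.get("dfs.name.dir"));
        }
    }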

Re: Child JVM memory allocation / Usage

2013-03-26 Thread Hemanth Yamijala
Koji, Works beautifully. Thanks a lot. I learnt at least 3 different things with your script today ! Hemanth On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi knogu...@yahoo-inc.comwrote: Create a dump.sh on hdfs. $ hadoop dfs -cat /user/knoguchi/dump.sh #!/bin/sh hadoop dfs -put

RE: For a new installation: use the BackupNode or the CheckPointNode?

2013-03-26 Thread Azuryy Yu
Yes, you got it. Hadoop 1.0.x cannot fail over automatically or manually; you have to copy the fsimage from the SNN to the primary NN. On Mar 27, 2013 11:29 AM, David Parks davidpark...@yahoo.com wrote: Thanks for the update, I understand now that I'll be installing a secondary name node which performs checkpoints

Re: How to tell my Hadoop cluster to read data from an external server

2013-03-26 Thread Hemanth Yamijala
The stack trace indicates the job client is trying to submit a job to the MR cluster and it is failing. Are you certain that, at the time of submitting the job, the JobTracker is running (on localhost:54312)? Regarding using a different file system - it depends a lot on which file system you are
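
A minimal way to see which address the client will submit to, assuming the Hadoop 1.x property name; a value of "local" means no JobTracker is configured at all:

    import org.apache.hadoop.conf.Configuration;

    public class ShowJobTracker {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // The job client submits to whatever this resolves to; that
            // process must actually be listening, e.g. on localhost:54312.
            System.out.println(conf.get("mapred.job.tracker", "local"));
        }
    }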

Re: For a new installation: use the BackupNode or the CheckPointNode?

2013-03-26 Thread Harsh J
David, If the good copy on NFS exists post-crash of the NN, use that for lesser/zero loss, since the SNN's copy can be an hour old (the checkpoint period) by default. That's the whole point of running the NFS disk mount (make sure it's soft-mounted, btw; you don't want your NN to hang if the NFS is hung).