Couple of things to check:

First, does your class com.hadoop.publicationMrPOC.Launcher implement the Tool interface? You can look at an example at http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0. That is what makes the -D params on the command line work. Alternatively, you can set the same properties in the Configuration object in your launcher code:
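For reference, a minimal Tool-implementing launcher might look like the sketch below. This is hypothetical: the job name and the job setup inside run() are made up, and it assumes the old org.apache.hadoop.mapred API from Hadoop 1.x, matching the tutorial linked above.

```java
// Hypothetical sketch of a launcher implementing Tool, so that
// GenericOptionsParser/ToolRunner pick up -D arguments from the
// command line. Assumes the Hadoop 1.x (org.apache.hadoop.mapred) API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Launcher extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D properties parsed by ToolRunner
        JobConf conf = new JobConf(getConf(), Launcher.class);
        conf.setJobName("publication-poc");   // illustrative name
        // ... set mapper, reducer, input/output paths from args ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, etc.)
        // before passing the remaining args to run()
        int exitCode = ToolRunner.run(new Configuration(), new Launcher(), args);
        System.exit(exitCode);
    }
}
```

With this in place, ToolRunner handles the -D options before run() is called; setting the same properties directly in code, as shown next, works either way.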
Configuration conf = new Configuration();
conf.set("mapred.create.symlink", "yes");
conf.set("mapred.cache.files",
    "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
conf.set("mapred.child.java.opts",
    "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError"
    + " -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");

Second, the position of the arguments matters: the -D options must come before the positional arguments. I think the command should be

hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher \
  -Dmapred.create.symlink=yes \
  -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh \
  -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' \
  Fudan\ Univ

Thanks
Hemanth

On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:

> Hi Hemanth/Koji,
>
> It seems the above script doesn't work for me. Can you look into the
> following and suggest what more I can do?
>
> hadoop fs -cat /user/ims-b/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>
> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher Fudan\ Univ
> -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
> I am not able to see the heap dump at /tmp/myheapdump_ims
>
> Error in the mapper:
>
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ...
> 17 more
> Caused by: java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2734)
>     at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>     at java.util.ArrayList.add(ArrayList.java:351)
>     at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>     ... 22 more
>
> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>
>> Koji,
>>
>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>> with your script today!
>>
>> Hemanth
>>
>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
>>
>>> Create a dump.sh on HDFS:
>>>
>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>
>>> Run your job with
>>>
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> This should create the heap dump on HDFS at /tmp/myheapdump_knoguchi.
>>>
>>> Koji
>>>
>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>
>>> > Hi,
>>> >
>>> > I tried to use -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, as I
>>> > suspected, the dump goes to the current working directory of the task
>>> > attempt as it executes on the cluster. This directory is cleaned up once
>>> > the task is done. There are options to keep failed task files, or task
>>> > files matching a pattern; however, these do NOT retain the current
>>> > working directory. Hence, there is no way to get this from a cluster,
>>> > AFAIK.
>>> >
>>> > You are effectively left with the jmap option on a pseudo-distributed
>>> > cluster, I think.
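Koji's dump.sh above hinges on the bash parameter expansion ${PWD//\//_}, which replaces every "/" in the task's working directory with "_", yielding a distinct, flat filename per task attempt. A quick illustration (the example path is made up, but the pattern matches Hadoop 1.x task working directories):

```shell
#!/bin/bash
# Illustrative only: this path mimics a task attempt's working directory.
PWD_EXAMPLE=/var/hadoop/local/taskTracker/jobcache/attempt_0001_m_000000_0/work

# ${var//\//_} replaces ALL slashes with underscores (bash, not plain sh)
echo "${PWD_EXAMPLE//\//_}.hprof"
# → _var_hadoop_local_taskTracker_jobcache_attempt_0001_m_000000_0_work.hprof
```

Because each attempt runs in its own working directory, no two attempts overwrite each other's dump in /tmp/myheapdump_knoguchi.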
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>> > If your task is running out of memory, you could add the option
>>> > -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with
>>> > the heap memory settings). However, I am not sure where it stores the
>>> > dump. You might need to experiment a little with it. I will try and
>>> > send out the info if I get time to try it out.
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > This sounds interesting; I will try that out on the pseudo cluster.
>>> > But the real problem for me is that the cluster is maintained by a
>>> > third party. I only have an edge node through which I can submit jobs.
>>> >
>>> > Is there any other way of getting the dump instead of physically going
>>> > to that machine and checking it out?
>>> >
>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > One option to find what could be taking the memory is to use jmap on
>>> > the running task. The steps I followed are:
>>> >
>>> > - I ran a sleep job (which comes in the examples jar of the
>>> >   distribution; it effectively does nothing in the mapper/reducer).
>>> > - From the JobTracker UI, looked at a map task attempt ID.
>>> > - Then, on the machine where the map task is running, got the PID of
>>> >   the running task: ps -ef | grep <task attempt id>
>>> > - On the same machine, executed jmap -histo <pid>
>>> >
>>> > This will give you an idea of the count and size of allocated objects.
>>> > jmap also has options to take a full dump, which will contain more
>>> > information, but this should help get you started with debugging.
>>> >
>>> > For my sleep job task, I saw allocations worth roughly 130 MB.
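On a node where you have shell access, the steps above could be sketched as the commands below. The attempt ID, dump path, and ps output format are illustrative; they vary by cluster and distribution, and the jmap binary must match the JVM running the task.

```shell
#!/bin/bash
# Illustrative only: substitute a real attempt ID from the JobTracker UI.
ATTEMPT_ID=attempt_201303260001_0001_m_000000_0

# On the tasktracker node, find the PID of the child JVM for that attempt
PID=$(ps -ef | grep "$ATTEMPT_ID" | grep -v grep | awk '{print $2}')

# Histogram of live object classes, counts, and sizes (largest first)
jmap -histo "$PID" | head -30

# Or take a full binary heap dump for offline analysis (e.g. jhat, MAT)
jmap -dump:format=b,file=/tmp/task_heap.hprof "$PID"
```

Note this only works while the task attempt is still running; once it finishes, the child JVM (and its working directory) are gone.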
>>> >
>>> > Thanks
>>> > hemanth
>>> >
>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>> > I have a lookup file which I need in the mapper, so I am trying to
>>> > read the whole file and load it into a list in the mapper.
>>> >
>>> > For each and every record, I look it up in this file, which I got from
>>> > the distributed cache.
>>> >
>>> > —
>>> > Sent from iPhone
>>> >
>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>> >
>>> > Hmm. How are you loading the file into memory? Is it some sort of
>>> > memory mapping? Are they being read as records? Some details of the
>>> > app will help.
>>> >
>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>> > Hi Hemanth,
>>> >
>>> > I tried out your suggestion, loading a 420 MB file into memory. It
>>> > threw a java heap space error.
>>> >
>>> > I am not sure where this 1.6 GB of configured heap went.
>>> >
>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
>>> > Hi,
>>> >
>>> > The free memory might be low just because GC hasn't reclaimed what it
>>> > can. Can you just try reading in the data you want to read and see if
>>> > that works?
>>> >
>>> > Thanks
>>> > Hemanth
>>> >
>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com> wrote:
>>> > io.sort.mb = 256 MB
>>> >
>>> > On Monday, March 25, 2013, Harsh J wrote:
>>> > The MapTask may consume some memory of its own as well. What is your
>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>> >
>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>> > <nagarjuna.kanamarlap...@gmail.com> wrote:
>>> > > Hi,
>>> > >
>>> > > I configured my child JVM heap to 2 GB.
>>> > > So, I thought I could really read 1.5 GB of data and store it in
>>> > > memory (mapper/reducer).
>>> > >
>>> > > I wanted to confirm the same, and wrote the following piece of code
>>> > > in the configure method of the mapper:
>>> > >
>>> > > @Override
>>> > > public void configure(JobConf job) {
>>> > >   System.out.println("FREE MEMORY -- "
>>> > >       + Runtime.getRuntime().freeMemory());
>>> > >   System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
>>> > > }
>>> > >
>>> > > Surprisingly, the output was:
>>> > >
>>> > > FREE MEMORY -- 341854864 = 320 MB
>>> > > MAX MEMORY ---1908932608 = 1.9 GB
>>> > >
>>> > > I am just wondering what processes are taking up that extra 1.6 GB
>>> > > of heap which I configured for the child JVM.
>>> > >
>>> > > Appreciate your help in understanding the scenario.
>>> > >
>>> > > Regards
>>> > > Nagarjuna K
>>> >
>>> > --
>>> > Harsh J
>>> >
>>> > --
>>> > Sent from iPhone
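One point worth noting about the numbers in the original question: freeMemory() reports free space only within the heap the JVM has committed so far (totalMemory()), not against the -Xmx ceiling (maxMemory()), so it understates how much can still be allocated. A small stand-alone sketch (plain JVM, no Hadoop required) that computes the actual headroom:

```java
// Demonstrates why freeMemory() alone understates available heap:
// headroom = maxMemory() - (totalMemory() - freeMemory())
public class HeapHeadroom {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();      // the -Xmx ceiling
        long total = rt.totalMemory();  // heap committed by the JVM so far
        long free = rt.freeMemory();    // free space within the committed heap
        long used = total - free;       // actually occupied right now
        long headroom = max - used;     // what can still be allocated
        System.out.println("max      = " + max);
        System.out.println("total    = " + total);
        System.out.println("free     = " + free);
        System.out.println("headroom = " + headroom);
    }
}
```

Run in the mapper's configure(), this would likely have shown far more than 320 MB available; the remaining gap versus -Xmx goes to things like io.sort.mb buffers and other MapTask overhead, as Harsh notes above.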