Hi,

>> "Dumping heap to ./heapdump.hprof"
>> File myheapdump.hprof does not exist.

The file names don't match - can you check your script / command line args?

Thanks
Hemanth

On Wed, Mar 27, 2013 at 3:21 PM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlap...@gmail.com> wrote:

> Hi Hemanth,
>
> Nice to see this. I did not know about this till now.
>
> But one more issue: the dump file did not get created. The following are
> the logs:
>
> attempt_201302211510_81218_m_000000_0:
> /data/1/mapred/local/taskTracker/distcache/8776089957260881514_-363500746_715125253/cmp111wcd/user/ims-b/nagarjuna/AddressId_Extractor/Numbers
> attempt_201302211510_81218_m_000000_0: java.lang.OutOfMemoryError: Java heap space
> attempt_201302211510_81218_m_000000_0: Dumping heap to ./heapdump.hprof ...
> attempt_201302211510_81218_m_000000_0: Heap dump file created [210641441 bytes in 3.778 secs]
> attempt_201302211510_81218_m_000000_0: #
> attempt_201302211510_81218_m_000000_0: # java.lang.OutOfMemoryError: Java heap space
> attempt_201302211510_81218_m_000000_0: # -XX:OnOutOfMemoryError="./dump.sh"
> attempt_201302211510_81218_m_000000_0: # Executing /bin/sh -c "./dump.sh"...
> attempt_201302211510_81218_m_000000_0: put: File myheapdump.hprof does not exist.
> attempt_201302211510_81218_m_000000_0: log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
>
> On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
>
>> Couple of things to check:
>>
>> Does your class com.hadoop.publicationMrPOC.Launcher implement the Tool
>> interface? You can look at an example at
>> http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Source+Code-N110D0.
>> That's what accepts the -D params on the command line.
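[Editor's note: a minimal sketch of what the Tool/GenericOptionsParser machinery does with those -D arguments. Leading -Dkey=value pairs are peeled off into the job configuration, and only the remaining tokens reach the Tool's run() method. This is plain Java with no Hadoop dependency; the class and field names are illustrative only, not Hadoop's.]

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of GenericOptionsParser's -D handling (hypothetical, plain Java):
// -Dkey=value tokens become configuration entries; everything else is
// passed through as an application argument.
public class DOptionSketch {
    static Map<String, String> conf = new LinkedHashMap<>();

    static List<String> parse(String[] args) {
        List<String> remaining = new ArrayList<>();
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                int eq = arg.indexOf('=');          // first '=' splits key/value
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);                 // e.g. "Fudan Univ"
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> rest = parse(new String[] {
            "-Dmapred.create.symlink=yes",
            "-Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh",
            "Fudan Univ"
        });
        System.out.println(conf.get("mapred.create.symlink")); // yes
        System.out.println(rest);                              // [Fudan Univ]
    }
}
```

This is why a launcher that does not go through ToolRunner/GenericOptionsParser silently ignores the -D flags: nothing strips them into the configuration.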
>> Alternatively, you can also set the same in the configuration object in
>> your launcher code, like this:
>>
>> Configuration conf = new Configuration();
>>
>> conf.set("mapred.create.symlink", "yes");
>>
>> conf.set("mapred.cache.files",
>>     "hdfs:///user/hemanty/scripts/copy_dump.sh#copy_dump.sh");
>>
>> conf.set("mapred.child.java.opts",
>>     "-Xmx200m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh");
>>
>> Second, the position of the arguments matters: the -D options must come
>> after the class name but before the application arguments. I think the
>> command should be:
>>
>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher
>> -Dmapred.create.symlink=yes
>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>> Fudan\ Univ
>>
>> Thanks
>> Hemanth
>>
>> On Wed, Mar 27, 2013 at 1:58 PM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlap...@gmail.com> wrote:
>>
>>> Hi Hemanth/Koji,
>>>
>>> Seems the above script doesn't work for me.
>>> Can you look into the
>>> following and suggest what more I can do?
>>>
>>> hadoop fs -cat /user/ims-b/dump.sh
>>> #!/bin/sh
>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_ims/${PWD//\//_}.hprof
>>>
>>> hadoop jar LL.jar com.hadoop.publicationMrPOC.Launcher Fudan\ Univ
>>> -Dmapred.create.symlink=yes
>>> -Dmapred.cache.files=hdfs:///user/ims-b/dump.sh#dump.sh
>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>
>>> I am not able to see the heap dump at /tmp/myheapdump_ims.
>>>
>>> Error in the mapper:
>>>
>>> Caused by: java.lang.reflect.InvocationTargetException
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>         at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>>>         ... 17 more
>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>         at java.util.Arrays.copyOf(Arrays.java:2734)
>>>         at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>>>         at java.util.ArrayList.add(ArrayList.java:351)
>>>         at com.hadoop.publicationMrPOC.PublicationMapper.configure(PublicationMapper.java:59)
>>>         ... 22 more
>>>
>>> On Wed, Mar 27, 2013 at 10:16 AM, Hemanth Yamijala <
>>> yhema...@thoughtworks.com> wrote:
>>>
>>>> Koji,
>>>>
>>>> Works beautifully. Thanks a lot. I learnt at least 3 different things
>>>> with your script today!
>>>>
>>>> Hemanth
>>>>
>>>> On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
>>>>
>>>>> Create a dump.sh on hdfs.
>>>>>
>>>>> $ hadoop dfs -cat /user/knoguchi/dump.sh
>>>>> #!/bin/sh
>>>>> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>>>>>
>>>>> Run your job with
>>>>>
>>>>> -Dmapred.create.symlink=yes
>>>>> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
>>>>> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
>>>>> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>>>>>
>>>>> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
>>>>>
>>>>> Koji
>>>>>
>>>>> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>>>>>
>>>>> > Hi,
>>>>> >
>>>>> > I tried to use -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, as I
>>>>> > suspected, the dump goes to the current working directory of the task
>>>>> > attempt as it executes on the cluster. This directory is cleaned up
>>>>> > once the task is done. There are options to keep failed task files,
>>>>> > or task files matching a pattern; however, these do NOT retain the
>>>>> > current working directory. Hence, there is no way to get this from a
>>>>> > cluster, AFAIK.
>>>>> >
>>>>> > You are effectively left with the jmap option on a pseudo-distributed
>>>>> > cluster, I think.
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
>>>>> > yhema...@thoughtworks.com> wrote:
>>>>> > If your task is running out of memory, you could add the option
>>>>> > -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with
>>>>> > the heap memory). However, I am not sure where it stores the dump.
>>>>> > You might need to experiment a little on it. Will try and send out
>>>>> > the info if I get time to try it out.
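[Editor's note: the ${PWD//\//_} expansion in Koji's dump.sh replaces every "/" in the task attempt's working directory with "_", so each attempt uploads its dump under a unique, flat file name. The same flattening, sketched in plain Java; the example path is illustrative, not taken from a real cluster.]

```java
public class DumpNameSketch {
    // Mirrors bash's ${PWD//\//_}: replace every '/' in the working
    // directory with '_' to build a unique, flat HDFS file name.
    static String dumpName(String workingDir) {
        return workingDir.replace('/', '_') + ".hprof";
    }

    public static void main(String[] args) {
        // Hypothetical task working directory:
        String pwd = "/data/1/mapred/local/taskTracker/attempt_0001_m_000000_0/work";
        System.out.println("/tmp/myheapdump_knoguchi/" + dumpName(pwd));
    }
}
```

Since each task attempt runs in a distinct working directory, dumps from concurrent attempts cannot overwrite each other on HDFS.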
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
>>>>> > nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>> > Hi Hemanth,
>>>>> >
>>>>> > This sounds interesting, I will try that out on the pseudo cluster.
>>>>> > But the real problem for me is that the cluster is maintained by a
>>>>> > third party. I only have an edge node through which I can submit the
>>>>> > jobs.
>>>>> >
>>>>> > Is there any other way of getting the dump instead of physically
>>>>> > going to that machine and checking it out?
>>>>> >
>>>>> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
>>>>> > yhema...@thoughtworks.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > One option to find what could be taking the memory is to use jmap on
>>>>> > the running task. The steps I followed are:
>>>>> >
>>>>> > - I ran a sleep job (which comes in the examples jar of the
>>>>> >   distribution - effectively does nothing in the mapper / reducer).
>>>>> > - From the JobTracker UI, looked at a map task attempt ID.
>>>>> > - Then on the machine where the map task is running, got the PID of
>>>>> >   the running task - ps -ef | grep <task attempt id>
>>>>> > - On the same machine, executed jmap -histo <pid>
>>>>> >
>>>>> > This will give you an idea of the count and size of objects
>>>>> > allocated. jmap also has options to get a dump, which will contain
>>>>> > more information, but this should help get you started with
>>>>> > debugging.
>>>>> >
>>>>> > For my sleep job task, I saw allocations worth roughly 130 MB.
>>>>> >
>>>>> > Thanks
>>>>> > hemanth
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
>>>>> > nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>> > I have a lookup file which I need in the mapper, so I am trying to
>>>>> > read the whole file and load it into a list in the mapper.
>>>>> >
>>>>> > For each and every record, I look in this file, which I got from the
>>>>> > distributed cache.
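[Editor's note: a minimal sketch of the pattern being described above - reading a lookup file line by line into an in-memory list, as a mapper's configure() would. Plain Java, with a temporary file standing in for the distributed-cache file; class and method names are illustrative. Every line retained this way stays on the heap for the life of the task, which is why a large lookup file becomes a heap-space risk.]

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LookupLoadSketch {
    // Load every line of the lookup file into memory. Each retained
    // String costs heap beyond the raw file size (object headers,
    // char storage), so a 420 MB file can easily exceed its size in heap.
    static List<String> loadLookup(String path) {
        List<String> lookup = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lookup.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lookup;
    }

    // Helper to create a stand-in for the cached file in this sketch.
    static Path writeTemp(List<String> lines) {
        try {
            Path p = Files.createTempFile("lookup", ".txt");
            Files.write(p, lines);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = writeTemp(List.of("addr1\t1", "addr2\t2"));
        List<String> lookup = loadLookup(tmp.toString());
        System.out.println(lookup.size()); // 2
    }
}
```

For a lookup file this large, a leaner structure (e.g. a pre-sized HashMap of parsed fields, or an external store) usually beats a List of raw lines.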
>>>>> >
>>>>> > —
>>>>> > Sent from iPhone
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
>>>>> > yhema...@thoughtworks.com> wrote:
>>>>> >
>>>>> > Hmm. How are you loading the file into memory? Is it some sort of
>>>>> > memory mapping, etc.? Are they being read as records? Some details
>>>>> > of the app will help.
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
>>>>> > nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>> > Hi Hemanth,
>>>>> >
>>>>> > I tried out your suggestion, loading a 420 MB file into memory. It
>>>>> > threw a java heap space error.
>>>>> >
>>>>> > I am not sure where this 1.6 GB of configured heap went to?
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
>>>>> > yhema...@thoughtworks.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > The free memory might be low just because GC hasn't reclaimed what
>>>>> > it can. Can you just try reading in the data you want to read and
>>>>> > see if that works?
>>>>> >
>>>>> > Thanks
>>>>> > Hemanth
>>>>> >
>>>>> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
>>>>> > nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>> > io.sort.mb = 256 MB
>>>>> >
>>>>> > On Monday, March 25, 2013, Harsh J wrote:
>>>>> > The MapTask may consume some memory of its own as well. What is your
>>>>> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
>>>>> >
>>>>> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
>>>>> > <nagarjuna.kanamarlap...@gmail.com> wrote:
>>>>> > > Hi,
>>>>> > >
>>>>> > > I configured my child JVM heap to 2 GB. So, I thought I could
>>>>> > > really read 1.5 GB of data and store it in memory (mapper/reducer).
>>>>> > >
>>>>> > > I wanted to confirm the same and wrote the following piece of code
>>>>> > > in the configure method of the mapper.
>>>>> > >
>>>>> > > @Override
>>>>> > > public void configure(JobConf job) {
>>>>> > >     System.out.println("FREE MEMORY -- "
>>>>> > >         + Runtime.getRuntime().freeMemory());
>>>>> > >     System.out.println("MAX MEMORY ---"
>>>>> > >         + Runtime.getRuntime().maxMemory());
>>>>> > > }
>>>>> > >
>>>>> > > Surprisingly, the output was:
>>>>> > >
>>>>> > > FREE MEMORY -- 341854864 = 320 MB
>>>>> > > MAX MEMORY ---1908932608 = 1.9 GB
>>>>> > >
>>>>> > > I am just wondering what processes are taking up that extra 1.6 GB
>>>>> > > of heap which I configured for the child JVM.
>>>>> > >
>>>>> > > Appreciate your help in understanding the scenario.
>>>>> > >
>>>>> > > Regards
>>>>> > > Nagarjuna K
>>>>> >
>>>>> > --
>>>>> > Harsh J
>>>>> >
>>>>> > --
>>>>> > Sent from iPhone
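[Editor's note: the confusion at the start of this thread comes from what Runtime.freeMemory() actually measures. It reports free space within the *currently committed* heap (totalMemory()), not within the -Xmx ceiling (maxMemory()); the JVM grows the committed heap toward -Xmx lazily. A runnable sketch of the distinction:]

```java
public class HeapHeadroomSketch {
    // freeMemory() is free space inside the committed heap (totalMemory()),
    // not inside the -Xmx limit (maxMemory()). The real headroom is
    // maxMemory() - (totalMemory() - freeMemory()), which is why a low
    // freeMemory() reading does not mean 1.6 GB has already been consumed.
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long headroom = rt.maxMemory() - used;
        System.out.println("committed: " + rt.totalMemory());
        System.out.println("used:      " + used);
        System.out.println("headroom:  " + headroom);
    }
}
```

Printed this way, the 320 MB "FREE MEMORY" reading above is consistent with a mostly uncommitted 1.9 GB heap: the JVM simply had not grown the heap yet.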