Koji,

Works beautifully. Thanks a lot. I learnt at least 3 different things from
your script today!

Hemanth


On Tue, Mar 26, 2013 at 9:41 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:

> Create a dump.sh on hdfs.
>
> $ hadoop dfs -cat /user/knoguchi/dump.sh
> #!/bin/sh
> hadoop dfs -put myheapdump.hprof /tmp/myheapdump_knoguchi/${PWD//\//_}.hprof
>
> Run your job with
>
> -Dmapred.create.symlink=yes
> -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh
> -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
>
> This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
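>
> (For completeness, a full submission using those options might look roughly
> like the sketch below, assuming the job driver goes through
> ToolRunner/GenericOptionsParser so that the -D options are picked up; the
> jar, class, and input/output names here are placeholders.)
>
> $ hadoop jar myjob.jar MyJobDriver \
>     -Dmapred.create.symlink=yes \
>     -Dmapred.cache.files=hdfs:///user/knoguchi/dump.sh#dump.sh \
>     -Dmapred.reduce.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh' \
>     input output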
>
> Koji
>
>
> On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
>
> > Hi,
> >
> > I tried to use the -XX:+HeapDumpOnOutOfMemoryError option. Unfortunately,
> as I suspected, the dump goes to the current working directory of the task
> attempt as it executes on the cluster, and that directory is cleaned up once
> the task is done. There are options to keep failed task files, or task files
> matching a pattern, but these do NOT retain the current working directory.
> Hence, there is no way to get the dump off the cluster, AFAIK.
> >
> > You are effectively left with the jmap option on a pseudo-distributed
> cluster, I think.
> >
> > Thanks
> > Hemanth
> >
> >
> > On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
> > If your task is running out of memory, you could add the option
> -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts (along with the
> heap memory setting). However, I am not sure where it stores the dump; you
> might need to experiment a little with it. I will try to send out the info
> if I get time to try it out.
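> >
> > (A rough sketch of the command line for this, assuming the driver picks up
> > -D options via ToolRunner; the -Xmx value and dump path are only examples,
> > and where the dump file actually ends up needs checking.)
> >
> > $ hadoop jar myjob.jar MyJobDriver \
> >     -Dmapred.child.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof' \
> >     input output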
> >
> >
> > Thanks
> > Hemanth
> >
> >
> > On Tue, Mar 26, 2013 at 10:23 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlap...@gmail.com> wrote:
> > Hi Hemanth,
> >
> > This sounds interesting; I will try it out on the pseudo-distributed
> cluster. But the real problem for me is that the cluster is maintained by a
> third party. I only have an edge node through which I can submit the jobs.
> >
> > Is there any other way of getting the dump, instead of physically going
> to that machine and checking it out?
> >
> >
> >
> > On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
> > Hi,
> >
> > One option to find what could be taking the memory is to use jmap on the
> running task. The steps I followed are:
> >
> > - I ran a sleep job (which comes in the examples jar of the distribution
> and effectively does nothing in the mapper/reducer).
> > - From the JobTracker UI, I looked up a map task attempt ID.
> > - Then, on the machine where the map task was running, I got the PID of
> the running task: ps -ef | grep <task attempt id>
> > - On the same machine, I executed: jmap -histo <pid>
> >
> > This will give you an idea of the count and size of the objects allocated.
> jmap also has options to take a full dump, which contains more information,
> but this should help get you started with debugging.
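> >
> > (Concretely, the two machine-side commands look something like the
> > following; the attempt ID below is made up, and the PID is the one of the
> > child Java process found by the grep.)
> >
> > $ ps -ef | grep attempt_201303251023_0001_m_000000_0
> > $ jmap -histo <pid>
> >
> > (jmap -histo:live <pid> additionally forces a GC first and counts only
> > live objects, which helps when free memory merely looks low because GC
> > hasn't run yet.)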
> >
> > For my sleep job task - I saw allocations worth roughly 130 MB.
> >
> > Thanks
> > hemanth
> >
> >
> >
> >
> > On Mon, Mar 25, 2013 at 6:43 PM, Nagarjuna Kanamarlapudi <
> nagarjuna.kanamarlap...@gmail.com> wrote:
> > I have a lookup file which I need in the mapper, so I am trying to read
> the whole file and load it into a list in the mapper.
> >
> > For each record, I look it up in this file, which I got from the
> distributed cache.
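> >
> > (For reference, a lookup file like this is typically shipped via the
> > distributed cache with a symlink, so the mapper can open ./lookup.txt from
> > its working directory; the file, jar, and class names below are
> > placeholders.)
> >
> > $ hadoop jar myjob.jar MyJobDriver \
> >     -Dmapred.create.symlink=yes \
> >     -Dmapred.cache.files=hdfs:///user/me/lookup.txt#lookup.txt \
> >     input output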
> >
> > —
> > Sent from iPhone
> >
> >
> > On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
> >
> > Hmm. How are you loading the file into memory? Is it some sort of memory
> mapping? Are the contents being read as records? Some details of the app
> will help.
> >
> >
> > On Mon, Mar 25, 2013 at 2:14 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlap...@gmail.com> wrote:
> > Hi Hemanth,
> >
> > I tried out your suggestion of loading the 420 MB file into memory. It
> threw a Java heap space error.
> >
> > I am not sure where this 1.6 GB of configured heap went.
> >
> >
> > On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala <
> yhema...@thoughtworks.com> wrote:
> > Hi,
> >
> > The free memory might be low just because GC hasn't reclaimed what it
> can. Can you just try reading in the data you want to read and see if that
> works?
> >
> > Thanks
> > Hemanth
> >
> >
> > On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlap...@gmail.com> wrote:
> > io.sort.mb = 256 MB
> >
> >
> > On Monday, March 25, 2013, Harsh J wrote:
> > The MapTask may consume some memory of its own as well. What is your
> > io.sort.mb (MR1) or mapreduce.task.io.sort.mb (MR2) set to?
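> >
> > (If the sort buffer turns out to be a big chunk of the child heap, it can
> > be overridden per job, e.g. something like the line below; the value and
> > the jar/class names are only examples.)
> >
> > $ hadoop jar myjob.jar MyJobDriver -Dio.sort.mb=100 input output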
> >
> > On Sun, Mar 24, 2013 at 3:40 PM, nagarjuna kanamarlapudi
> > <nagarjuna.kanamarlap...@gmail.com> wrote:
> > > Hi,
> > >
> > > I configured my child JVM heap to 2 GB. So, I thought I could really
> read 1.5 GB of data and store it in memory (mapper/reducer).
> > >
> > > I wanted to confirm this, and wrote the following piece of code in the
> configure() method of my mapper.
> > >
> > > @Override
> > > public void configure(JobConf job) {
> > >   System.out.println("FREE MEMORY -- " + Runtime.getRuntime().freeMemory());
> > >   System.out.println("MAX MEMORY ---" + Runtime.getRuntime().maxMemory());
> > > }
> > >
> > >
> > > Surprisingly the output was
> > >
> > >
> > > FREE MEMORY -- 341854864  = 320 MB
> > > MAX MEMORY ---1908932608  = 1.9 GB
> > >
> > >
> > > I am just wondering what is taking up that extra 1.6 GB of the heap
> that I configured for the child JVM.
> > >
> > >
> > > I would appreciate help in understanding this scenario.
> > >
> > >
> > >
> > > Regards
> > >
> > > Nagarjuna K
> > >
> > >
> > >
> >
> >
> >
> > --
> > Harsh J
> >
> >
> > --
> > Sent from iPhone
>
>
