Hi,

thanks for the reply. The machine is pretty small, with 4 GB of total memory. We
reserved 1 GB for the OS and 1 GB for HBase (according to the recommendation), so
the remaining 2 GB is what the NodeManager claims.

It is actually a cluster of 5 machines: 2 name nodes and 3 data nodes. All
machines have similar specs, so the stronger ones are used for the name nodes and
the rest for the data nodes. I know the hardware is far from ideal, but it is a
small cluster for a POC and for gaining some experience.

Back to the problem. At the time this happens, no other job is running on the
cluster. All mappers (3) have already finished and we have a single reduce task,
which fails at ~70% of its progress on virtual memory consumption (with
mapreduce.reduce.memory.mb = 1024 and the default 2.1 ratio, that matches the
~2.1 GB virtual limit reported in the error below). The dataset being processed
is a 500 MB compressed Avro data file. The reducer doesn't cache anything
intentionally; it just routes records into various output folders dynamically,
roughly as in the sketch below.
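
To give a concrete picture, the reducer is essentially equivalent to the
following minimal sketch (class and field names are illustrative and it is
simplified to Text values; the real job reads Avro records, but the structure is
the same):

// Simplified sketch of the reducer: no in-memory caching, just per-record
// routing into a key-derived output folder via MultipleOutputs.
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class RoutingReducer extends Reducer<Text, Text, NullWritable, Text> {

    private MultipleOutputs<NullWritable, Text> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The output folder is derived from the key; nothing is accumulated
        // across records.
        String folder = key.toString();
        for (Text value : values) {
            outputs.write(NullWritable.get(), value, folder + "/part");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        outputs.close();
    }
}

So the only per-record state is the folder name derived from the key.
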
From the RM console I can clearly see that there are free, unused memory
resources. Is there a way to detect what consumed the assigned virtual memory?
For a smaller amount of input data (~120 MB compressed) the job finishes just
fine within 3 minutes.
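
One idea I had for tracking this down is to poll /proc for the reducer JVM's PID
while it runs (22970 in the dump below), since as far as I understand that is
where the NodeManager takes its numbers from. Something as trivial as this (the
PID default is just the example from the dump):

// Print the kernel's view of a process's memory: VmSize is the virtual size
// the NodeManager checks, VmRSS is the physical (resident) size.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class VmemCheck {
    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "22970";
        for (String line : Files.readAllLines(Paths.get("/proc", pid, "status"))) {
            if (line.startsWith("VmSize") || line.startsWith("VmRSS")) {
                System.out.println(line);
            }
        }
    }
}

Is that a sensible way to see where the ~2.1 GB goes, or is there a better tool
for this?
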

We obviously have a problem scaling this task out. Could someone provide some
hints, as it seems we are missing something fundamental here?

Thanks for helping me out
Jakub

On 11 September 2014 11:34, Susheel Kumar Gadalay <skgada...@gmail.com>
wrote:

> Your physical memory is 1GB on this node.
>
> What are the other containers (map tasks) running on this?
>
> You have given map memory as 768M, reduce memory as 1024M, and the AM as
> 1024M.
>
> With the AM and a single map task that is already 1.7G, so it cannot start
> another container for the reducer.
> Reduce these values and check.
>
> On 9/11/14, Jakub Stransky <stransky...@gmail.com> wrote:
> > Hello hadoop users,
> >
> > I am facing following issue when running M/R job during a reduce phase:
> >
> > Container [pid=22961,containerID=container_1409834588043_0080_01_000010]
> is
> > running beyond virtual memory limits. Current usage: 636.6 MB of 1 GB
> > physical memory used; 2.1 GB of 2.1 GB virtual memory used.
> > Killing container. Dump of the process-tree for
> > container_1409834588043_0080_01_000010 :
> > |- PID    PPID  PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> > SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> > |- 22961  16896 22961  22961  (bash)    0                      0
> >         9424896           312                 /bin/bash -c
> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
> > -Dhadoop.metrics.log.level=WARN -Xmx768m
> >
> -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
> > -Dlog4j.configuration=container-log4j.properties
> >
> -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
> > attempt_1409834588043_0080_r_000000_0 10
> >
> 1>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stdout
> >
> 2>/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010/stderr
> > |- 22970 22961 22961 22961 (java) 24692 1165 2256662528 162659
> > /usr/java/default/bin/java -Djava.net.preferIPv4Stack=true
> > -Dhadoop.metrics.log.level=WARN -Xmx768m
> >
> -Djava.io.tmpdir=/home/hadoop/yarn/local/usercache/jobsubmit/appcache/application_1409834588043_0080/container_1409834588043_0080_01_000010/tmp
> > -Dlog4j.configuration=container-log4j.properties
> >
> -Dyarn.app.container.log.dir=/home/hadoop/yarn/logs/application_1409834588043_0080/container_1409834588043_0080_01_000010
> > -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> > org.apache.hadoop.mapred.YarnChild 153.87.47.116 47184
> > attempt_1409834588043_0080_r_000000_0 10 Container killed on request.
> Exit
> > code is 143
> >
> >
> > I have the following settings, with the default physical-to-virtual memory ratio of 2.1:
> > # hadoop - yarn-site.xml
> > yarn.nodemanager.resource.memory-mb  : 2048
> > yarn.scheduler.minimum-allocation-mb : 256
> > yarn.scheduler.maximum-allocation-mb : 2048
> >
> > # hadoop - mapred-site.xml
> > mapreduce.map.memory.mb              : 768
> > mapreduce.map.java.opts              : -Xmx512m
> > mapreduce.reduce.memory.mb           : 1024
> > mapreduce.reduce.java.opts           : -Xmx768m
> > mapreduce.task.io.sort.mb            : 100
> > yarn.app.mapreduce.am.resource.mb    : 1024
> > yarn.app.mapreduce.am.command-opts   : -Xmx768m
> >
> > I have the following questions:
> > - Is it possible to track down the virtual memory consumption and find
> > what caused it to be so high?
> > - What is the best way to solve this kind of problem?
> > - I found the following recommendation on the internet: "We actually
> > recommend disabling this check by setting yarn.nodemanager.vmem-check-enabled
> > to false as there is reason to believe the virtual/physical ratio is
> > exceptionally high with some versions of Java / Linux." Is that a good way
> > to go?
> >
> > My reduce task doesn't do anything heavy; it just classifies the data:
> > for a given input key it chooses the appropriate output folder and writes
> > the data out.
> >
> > Thanks for any advice
> > Jakub
> >
>



-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky
