Hello,

I am using Pig version 0.17.0. When I attempt to run my Pig script from the
command line on a YARN cluster, I get out-of-memory errors. From the YARN
application logs, I see this stack trace:

2018-04-27 13:22:10,543 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3664)
        at java.lang.String.<init>(String.java:207)
        at java.lang.StringBuilder.toString(StringBuilder.java:407)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2992)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2817)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2689)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1326)
        at org.apache.hadoop.conf.Configuration.set(Configuration.java:1298)
        at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.mergeConf(ConfigurationUtil.java:70)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:185)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:115)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:89)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:70)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:297)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:550)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:532)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1779)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:532)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:309)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1734)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1668)


To increase the heap size, I added the following at the beginning of the
script:


SET mapreduce.map.java.opts '-Xmx2048m';
SET mapreduce.reduce.java.opts '-Xmx2048m';
SET mapreduce.map.memory.mb 2536;
SET mapreduce.reduce.memory.mb 2536;
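
Since the stack trace shows the OutOfMemoryError while starting MRAppMaster itself, I suspect it may be the Application Master heap, rather than the map/reduce task heaps, that needs raising. A sketch of what I mean, assuming the standard Hadoop 2 AM property names apply on this cluster (I have not confirmed my setup honors them):

```
-- Hypothetical: raise the MapReduce Application Master container size and heap
SET yarn.app.mapreduce.am.resource.mb 2536;
SET yarn.app.mapreduce.am.command-opts '-Xmx2048m';
```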

But this has no effect; the settings appear to be ignored. From the YARN
logs, I see the container being launched with a 1024m heap:

echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stdout 2>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stderr"

I also tried setting the memory requirements with the PIG_OPTS environment 
variable:

export PIG_OPTS="-Dmapreduce.reduce.memory.mb=5000 -Dmapreduce.map.memory.mb=5000 -Dmapreduce.map.java.opts=-Xmx5000m"
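
For completeness, I understand the same properties can also be supplied in a properties file passed with Pig's -propertyFile option (assuming that flag works the same way in 0.17.0); a hypothetical file:

```
# memory.properties (hypothetical file name)
mapreduce.map.memory.mb=5000
mapreduce.reduce.memory.mb=5000
mapreduce.map.java.opts=-Xmx5000m
```

invoked as: pig -propertyFile memory.properties myscript.pig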

No matter what I do, the container is always launched with -Xmx1024m and the
same OOM error occurs. What is the proper way to specify the heap sizes for
my Pig mappers and reducers?

Best regards,
Alex soto
