Hello, I am using Pig version 0.17.0. When I attempt to run my Pig script from the command line on a YARN cluster, I get out-of-memory errors. From the YARN application logs, I see this stack trace:
2018-04-27 13:22:10,543 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2992)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2817)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2689)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1326)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1298)
    at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.mergeConf(ConfigurationUtil.java:70)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:185)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:89)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:70)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:297)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:550)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:532)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1779)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:532)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:309)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1734)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1668)

To increase the heap size, I added this to the beginning of the script:

SET mapreduce.map.java.opts '-Xmx2048m';
SET mapreduce.reduce.java.opts '-Xmx2048m';
SET mapreduce.map.memory.mb 2536;
SET mapreduce.reduce.memory.mb 2536;

But this has no effect; the settings appear to be ignored. From the YARN logs, I can see the container still being launched with a 1024 MB heap:

echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stdout 2>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stderr"

I also tried setting the memory requirements with the PIG_OPTS environment variable:

export PIG_OPTS="-Dmapreduce.reduce.memory.mb=5000 -Dmapreduce.map.memory.mb=5000 -Dmapreduce.map.java.opts=-Xmx5000m"

No matter what I do, the container is always launched with -Xmx1024m and the same OOM error occurs. What is the proper way to specify the heap sizes for my Pig mappers and reducers?

Best regards,
Alex Soto
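P.S. One more data point: the stack trace shows the OutOfMemoryError occurring inside MRAppMaster itself, and the launch command above is for the application master container rather than a map or reduce task container. If I understand the MapReduce 2.x property names correctly, the AM container has its own separate pair of settings, so I am also going to try adding these to the top of the script (property names taken from the Hadoop documentation; I have not confirmed they are honored from a Pig SET statement):

-- Assumed AM-specific settings (MapReduce 2.x); the task-level
-- mapreduce.map.* / mapreduce.reduce.* properties do not apply to the AM.
SET yarn.app.mapreduce.am.resource.mb 2536;
SET yarn.app.mapreduce.am.command-opts '-Xmx2048m';

Does anyone know whether that is the right pair of knobs for the AM heap?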