Dear Hadoop community,

I am trying to deepen my Hadoop knowledge. In this case I am trying to make my spark-submit job fail with an OOM error, but I am not able to find the root cause in the logs.
This is the script I am running:

a = "bigword"
b = "bigword"
print(a)
for i in range(1000000000):
    a += b

With spark.driver.memory set to 3g the job fails as expected, but I can't find the real reason because the logs are not clear enough.

Attempt 1:

AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11
Failing this attempt.Diagnostics: [2019-10-22 12:19:06.273]Exception from container-launch.
Container id: container_e15_1570749574365_0050_01_000001
Exit code: 11
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /d1/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_01_000001/container_e15_1570749574365_0050_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:19:06.277]Container exited with a non-zero exit code 11. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:19:06.278]Container exited with a non-zero exit code 11. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 Then click on links to logs of each attempt.

Attempt 2:

AM Container for appattempt_1570749574365_0050_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2019-10-22 12:20:50.591]Exception from container-launch.
Container id: container_e15_1570749574365_0050_02_000001
Exit code: 13
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /d0/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_02_000001/container_e15_1570749574365_0050_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:20:50.596]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:20:50.598]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 Then click on links to logs of each attempt.

Could someone please help me to understand:

 - What do exitCode: 13 and exitCode: 11 mean?
 - How should I keep troubleshooting?

Thank you very much

Manuel Sopena Ballesteros
Big Data Engineer | Kinghorn Centre for Clinical Genomics
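P.S. In case it helps anyone compare the two attempts, here is a minimal sketch for pulling the attempt IDs and exit codes out of the diagnostics text quoted above. It assumes every attempt is reported on a line of the form "AM Container for appattempt_... exited with exitCode: N", as in my output; the function name attempt_exit_codes and the sample string are only illustrative, and the sketch says nothing about what the codes themselves mean.

```python
import re
from typing import Dict

# Assumption: YARN reports each failed attempt as
# "AM Container for appattempt_<cluster>_<app>_<attempt> exited with exitCode: <n>",
# which is the shape seen in the diagnostics quoted in this message.
ATTEMPT_RE = re.compile(
    r"AM Container for (appattempt_\d+_\d+_\d+) exited with\s+exitCode: (-?\d+)"
)


def attempt_exit_codes(diagnostics: str) -> Dict[str, int]:
    """Map each application attempt ID found in the text to its exit code."""
    return {m.group(1): int(m.group(2)) for m in ATTEMPT_RE.finditer(diagnostics)}


# Sample built from the two attempt headers in the diagnostics above.
sample = (
    "AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11 "
    "Failing this attempt. "
    "AM Container for appattempt_1570749574365_0050_000002 exited with exitCode: 13 "
    "Failing this attempt."
)

codes = attempt_exit_codes(sample)
for attempt, code in codes.items():
    print(attempt, code)
```

The same function could be pointed at the full diagnostics text copied from the application tracking page, provided the attempt headers keep this shape.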