Dear Hadoop community,

I am trying to deepen my Hadoop knowledge. In this case I am trying to make my 
spark-submit job fail with an out-of-memory (OOM) error, but I am not able to 
find the root cause in the logs.

This is the script I am running:

a = "bigword"
b = "bigword"
print(a)

for i in range(1000000000):
    a += b
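
As a rough check that the loop should exceed the memory limit, the final size 
of the string can be estimated. This is only a sketch: it assumes one byte per 
character (plain ASCII text) and ignores per-object overhead and temporary 
copies, so the real footprint would be somewhat higher. The function name is 
made up for this illustration.

```python
# Estimate how large `a` would be if the loop ever ran to completion.
# Assumption: one byte per character, no interpreter overhead counted.
WORD = "bigword"
ITERATIONS = 1000000000


def final_size_bytes(word, iterations):
    # `a` starts as one copy of the word and gains one copy per iteration.
    return len(word) * (iterations + 1)


size = final_size_bytes(WORD, ITERATIONS)
print(size)            # 7000000007 bytes
print(size / 2.0**30)  # the same figure expressed in GiB
```

Under these assumptions the string would need about 6.5 GiB, well above the 
3g setting mentioned below.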

I submit it with spark.driver.memory set to 3g.
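
The exact submit command is not reproduced in this message. As a sketch only, a 
submission with this setting might be assembled as below. The script name 
oom_test.py is a placeholder, and YARN cluster deploy mode is an assumption 
based on the failure being reported for the AM container; the command is built 
and printed, not executed.

```python
import shlex


def build_submit_command(script, driver_memory="3g"):
    """Assemble a spark-submit argument list without running it."""
    return [
        "spark-submit",
        "--master", "yarn",          # assumption: the job is submitted to YARN
        "--deploy-mode", "cluster",  # assumption: driver runs in the AM container
        "--conf", "spark.driver.memory=" + driver_memory,
        script,                      # placeholder script name
    ]


cmd = build_submit_command("oom_test.py")
# Print a copy-pasteable shell form of the command.
print(" ".join(shlex.quote(part) for part in cmd))
```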

The job fails as expected, but I can't find the real reason because the logs 
are not clear enough.

Attempt 1:
AM Container for appattempt_1570749574365_0050_000001 exited with exitCode: 11
Failing this attempt.Diagnostics: [2019-10-22 12:19:06.273]Exception from 
container-launch.
Container id: container_e15_1570749574365_0050_01_000001
Exit code: 11
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/d1/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_01_000001/container_e15_1570749574365_0050_01_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:19:06.277]Container exited with a non-zero exit code 11. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:19:06.278]Container exited with a non-zero exit code 11. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/d1/hadoop/yarn/local/filecache/13/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: 
http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 
Then click on links to logs of each attempt.

Attempt 2:
AM Container for appattempt_1570749574365_0050_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2019-10-22 12:20:50.591]Exception from 
container-launch.
Container id: container_e15_1570749574365_0050_02_000001
Exit code: 13
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is mansop
main : requested yarn user is mansop
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/d0/hadoop/yarn/local/nmPrivate/application_1570749574365_0050/container_e15_1570749574365_0050_02_000001/container_e15_1570749574365_0050_02_000001.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...
[2019-10-22 12:20:50.596]Container exited with a non-zero exit code 13. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2019-10-22 12:20:50.598]Container exited with a non-zero exit code 13. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/d0/hadoop/yarn/local/filecache/10/spark2-hdp-yarn-archive.tar.gz/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
For more detailed output, check the application tracking page: 
http://gl-hdp-ctrl03-mlx.mlx:8088/cluster/app/application_1570749574365_0050 
Then click on links to logs of each attempt.
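
The diagnostics above only show the last 4096 bytes of each file. One way to 
pull the complete container logs from the command line, assuming log 
aggregation is enabled on the cluster, might be the yarn logs CLI. The sketch 
below only builds the command; the helper name is made up for this example, 
and the IDs are copied from the output above.

```python
import shlex

APP_ID = "application_1570749574365_0050"
AM_CONTAINER = "container_e15_1570749574365_0050_01_000001"


def yarn_logs_command(app_id, container_id=None):
    """Build a `yarn logs` invocation, optionally limited to one container."""
    cmd = ["yarn", "logs", "-applicationId", app_id]
    if container_id is not None:
        cmd += ["-containerId", container_id]
    return cmd


# Logs for the whole application, then only for the first AM container.
for cmd in (yarn_logs_command(APP_ID), yarn_logs_command(APP_ID, AM_CONTAINER)):
    print(" ".join(shlex.quote(part) for part in cmd))
```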

Could someone please help me to understand:

What do exitCode: 13 and exitCode: 11 mean?

How should I continue troubleshooting?

Thank you very much

Manuel Sopena Ballesteros

Big Data Engineer | Kinghorn Centre for Clinical Genomics

a: 384 Victoria Street, Darlinghurst NSW 2010
p: +61 2 9355 5760  |  +61 4 12 123 123
e: manuel...@garvan.org.au<mailto:manuel...@garvan.org.au>
