Dear all,

We have run into a problem with failed Spark jobs.
We have a Spark/Hadoop cluster running CDH 5.1.2 with Spark 1.1.0.

We launch the Spark job with the following command:
~/soft/spark-1.1.0-bin-hadoop2.3/bin/spark-submit \
  --master yarn-cluster \
  --executor-memory 4G \
  --driver-memory 4G \
  --class "ru.retailrocket.spark.Upsell" \
  --num-executors 18 \
  --executor-cores 2 \
  target/scala-2.10/Upsell-assembly-1.0.jar

If a task fails, some of the old executor (worker) processes are still left running in memory:

yarn      9916  0.0  0.0  12228  1444 ?        Ss   15:05   0:00 /bin/bash
-c /usr/lib/jvm/java-7-oracle/bin/java -server -XX:OnOutOfMemoryError='kill
%p' -Xms4096m -Xmx4096m 
-Djava.io.tmpdir=/dfs/dn1/yarn/local/usercache/tik/appcache/application_1414589211432_63031/container_1414589211432_63031_01_000010/tmp
'-Dspark.akka.timeout=1000' '-Dspark.akka.frameSize=1000'
org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sparkdri...@h6.xxxxxxxxx.ru:53813/user/CoarseGrainedScheduler 16
h11.XXXXXXXXX.ru 2 1>
/dfs/dn2/yarn/logs/application_1414589211432_63031/container_1414589211432_63031_01_000010/stdout
2>
/dfs/dn2/yarn/logs/application_1414589211432_63031/container_1414589211432_63031_01_000010/stderr
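
For now we clean these up by hand on each node with roughly the following
(just a sketch of what we do; the match pattern and the application id are
ours and would need adjusting, and it has to be run as root or as the yarn
user):

# list leftover executor backends belonging to the failed application
# (the [C] trick keeps grep from matching itself)
ps aux | grep '[C]oarseGrainedExecutorBackend.*application_1414589211432_63031'
# once verified, kill them
pkill -9 -f 'CoarseGrainedExecutorBackend.*application_1414589211432_63031'

Obviously this does not scale across the cluster, which is why we would
expect Spark/YARN to clean these processes up automatically.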

Why doesn't Spark kill such processes?



