I have a problem with a job. A random executor gets this error:

ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
It happens almost always at the same point in the processing of the data. I am processing 1 million files with sc.wholeTextFiles, and at around the 600,000th file a container receives this signal. On the driver I get:

15/07/03 14:20:11 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. cruncher03.stratified:44617
15/07/03 14:20:11 ERROR cluster.YarnClusterScheduler: Lost executor 3 on cruncher03.stratified: remote Rpc client disassociated
15/07/03 14:20:11 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkExecutor@cruncher03.stratified:44617] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/07/03 14:20:11 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. cruncher03.stratified:44617
15/07/03 14:20:11 INFO scheduler.TaskSetManager: Re-queueing tasks for 3 from TaskSet 5.0

There is plenty of memory on the machine and in the container JVM. I don't think the kernel OOM killer is involved, because that would send SIGKILL. I also don't think it is a JVM OutOfMemoryError, because no out-of-memory exception is logged.

What can be causing this?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ERROR-executor-CoarseGrainedExecutorBackend-RECEIVED-SIGNAL-15-SIGTERM-tp23613.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
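The post reasons that the kill cannot be the OOM killer because a SIGTERM was received rather than a SIGKILL. The semantics behind that can be illustrated with a small example. A process can install a handler for SIGTERM (signal 15) and log it before exiting, which is how a line like `RECEIVED SIGNAL 15: SIGTERM` can appear at all. SIGKILL (signal 9) cannot be caught, so a process killed that way leaves no such line.

This is a minimal Python sketch, assuming a POSIX system and Python 3.8+ (for `signal.raise_signal`). It is not Spark code. The handler name `log_signal` is hypothetical, and the message format is simply copied from the log line quoted in the post.

```python
import signal

received = []

def log_signal(signum, frame):
    # Hypothetical handler: records a message in the same format as the
    # executor log line quoted in the post.
    received.append(f"RECEIVED SIGNAL {signum}: {signal.Signals(signum).name}")

# SIGTERM (15) is catchable, so the process gets a chance to log it.
signal.signal(signal.SIGTERM, log_signal)
signal.raise_signal(signal.SIGTERM)

# SIGKILL (9) is not catchable: the OS rejects any attempt to install a handler.
try:
    signal.signal(signal.SIGKILL, log_signal)
    sigkill_catchable = True
except OSError:
    sigkill_catchable = False

print(received)
print("SIGKILL catchable:", sigkill_catchable)
```

Running it prints the SIGTERM message recorded by the handler and reports that SIGKILL cannot be handled.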