Hadoop 2.4.0. Here is the relevant logs from executor 1136 16/03/18 21:26:58 INFO mapred.SparkHadoopMapRedUtil: attempt_201603182126_0276_m_000484_0: Committed16/03/18 21:26:58 INFO executor.Executor: Finished task 484.0 in stage 276.0 (TID 59663). 1080 bytes result sent to driver16/03/18 21:38:18 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM16/03/18 21:38:18 INFO storage.DiskBlockManager: Shutdown hook called16/03/18 21:38:18 INFO util.ShutdownHookManager: Shutdown hook called
On Fri, Mar 18, 2016 at 4:21 PM Ted Yu <yuzhih...@gmail.com> wrote: Which version of hadoop do you use ? > > bq. Requesting to kill executor(s) 1136 > > Can you find more information on executor 1136 ? > > Thanks > > On Fri, Mar 18, 2016 at 4:16 PM, Nezih Yigitbasi < > nyigitb...@netflix.com.invalid> wrote: > >> Hi Spark experts, >> I am using Spark 1.5.2 on YARN with dynamic allocation enabled. I see in >> the driver/application master logs that the app is marked as SUCCEEDED and >> then SparkContext stop is called. However, this stop sequence takes > 10 >> minutes to complete, and YARN resource manager kills the application master >> as it didn’t receive a heartbeat within the last 10 minutes. The resource >> manager then kills the application master. Any ideas about what may be >> going on? >> >> Here are the relevant logs: >> >> *6/03/18 21:26:58 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, >> exitCode: 0 >> 16/03/18 21:26:58 INFO spark.SparkContext: Invoking stop() from shutdown >> hook*16/03/18 21:26:58 INFO handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/static/sql,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/execution/json,null}16/03/18 21:26:58 >> INFO handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/execution,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/static/sql,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/execution/json,null}16/03/18 21:26:58 >> INFO handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/execution,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/SQL,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/metrics/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/api,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/static,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}16/03/18 >> 21:26:58 INFO handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/executors/threadDump,null}16/03/18 21:26:58 >> INFO handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/executors/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/executors,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/environment/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/environment,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/storage/rdd,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/storage/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/storage,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/pool/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/pool,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/stage/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/stage,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/stages,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/jobs/job/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/jobs/job,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/jobs/json,null}16/03/18 21:26:58 INFO >> handler.ContextHandler: stopped >> o.s.j.s.ServletContextHandler{/jobs,null}16/03/18 21:26:58 INFO ui.SparkUI: >> Stopped Spark web UI at http://10.143.240.240:5270616/03/18 21:27:58 INFO >> cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) >> 113516/03/18 21:27:58 INFO yarn.YarnAllocator: Driver requested a total >> number of 208 executor(s).16/03/18 21:27:58 INFO >> yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) >> 1135.16/03/18 21:27:58 INFO spark.ExecutorAllocationManager: Removing >> executor 1135 because it has been idle for 60 seconds (new desired total >> will be 208)16/03/18 21:27:58 INFO cluster.YarnClusterSchedulerBackend: >> Requesting to kill executor(s) 112316/03/18 21:27:58 INFO >> yarn.YarnAllocator: Driver requested a total number of 207 >> executor(s).16/03/18 21:27:58 INFO yarn.ApplicationMaster$AMEndpoint: Driver >> requested to kill executor(s) 1123.16/03/18 21:27:58 INFO >> spark.ExecutorAllocationManager: Removing executor 1123 because it has been >> idle for 60 seconds (new desired total will be 207)16/03/18 21:27:58 INFO >> cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) >> 111716/03/18 21:27:58 INFO yarn.YarnAllocator: Driver requested a total >> number of 206 executor(s).16/03/18 21:27:58 INFO >> yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) >> 1117.16/03/18 21:27:58 INFO spark.ExecutorAllocationManager: Removing >> executor 1117 because it has been idle for 60 seconds (new desired total >> will be 206)16/03/18 21:27:58 INFO cluster.YarnClusterSchedulerBackend: >> Requesting to kill executor(s) 118516/03/18 21:27:58 INFO >> yarn.YarnAllocator: Driver requested a total number of 205 >> executor(s).16/03/18 21:27:58 INFO yarn.ApplicationMaster$AMEndpoint: Driver >> requested to kill executor(s) 1185.16/03/18 21:27:58 INFO >> spark.ExecutorAllocationManager: Removing executor 1185 because it has been >> idle for 60 seconds (new desired total will be 205)16/03/18 21:27:58 INFO >> cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) >> 115216/03/18 21:27:58 INFO yarn.YarnAllocator: Driver requested a total >> number of 204 executor(s).16/03/18 21:27:58 INFO >> yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) >> 1152.16/03/18 21:27:58 INFO spark.ExecutorAllocationManager: Removing >> executor 1152 because it has been idle for 60 seconds (new desired total >> will be 204)16/03/18 21:27:58 INFO cluster.YarnClusterSchedulerBackend: >> Requesting to kill executor(s) 114016/03/18 21:27:58 INFO >> yarn.YarnAllocator: Driver requested a total number of 203 >> executor(s).16/03/18 21:27:58 INFO yarn.ApplicationMaster$AMEndpoint: Driver >> requested to kill executor(s) 1140.16/03/18 21:27:58 INFO >> spark.ExecutorAllocationManager: Removing executor 1140 because it has been >> idle for 60 seconds (new desired total will be 203)16/03/18 21:27:58 INFO >> cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) >> 114916/03/18 21:27:58 INFO yarn.YarnAllocator: Driver requested a total >> number of 202 executor(s).16/03/18 21:27:58 INFO >> yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) >> 1149.16/03/18 21:27:58 INFO spark.ExecutorAllocationManager: Removing >> executor 1149 because it has been idle for 60 seconds (new desired total >> will be 202)16/03/18 21:27:58 INFO cluster.YarnClusterSchedulerBackend: >> Requesting to kill executor(s) 115416/03/18 21:27:58 INFO >> yarn.YarnAllocator: Driver requested a total number of 201 >> executor(s).16/03/18 21:27:58 INFO yarn.ApplicationMaster$AMEndpoint: Driver >> requested to kill executor(s) 1154.16/03/18 21:27:58 INFO >> spark.ExecutorAllocationManager: Removing executor 1154 because it has been >> idle for 60 seconds (new desired total will be 201)16/03/18 21:27:58 INFO >> cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) >> 113616/03/18 21:27:58 INFO yarn.YarnAllocator: Driver requested a total >> number of 200 executor(s).16/03/18 21:27:58 INFO >> yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) >> 1136.16/03/18 21:27:58 INFO spark.ExecutorAllocationManager: Removing >> executor 1136 because it has been idle for 60 seconds (new desired total >> will be 200)*16/03/18 21:38:17 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL >> 15: SIGTERM >> * >> >> >> > >