You should check the executor logs to find out why it failed. They might
contain more explanation.

--Xuefu

On Sun, Jan 10, 2016 at 11:21 PM, Jone Zhang <joyoungzh...@gmail.com> wrote:

> *I have submitted an application many times.*
> *Most of the runs complete correctly; see attachment 1.*
> *But one of them breaks, as shown in attachments 2.1 and 2.2.*
>
> *Why does a task with such a small data size run so long, and why can't I
> find any helpful information in the YARN logs?*
>
> *Part of the log is as follows:*
> 16/01/11 12:45:19 INFO storage.BlockManagerMasterEndpoint: Trying to
> remove executor 1 from BlockManagerMaster.
> 16/01/11 12:45:19 INFO storage.BlockManagerMasterEndpoint: Removing block
> manager BlockManagerId(1, 10.226.148.160, 44366)
> 16/01/11 12:45:19 INFO storage.BlockManagerMaster: Removed 1 successfully
> in removeExecutor
> 16/01/11 12:50:32 INFO storage.BlockManagerInfo: Removed
> broadcast_2_piece0 on 10.219.58.123:39594 in memory (size: 92.2 KB, free:
> 441.4 MB)
> 16/01/11 12:55:20 WARN spark.HeartbeatReceiver: Removing executor 2 with
> no recent heartbeats: 604535 ms exceeds timeout 600000 ms
> 16/01/11 12:55:20 ERROR cluster.YarnClusterScheduler: Lost an executor 2
> (already removed): Executor heartbeat timed out after 604535 ms
> 16/01/11 12:55:20 WARN spark.HeartbeatReceiver: Removing executor 1 with
> no recent heartbeats: 609228 ms exceeds timeout 600000 ms
> 16/01/11 12:55:20 ERROR cluster.YarnClusterScheduler: Lost an executor 1
> (already removed): Executor heartbeat timed out after 609228 ms
> 16/01/11 12:55:20 WARN spark.HeartbeatReceiver: Removing executor 4 with
> no recent heartbeats: 615098 ms exceeds timeout 600000 ms
> 16/01/11 12:55:20 ERROR cluster.YarnClusterScheduler: Lost an executor 4
> (already removed): Executor heartbeat timed out after 615098 ms
> 16/01/11 12:55:20 WARN spark.HeartbeatReceiver: Removing executor 3 with
> no recent heartbeats: 616730 ms exceeds timeout 600000 ms
> 16/01/11 12:55:20 INFO cluster.YarnClusterSchedulerBackend: Requesting to
> kill executor(s) 2
> 16/01/11 12:55:20 ERROR cluster.YarnClusterScheduler: Lost an executor 3
> (already removed): Executor heartbeat timed out after 616730 ms
> 16/01/11 12:55:20 WARN cluster.YarnClusterSchedulerBackend: Executor to
> kill 2 does not exist!
> 16/01/11 12:55:20 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested
> to kill executor(s) .
> 16/01/11 12:55:20 INFO cluster.YarnClusterSchedulerBackend: Requesting to
> kill executor(s) 1
> 16/01/11 12:55:20 WARN cluster.YarnClusterSchedulerBackend: Executor to
> kill 1 does not exist!
> 16/01/11 12:55:20 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested
> to kill executor(s) .
> 16/01/11 12:55:20 INFO cluster.YarnClusterSchedulerBackend: Requesting to
> kill executor(s) 4
> 16/01/11 12:55:20 WARN cluster.YarnClusterSchedulerBackend: Executor to
> kill 4 does not exist!
> 16/01/11 12:55:20 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested
> to kill executor(s) .
> 16/01/11 12:55:20 INFO cluster.YarnClusterSchedulerBackend: Requesting to
> kill executor(s) 3
> 16/01/11 12:55:20 WARN cluster.YarnClusterSchedulerBackend: Executor to
> kill 3 does not exist!
> 16/01/11 12:55:20 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested
> to kill executor(s) .
> 16/01/11 14:29:55 WARN client.RemoteDriver: Shutting down driver because
> RPC channel was closed.
> 16/01/11 14:29:55 INFO client.RemoteDriver: Shutting down remote driver.
> 16/01/11 14:29:55 INFO scheduler.DAGScheduler: Asked to cancel job 1
> 16/01/11 14:29:55 INFO client.RemoteDriver: Failed to run job
> 2fbbb881-988b-4454-ad9e-a20783aaf38e
> java.lang.InterruptedException
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:371)
>         at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> 16/01/11 14:29:55 INFO cluster.YarnClusterScheduler: Cancelling stage 2
> 16/01/11 14:29:55 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0,
> whose tasks have all completed, from pool
> 16/01/11 14:29:55 INFO cluster.YarnClusterScheduler: Stage 2 was cancelled
> 16/01/11 14:29:55 INFO scheduler.DAGScheduler: ShuffleMapStage 2
> (mapPartitionsToPair at MapTran.java:31) failed in 6278.824 s
> 16/01/11 14:29:55 INFO handler.ContextHandler: stopped
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/01/11 14:29:55 INFO handler.ContextHandler: stopped
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/01/11 14:29:55 INFO handler.ContextHandler: stopped
> o.s.j.s.ServletContextHandler{/api,null}
> 16/01/11 14:29:55 INFO handler.ContextHandler: stopped
> o.s.j.s.ServletContextHandler{/,null}
> 16/01/11 14:29:55 INFO handler.ContextHandler: stopped
> o.s.j.s.ServletContextHandler{/static,null}
>
>
> *Best wishes.*
> *Thanks.*
>
>
