[ 
https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777821#comment-15777821
 ] 

liujianhui commented on SPARK-18976:
------------------------------------

Thanks for your attention. I found the root cause; it is the same as in 
https://issues.apache.org/jira/browse/SPARK-18994. The master found that the 
worker's heartbeat had expired and removed the worker, but the executor on that 
worker stayed alive. After the standby master became active, this executor was 
reported to the new master along with the WorkerSchedulerStateResponse, and 
it was added to the corresponding app's executors list.
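
To make the sequence concrete, here is a toy model of it in Python. It is only 
a sketch of the behavior described above, not Spark's Master code: the Master 
class, its fields and its method names are made up for illustration, and I 
assume the new master recovered app-1 with only executor 142 in its list. The 
executor ids 18 and 142 are the ones from the issue description.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Toy stand-in for a standalone master; not Spark's real class."""
    workers: set = field(default_factory=set)
    # app id -> set of executor ids the master believes belong to the app
    app_executors: dict = field(default_factory=dict)

    def expire_worker(self, worker_id, executors_on_worker):
        # Heartbeat timeout: forget the worker and the executors it hosted.
        self.workers.discard(worker_id)
        for app_id, exec_id in executors_on_worker:
            self.app_executors.get(app_id, set()).discard(exec_id)

    def recover_from_worker_response(self, worker_id, executors_on_worker):
        # Models handling a WorkerSchedulerStateResponse after failover:
        # every executor the worker reports as running is added back.
        self.workers.add(worker_id)
        for app_id, exec_id in executors_on_worker:
            self.app_executors.setdefault(app_id, set()).add(exec_id)


# Executor 18 is still running on the worker the old master gave up on.
running_on_worker = [("app-1", 18)]

old_master = Master(workers={"worker-A"}, app_executors={"app-1": {18, 142}})
old_master.expire_worker("worker-A", running_on_worker)

# Assumed state of the standby master once it becomes active.
new_master = Master(app_executors={"app-1": {142}})
new_master.recover_from_worker_response("worker-A", running_on_worker)
```

After the last call the new master lists executor 18 under app-1 again, even 
though the old master had already removed it.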

> In standalone mode, an executor expired by HeartbeatReceiver still takes up 
> cores but has no tasks assigned to it
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18976
>                 URL: https://issues.apache.org/jira/browse/SPARK-18976
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 1.6.1
>         Environment: jdk1.8.0_77 Red Hat 4.4.7-11
>            Reporter: liujianhui
>         Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> h2. scene
> When an executor is expired by HeartbeatReceiver in the driver, the driver 
> marks that executor as not live and the task scheduler stops assigning tasks 
> to it, but the executor's status stays running and it keeps taking up cores. 
> Executor 18 was expired and has no tasks running; its task time is far less 
> than that of the normal executor 142, yet on the app page the executor is 
> still shown as running.
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
> h2. process
> # The executor is expired by HeartbeatReceiver because its last heartbeat 
> exceeded the executor timeout.
> # The executor is removed in CoarseGrainedSchedulerBackend.killExecutors, so 
> it is marked as dead. From then on it is not offered for scheduling 
> because it is in executorsPendingToRemove.
> # The status of that executor is still running because the 
> CoarseGrainedExecutorBackend process still exists and re-registers its block 
> manager with the driver every 10s, as in this log:
> {code}
> 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. 
> {code}
> h2. resolve
> When the number of re-registrations exceeds some threshold (e.g. 10), the 
> executor should exit with status zero.
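
The proposed resolution could look roughly like the sketch below. It is not 
Spark code: ReRegisterGuard and its method are hypothetical names, the 
threshold of 10 is the example value from the description, and resetting the 
counter after a normal heartbeat reply is my own assumption (the description 
does not say whether the count should be consecutive).

```python
class ReRegisterGuard:
    """Counts 're-register' replies; hypothetical helper, not a Spark API."""

    def __init__(self, max_reregisters=10):
        self.max_reregisters = max_reregisters
        self.count = 0

    def on_heartbeat_response(self, reregister_requested):
        """Return True once the executor should shut itself down."""
        if reregister_requested:
            self.count += 1
        else:
            # Assumption: a normal reply means the driver knows us again.
            self.count = 0
        return self.count > self.max_reregisters


# Simulate eleven heartbeats in a row that are answered with
# "Told to re-register on heartbeat", as in the log above.
guard = ReRegisterGuard(max_reregisters=10)
decisions = [guard.on_heartbeat_response(True) for _ in range(11)]
```

With a threshold of 10, the first ten replies leave the executor running and 
the eleventh tells it to stop. In a real executor the caller would then exit 
the process with status 0 instead of re-registering forever.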


