[ 
https://issues.apache.org/jira/browse/SPARK-20079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guoqiang Li updated SPARK-20079:
--------------------------------
    Description: 
The ExecutorAllocationManager.reset method is called when re-registering AM, 
which sets the ExecutorAllocationManager.initializing field true. When this 
field is true, the Driver does not start a new executor from the AM request. 
The following two cases will cause the field to False

1. A executor idle for some time.
2. There are new stages to be submitted


After the a stage was submitted, the AM was killed and restart ,the above two 
cases will not appear.

1. When AM is killed, the yarn will kill all running containers. All execuotr 
will be lost and no executor will be idle.
2. No surviving executor, resulting in the current stage will never be 
completed, DAG will not submit a new stage.


Reproduction steps:

1. Start cluster

{noformat}
echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | 
./bin/spark-shell  --master yarn-client --executor-cores 1 --conf 
spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true 
--conf spark.dynamicAllocation.maxExecutors=2
{noformat}

2.  Kill the AM process when a stage is scheduled. 

  was:

1. Start cluster

echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | 
./bin/spark-shell  --master yarn-client --executor-cores 1 --conf 
spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true 
--conf spark.dynamicAllocation.maxExecutors=2 

2.  Kill the AM process when a stage is scheduled. 


> Re registration of AM hangs spark cluster in yarn-client mode
> -------------------------------------------------------------
>
>                 Key: SPARK-20079
>                 URL: https://issues.apache.org/jira/browse/SPARK-20079
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.1.0
>            Reporter: Guoqiang Li
>
> The ExecutorAllocationManager.reset method is called when re-registering AM, 
> which sets the ExecutorAllocationManager.initializing field true. When this 
> field is true, the Driver does not start a new executor from the AM request. 
> The following two cases will cause the field to False
> 1. A executor idle for some time.
> 2. There are new stages to be submitted
> After the a stage was submitted, the AM was killed and restart ,the above two 
> cases will not appear.
> 1. When AM is killed, the yarn will kill all running containers. All execuotr 
> will be lost and no executor will be idle.
> 2. No surviving executor, resulting in the current stage will never be 
> completed, DAG will not submit a new stage.
> Reproduction steps:
> 1. Start cluster
> {noformat}
> echo -e "sc.parallelize(1 to 2000).foreach(_ => Thread.sleep(1000))" | 
> ./bin/spark-shell  --master yarn-client --executor-cores 1 --conf 
> spark.shuffle.service.enabled=true --conf 
> spark.dynamicAllocation.enabled=true --conf 
> spark.dynamicAllocation.maxExecutors=2
> {noformat}
> 2.  Kill the AM process when a stage is scheduled. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to