wuyi created SPARK-34245:
----------------------------

             Summary: Master may not remove the finished executor when Worker 
fails to send ExecutorStateChanged
                 Key: SPARK-34245
                 URL: https://issues.apache.org/jira/browse/SPARK-34245
             Project: Spark
          Issue Type: Improvement
          Components: Deploy, Spark Core
    Affects Versions: 3.0.1, 2.4.7, 3.2.0, 3.1.1
            Reporter: wuyi


If the Worker fails to send ExecutorStateChanged to the Master because of an 
error, e.g., a temporary network error, the Master cannot remove the 
finished executor and still considers it alive. In the worst 
case, if that executor is the only executor for the application, the 
application can hang.
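
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not Spark code: `ToyMaster`, `FlakyLink`, `report_once`, `report_with_retry`, and the state names are all hypothetical stand-ins. The retry helper shows one possible mitigation under these assumptions, not necessarily the fix adopted for this issue.

```python
from dataclasses import dataclass, field

# Illustrative terminal state names for this toy model (an assumption).
FINISHED_STATES = {"EXITED", "KILLED", "FAILED", "LOST"}


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the Master's executor bookkeeping."""
    alive_executors: set = field(default_factory=set)

    def on_executor_state_changed(self, executor_id: str, state: str) -> None:
        # The executor is removed only when the message actually arrives.
        if state in FINISHED_STATES:
            self.alive_executors.discard(executor_id)


class FlakyLink:
    """Hypothetical Worker-to-Master link that drops the first `drops` messages."""

    def __init__(self, master: ToyMaster, drops: int) -> None:
        self.master = master
        self.drops = drops

    def send(self, executor_id: str, state: str) -> bool:
        if self.drops > 0:
            self.drops -= 1
            return False  # message lost, the Master never sees it
        self.master.on_executor_state_changed(executor_id, state)
        return True


def report_once(link: FlakyLink, executor_id: str, state: str) -> bool:
    """Send the state change a single time; a lost message is never resent."""
    return link.send(executor_id, state)


def report_with_retry(link: FlakyLink, executor_id: str, state: str,
                      max_attempts: int = 3) -> bool:
    """Resend the state change until it is delivered or attempts run out."""
    for _ in range(max_attempts):
        if link.send(executor_id, state):
            return True
    return False


if __name__ == "__main__":
    master = ToyMaster({"exec-0"})
    link = FlakyLink(master, drops=1)

    # The single attempt is lost, so the Master keeps a stale entry.
    report_once(link, "exec-0", "EXITED")
    print("exec-0" in master.alive_executors)

    # A retry delivers the message and the stale entry is removed.
    report_with_retry(link, "exec-0", "EXITED")
    print("exec-0" in master.alive_executors)
```

In this model the Master's view goes stale after a single dropped message, which mirrors the case where the only executor of an application has finished but is still counted as alive.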

 




