Saisai Shao created SPARK-12447:
-----------------------------------

             Summary: Only update AM's internal state when executor is 
successfully launched by NM
                 Key: SPARK-12447
                 URL: https://issues.apache.org/jira/browse/SPARK-12447
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.6.0
            Reporter: Saisai Shao


Currently {{YarnAllocator}} updates its managed state, such as 
{{numExecutorsRunning}}, after a container is allocated but before the 
executor is successfully launched.

An executor can fail to launch when the Spark configuration is wrong, or when 
the NM is lost while NMClient is communicating with it.

In the current implementation, the state is updated even when the executor 
fails to launch, which leaves the AM in an incorrect state. In addition, the 
lingering container is released only after a timeout, which wastes resources.

So we should update the state only after the executor is successfully launched; 
otherwise we should release the container as soon as possible so that the 
launch fails fast and can be retried.
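
The proposed ordering can be illustrated with a minimal sketch. This is not 
Spark's code: {{AllocatorSketch}}, {{LaunchError}}, {{flaky_launch}} and the 
container ids are hypothetical names made up for the example, and the launch 
and release steps are plain callables standing in for the NM interaction.

```python
class LaunchError(Exception):
    """Raised by the hypothetical launcher when an executor fails to start."""


class AllocatorSketch:
    """Toy stand-in for the allocator's bookkeeping; not Spark's actual code."""

    def __init__(self, launch, release):
        self.launch = launch      # callable(container_id); raises LaunchError on failure
        self.release = release    # callable(container_id); hands the container back
        self.num_executors_running = 0

    def on_container_allocated(self, container_id):
        try:
            self.launch(container_id)
        except LaunchError:
            # Launch failed: release right away instead of waiting for a
            # timeout, and leave the running count untouched.
            self.release(container_id)
            return False
        # Count the executor only once the launch has succeeded.
        self.num_executors_running += 1
        return True


released = []


def flaky_launch(container_id):
    # Hypothetical launcher: "c2" simulates a bad configuration or a lost NM.
    if container_id == "c2":
        raise LaunchError(container_id)


allocator = AllocatorSketch(flaky_launch, released.append)
results = [allocator.on_container_allocated(c) for c in ("c1", "c2", "c3")]
```

Here only the two successful launches are counted, and the failed container is 
released immediately. The sketch is synchronous and single-threaded; an 
allocator that launches executors on worker threads would also need 
thread-safe counters.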



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

