Saisai Shao created SPARK-12447:
-----------------------------------

             Summary: Only update AM's internal state when executor is successfully launched by NM
                 Key: SPARK-12447
                 URL: https://issues.apache.org/jira/browse/SPARK-12447
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 1.6.0
            Reporter: Saisai Shao

Currently, {{YarnAllocator}} updates its managed state, such as {{numExecutorsRunning}}, after a container is allocated but before the executor is successfully launched. A launch can fail when the Spark configuration is wrong, or when the NM is lost while NMClient is communicating with it. In the current implementation, the state is updated even if the executor fails to launch, which leaves the AM with incorrect state. In addition, the lingering container is only released after a timeout, which wastes resources.

So we should update the state only after the executor is correctly launched; otherwise we should release the container ASAP so that it fails fast and is retried.
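
The proposed ordering can be illustrated with a small, self-contained model. This is only a sketch and not Spark's actual code: the class `AllocatorModel`, the exception `LaunchError`, and the `launch` and `release` callables are hypothetical stand-ins for {{YarnAllocator}}, an NM launch failure, and the NMClient/AMRMClient calls. The point it shows is that the counter is touched only after the launch call returns successfully, and that a failed launch releases the container immediately instead of waiting for a timeout.

```python
class LaunchError(Exception):
    """Hypothetical error: the NM could not start the executor."""


class AllocatorModel:
    """Simplified stand-in for the allocator's bookkeeping (not Spark's API)."""

    def __init__(self, launch, release):
        self._launch = launch      # callable(container_id); raises LaunchError on failure
        self._release = release    # callable(container_id); hands the container back
        self.num_executors_running = 0
        self.running = set()

    def on_container_allocated(self, container_id):
        """Try to launch an executor; update state only if the launch succeeds."""
        try:
            self._launch(container_id)
        except LaunchError:
            # Launch failed: leave the counters untouched and release the
            # container right away so the request can be retried.
            self._release(container_id)
            return False
        # Launch succeeded: only now is the executor counted as running.
        self.num_executors_running += 1
        self.running.add(container_id)
        return True


# Usage: the second container fails to launch (for example, a bad configuration).
released = []

def flaky_launch(container_id):
    if container_id == "container_2":
        raise LaunchError(container_id)

model = AllocatorModel(flaky_launch, released.append)
results = [model.on_container_allocated(c)
           for c in ("container_1", "container_2", "container_3")]
```

In this model the failed container never contributes to `num_executors_running`, so the allocator would still see an outstanding executor to request on its next pass.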