[ https://issues.apache.org/jira/browse/SPARK-34453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitro Valentiev updated SPARK-34453:
-------------------------------------
    Attachment: driver.log
                executors.png

> ExecutorPodsLifecycleManager fails to remove executors in Kubernetes, SPARK 3.0.1
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-34453
>                 URL: https://issues.apache.org/jira/browse/SPARK-34453
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.1
>         Environment: SPARK 3.0.1
> EKS 1.15
> Spark cluster runs in a Kubernetes cluster through spark-submit.
>            Reporter: Dmitro Valentiev
>            Priority: Minor
>         Attachments: driver.log, executors.png
>
>
> This happens when the driver fails to register the reason behind an executor's deletion, e.g.:
> {code:java}
> 2021-02-17 12:07:56,953 DEBUG KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint:61 - Asked to remove executor 1 with reason The executor with id 1 was deleted by a user or the framework.
> {code}
> ExecutorPodsLifecycleManager then fails to remove the missing executor and gets stuck in this loop:
> {code:java}
> 2021-02-17 12:13:39,023 DEBUG ExecutorPodsLifecycleManager:61 - Removed executors with ids 3 from Spark that were either found to be deleted or non-existent in the cluster.
> 2021-02-17 12:15:09,042 DEBUG ExecutorPodsLifecycleManager:61 - The executor with ID 3 was not found in the cluster but we didn't get a reason why. Marking the executor as failed. The executor may have been deleted but the driver missed the deletion event.
> {code}
>
> Steps to reproduce:
> # Deploy a Spark cluster into Kubernetes.
> # Delete an executor pod through kubectl.
>
> This could be related to, or a duplicate of, SPARK-28488.
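The loop in the logs above can be illustrated with a hypothetical toy model. This is plain Java, not Spark's actual ExecutorPodsLifecycleManager code: the class `StuckExecutorLoop`, the method `reconcile`, and the `removalTakesEffect` flag are invented for illustration, under the assumption that the driver keeps tracking an executor ID whenever the removal reason is not registered. Each pass flags every ID the driver still tracks but that is absent from the cluster snapshot.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical toy model (NOT Spark's code) of the reported loop. Each pass
 * flags executor IDs the driver still tracks but that are missing from the
 * cluster snapshot. If the driver never drops a flagged ID (assumed here to
 * happen when the removal reason is not registered), that ID is flagged again
 * on every subsequent pass.
 */
public class StuckExecutorLoop {

    /** Executor IDs the driver still believes are alive. */
    private final Set<Long> driverExecutors = new LinkedHashSet<>();
    /** Whether the driver actually forgets an executor when asked to remove it. */
    private final boolean removalTakesEffect;

    StuckExecutorLoop(Set<Long> initial, boolean removalTakesEffect) {
        driverExecutors.addAll(initial);
        this.removalTakesEffect = removalTakesEffect;
    }

    /** One reconciliation pass; returns the IDs flagged as missing from the cluster. */
    List<Long> reconcile(Set<Long> podsInCluster) {
        List<Long> missing = new ArrayList<>();
        for (Long id : driverExecutors) {
            if (!podsInCluster.contains(id)) {
                missing.add(id); // "not found in the cluster ... Marking the executor as failed."
            }
        }
        // Collect first, then remove, so the set is not modified while iterating.
        if (removalTakesEffect) {
            driverExecutors.removeAll(missing);
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> pods = Set.of(1L, 2L); // executor 3's pod was deleted through kubectl
        StuckExecutorLoop stuck = new StuckExecutorLoop(Set.of(1L, 2L, 3L), false);
        StuckExecutorLoop healthy = new StuckExecutorLoop(Set.of(1L, 2L, 3L), true);
        for (int pass = 1; pass <= 3; pass++) {
            System.out.println("pass " + pass + ": stuck flags " + stuck.reconcile(pods)
                    + ", healthy flags " + healthy.reconcile(pods));
        }
    }
}
```

In this sketch the "stuck" model prints `[3]` on every pass, mirroring the repeating log lines, while the "healthy" model flags executor 3 once and then prints `[]`.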