[ https://issues.apache.org/jira/browse/SPARK-24617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24617.
----------------------------------
    Resolution: Incomplete

> Spark driver not requesting another executor once original executor exits due to 'lost worker'
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24617
>                 URL: https://issues.apache.org/jira/browse/SPARK-24617
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.1
>            Reporter: t oo
>            Priority: Major
>              Labels: bulk-closed
>
> I am running Spark v2.1.1 in 'standalone' mode (no YARN/Mesos) across EC2 instances. I have one master EC2 instance that acts as the driver (since spark-submit is called on this host); spark.master is set, and the deploy mode is client, so spark-submit only returns an exit code to the terminal once it finishes processing. I have one worker EC2 instance registered with the Spark master. When I run spark-submit on the master, I can see in the Web UI that executors start on the worker, and I can verify successful completion. However, if the worker EC2 instance is terminated while spark-submit is running, and a new worker EC2 instance comes up about 3 minutes later and registers with the master, the Web UI shows 'cannot find address' as the executor status, and the driver either waits forever (I killed it after 2 days) or, in some cases, allocates tasks to the new worker only 5 hours later and then completes. Is there some setting I am missing that would explain this behavior?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
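The ticket was closed as Incomplete, so no root cause is confirmed. For readers hitting similar symptoms, the sketch below shows a spark-submit invocation with the standalone-mode fault-tolerance settings one might check first. The master host, class name, and JAR path are placeholders, not taken from the report, and nothing here is asserted to fix this specific issue.

```shell
# Hedged sketch: standalone-mode settings that influence how quickly a driver
# notices a lost executor and how many replacement attempts the master allows.
# "master-host", "com.example.MyJob", and "my-job.jar" are illustrative only.
#
# spark.network.timeout       - default timeout for network interactions;
#                               a vanished executor may not be declared dead
#                               until this elapses.
# spark.worker.timeout        - seconds without heartbeats before the
#                               standalone master marks a worker as lost.
# spark.deploy.maxExecutorRetries - standalone-mode cap on consecutive
#                               executor failures before the application
#                               itself is removed.
# spark.task.maxFailures      - per-task retry limit before the job aborts.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --conf spark.network.timeout=120s \
  --conf spark.worker.timeout=60 \
  --conf spark.deploy.maxExecutorRetries=10 \
  --conf spark.task.maxFailures=4 \
  --class com.example.MyJob \
  my-job.jar
```

These properties all exist in Spark 2.1.x; whether tuning them shortens the multi-hour stall described above was never verified in the ticket.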