[ 
https://issues.apache.org/jira/browse/SPARK-24617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24617.
----------------------------------
    Resolution: Incomplete

> Spark driver not requesting another executor once original executor exits due 
> to 'lost worker'
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24617
>                 URL: https://issues.apache.org/jira/browse/SPARK-24617
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.1
>            Reporter: t oo
>            Priority: Major
>              Labels: bulk-closed
>
> I am running Spark v2.1.1 in standalone mode (no YARN/Mesos) across EC2 instances. I 
> have one master EC2 instance that acts as the driver (since spark-submit is called on 
> this host), spark.master is set, and the deploy mode is client (so spark-submit only 
> returns a return code to the PuTTY window once it finishes processing). I have 
> one worker EC2 instance registered with the Spark master. When I run spark-submit 
> on the master, I can see in the web UI that executors start on the worker, 
> and I can verify successful completion. However, if the worker EC2 instance is 
> terminated while spark-submit is running, and a new EC2 worker comes up 
> 3 minutes later and registers with the master, the web UI shows 
> 'cannot find address' in the executor status, but the driver either 
> waits forever (I killed it after 2 days) or, in some cases, 
> allocates tasks to the new worker only 5 hours later and then completes. Is 
> there some setting I am missing that would explain this behavior?
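For reference, a few standalone-mode settings commonly govern how quickly a lost worker/executor is noticed and replaced. The sketch below is illustrative only, not a confirmed fix for this issue; the values shown are examples, and whether any of them explain the multi-hour delay reported here was not established before the issue was bulk-closed.

```shell
# Illustrative spark-submit invocation (standalone master, client deploy mode).
# The host names and app jar are placeholders; the --conf keys are real Spark
# configuration properties, but the values are example settings to check, not
# a verified remedy for SPARK-24617.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --conf spark.worker.timeout=60 \                 # master marks a worker lost after this many seconds without heartbeats
  --conf spark.network.timeout=120s \              # default timeout for driver <-> executor network interactions
  --conf spark.deploy.maxExecutorRetries=10 \      # standalone master gives up on an app after this many consecutive executor failures
  --conf spark.task.maxFailures=4 \                # task is re-attempted this many times before the stage fails
  app.jar
```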



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
