[ https://issues.apache.org/jira/browse/SPARK-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285920#comment-16285920 ]
KaiXinXIaoLei edited comment on SPARK-14228 at 12/11/17 3:17 PM:
-----------------------------------------------------------------
With this patch applied, the problem still exists. When the number of executors is large, and YarnSchedulerBackend.stopped is still false after YarnSchedulerBackend.stop() has started running, some executors are stopped and YarnSchedulerBackend.onDisconnected() is called, so the problem still occurs.

was (Author: kaixinxiaolei): Using this patch, this problem still exists.

> Lost executor of RPC disassociated, and occurs exception: Could not find
> CoarseGrainedScheduler or it has been stopped
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14228
>                 URL: https://issues.apache.org/jira/browse/SPARK-14228
>             Project: Spark
>          Issue Type: Bug
>            Reporter: meiyoula
>             Fix For: 2.3.0
>
>
> When I start 1000 executors and then stop the process, SparkContext.stop is
> called to stop all executors. But during this process, executors that have
> already been killed lose their RPC connection with the driver and try to
> reviveOffers, but cannot find CoarseGrainedScheduler or it has been stopped.
> {quote}
> 16/03/29 01:45:45 ERROR YarnScheduler: Lost executor 610 on 51-196-152-8:
> remote Rpc client disassociated
> 16/03/29 01:45:45 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it
> has been stopped.
> at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161)
> at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
> at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:173)
> at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:398)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:314)
> at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:482)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.removeExecutor(CoarseGrainedSchedulerBackend.scala:261)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.onDisconnected(CoarseGrainedSchedulerBackend.scala:207)
> at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:144)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
> at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {quote}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
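The race the comment describes can be sketched as follows. This is a minimal, hypothetical model, not Spark's actual `CoarseGrainedSchedulerBackend` code: the class name `SchedulerBackendSketch` and its methods are illustrative assumptions. The idea is that `stop()` must publish a `stopped` flag before the RPC endpoint is torn down, so that late `onDisconnected` events are dropped instead of sending `reviveOffers` to an endpoint that no longer exists.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified model of the shutdown race described above.
// If disconnect events can still be processed after stop() has removed
// the scheduler endpoint, they trigger "Could not find
// CoarseGrainedScheduler or it has been stopped".
class SchedulerBackendSketch {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // Set the flag *before* tearing down the RPC endpoint, so that any
    // disconnect events arriving afterwards are ignored.
    stopped.set(true)
  }

  // Returns true if the disconnect was handled (i.e. removeExecutor /
  // reviveOffers would run), false if it was ignored because the
  // backend is already stopped.
  def onDisconnected(executorId: String): Boolean = {
    if (stopped.get()) {
      // Backend already stopped: drop the event instead of messaging
      // a dead endpoint.
      false
    } else {
      // ... removeExecutor(executorId) and reviveOffers() would go here ...
      true
    }
  }
}
```

Under this sketch, a disconnect arriving before `stop()` is handled normally, while one arriving after is silently dropped; the per-commenter failure mode is precisely the window where the flag is not yet set even though teardown has begun.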