[ https://issues.apache.org/jira/browse/SPARK-14228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285920#comment-16285920 ]
KaiXinXIaoLei edited comment on SPARK-14228 at 12/11/17 3:17 PM:
-----------------------------------------------------------------
With this patch applied, the problem still exists. When the number of executors is large, and YarnSchedulerBackend.stopped is still false after YarnSchedulerBackend.stop() has started running, some executors are stopped and YarnSchedulerBackend.onDisconnected() is called, so the problem still occurs.

was (Author: kaixinxiaolei): Using this patch, this problem still exists.

> Lost executor of RPC disassociated, and occurs exception: Could not find
> CoarseGrainedScheduler or it has been stopped
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14228
>                 URL: https://issues.apache.org/jira/browse/SPARK-14228
>             Project: Spark
>          Issue Type: Bug
>            Reporter: meiyoula
>             Fix For: 2.3.0
>
>
> When I start 1000 executors and then stop the process, SparkContext.stop is
> called to stop all executors. But during this process, executors that have
> already been killed lose their RPC connection with the driver and try to
> reviveOffers, but cannot find CoarseGrainedScheduler or it has been stopped.
> {quote}
> 16/03/29 01:45:45 ERROR YarnScheduler: Lost executor 610 on 51-196-152-8:
> remote Rpc client disassociated
> 16/03/29 01:45:45 ERROR Inbox: Ignoring error
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it
> has been stopped.
> at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161)
> at org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:131)
> at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:173)
> at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:398)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.reviveOffers(CoarseGrainedSchedulerBackend.scala:314)
> at org.apache.spark.scheduler.TaskSchedulerImpl.executorLost(TaskSchedulerImpl.scala:482)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.removeExecutor(CoarseGrainedSchedulerBackend.scala:261)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint$$anonfun$onDisconnected$1.apply(CoarseGrainedSchedulerBackend.scala:207)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.onDisconnected(CoarseGrainedSchedulerBackend.scala:207)
> at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:144)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:102)
> at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {quote}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
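The race the comment describes can be sketched as follows. This is a minimal, hypothetical model, not Spark's actual `CoarseGrainedSchedulerBackend` code: the class name `SchedulerBackendSketch` and its methods are illustrative assumptions. The idea is that `stop()` must publish a `stopped` flag before the RPC endpoint is torn down, so that late `onDisconnected` events are dropped instead of sending `reviveOffers` to an endpoint that no longer exists.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical, simplified model of the shutdown race described above.
// If disconnect events can still be processed after stop() has removed
// the scheduler endpoint, they trigger "Could not find
// CoarseGrainedScheduler or it has been stopped".
class SchedulerBackendSketch {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // Set the flag *before* tearing down the RPC endpoint, so that any
    // disconnect events arriving afterwards are ignored.
    stopped.set(true)
  }

  // Returns true if the disconnect was handled (i.e. removeExecutor /
  // reviveOffers would run), false if it was ignored because the
  // backend is already stopped.
  def onDisconnected(executorId: String): Boolean = {
    if (stopped.get()) {
      // Backend already stopped: drop the event instead of messaging
      // a dead endpoint.
      false
    } else {
      // ... removeExecutor(executorId) and reviveOffers() would go here ...
      true
    }
  }
}
```

Under this sketch, a disconnect arriving before `stop()` is handled normally, while one arriving after is silently dropped; the per-commenter failure mode is precisely the window where the flag is not yet set even though teardown has begun.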