[ https://issues.apache.org/jira/browse/SPARK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-3612.
------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0
                   1.1.1
         Assignee: Sandy Ryza

https://github.com/apache/spark/pull/2487

> Executor shouldn't quit if heartbeat message fails to reach the driver
> ----------------------------------------------------------------------
>
>                 Key: SPARK-3612
>                 URL: https://issues.apache.org/jira/browse/SPARK-3612
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Reynold Xin
>            Assignee: Sandy Ryza
>             Fix For: 1.1.1, 1.2.0
>
>
> The thread started by Executor.startDriverHeartbeater can actually terminate
> the whole executor if AkkaUtils.askWithReply[HeartbeatResponse] throws an
> exception.
> I don't think we should quit the executor this way. At the very least, we
> would want to log a more meaningful exception than simply
> {code}
> 14/09/20 06:38:12 WARN AkkaUtils: Error sending message in 1 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:38:45 WARN AkkaUtils: Error sending message in 2 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:39:18 WARN AkkaUtils: Error sending message in 3 attempts
> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> 14/09/20 06:39:21 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Driver Heartbeater,5,main]
> org.apache.spark.SparkException: Error sending message [message = Heartbeat(281,[Lscala.Tuple2;@4d9294db,BlockManagerId(281, ip-172-31-7-55.eu-west-1.compute.internal, 52303))]
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:190)
>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:379)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
>         ... 1 more
> {code}
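
The report's core complaint is that an exception thrown by askWithReply escapes the heartbeater thread's run loop and reaches ExecutorUncaughtExceptionHandler, which shuts down the executor. Below is a minimal Scala sketch of the alternative behaviour, assuming a simplified loop: each heartbeat is wrapped in a try/catch that logs non-fatal exceptions as warnings, so one failed heartbeat does not kill the thread. `HeartbeaterSketch`, `runHeartbeats`, `send` and `log` are hypothetical names standing in for the real heartbeater and for `AkkaUtils.askWithReply[HeartbeatResponse]`. This is an illustration, not necessarily the code merged in the linked pull request.

```scala
import scala.util.control.NonFatal

// Hypothetical, simplified stand-in for the loop run by Executor.startDriverHeartbeater.
object HeartbeaterSketch {
  // Runs `numBeats` heartbeat iterations. `send` is a hypothetical placeholder for
  // AkkaUtils.askWithReply[HeartbeatResponse]; `log` receives warning messages.
  // Returns the number of heartbeats that failed.
  def runHeartbeats(numBeats: Int, send: Int => Unit, log: String => Unit): Int = {
    var failures = 0
    for (beat <- 1 to numBeats) {
      try {
        send(beat)
      } catch {
        // NonFatal does not match VirtualMachineError, InterruptedException,
        // LinkageError or ControlThrowable, so those still propagate.
        case NonFatal(e) =>
          failures += 1
          log(s"Issue communicating with driver in heartbeater (beat $beat): $e")
      }
      // A real heartbeater would sleep for the heartbeat interval here.
    }
    failures
  }
}
```

For example, a `send` that throws `java.util.concurrent.TimeoutException` on the first two beats produces two warnings and a return value of 2, and the loop still completes its third beat instead of propagating the exception out of the thread. Catching `NonFatal` rather than `Throwable` keeps genuinely fatal errors, such as `OutOfMemoryError`, on the uncaught-exception path.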



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
