[ https://issues.apache.org/jira/browse/SPARK-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853630#comment-15853630 ]
DjvuLee commented on SPARK-17300: --------------------------------- [~rdblue] Is there any fix for this issue later? > ClosedChannelException caused by missing block manager when speculative tasks > are killed > ---------------------------------------------------------------------------------------- > > Key: SPARK-17300 > URL: https://issues.apache.org/jira/browse/SPARK-17300 > Project: Spark > Issue Type: Bug > Reporter: Ryan Blue > > We recently backported SPARK-10530 to our Spark build, which kills > unnecessary duplicate/speculative tasks when one completes (either a > speculative task or the original). In large jobs with 500+ executors, this > caused some executors to die and resulted in the same error that was fixed by > SPARK-15262: ClosedChannelException when trying to connect to the block > manager on affected hosts. > {code} > java.nio.channels.ClosedChannelException > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) > at > org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) > at > io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) > at > io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) > at > io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.nio.channels.ClosedChannelException > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org