Hi,

We are seeing an issue with Spark 1.6.1: an executor exits when one of its running tasks is cancelled.
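For context, the cancellation is triggered roughly along the lines of the sketch below (the job group name, job body, and timings are illustrative, not our exact code). We set interruptOnCancel, so cancelling the group interrupts the task threads:

    import org.apache.spark.{SparkConf, SparkContext}

    object CancelRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("cancel-repro"))

        // Run a long job in a named group on a background thread.
        val worker = new Thread(new Runnable {
          override def run(): Unit = {
            sc.setJobGroup("group-1", "long-running job", interruptOnCancel = true)
            sc.parallelize(1 to 1000000, 8)
              .map { i => Thread.sleep(1); i }
              .count()
          }
        })
        worker.start()

        Thread.sleep(5000)
        // With interruptOnCancel = true, cancellation interrupts the task
        // threads; this is when the executor logs ClosedByInterruptException.
        sc.cancelJobGroup("group-1")
        worker.join()
        sc.stop()
      }
    }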
The executor log shows the error below before the crash:

    16/04/27 16:34:13 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
    java.lang.Error: java.nio.channels.ClosedByInterruptException
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
        at java.nio.channels.Channels$WritableByteChannelImpl.write(Channels.java:460)
        at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:49)
        at org.apache.spark.util.SerializableBuffer$$anonfun$writeObject$1.apply(SerializableBuffer.scala:47)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1219)
        at org.apache.spark.util.SerializableBuffer.writeObject(SerializableBuffer.scala:47)
        at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
        at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:252)
        at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:195)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:516)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:132)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:288)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        ... 2 more
I have attached the full logs in this gist: https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f

On the driver side, the following is logged (same gist: https://gist.github.com/kiranchitturi/3bd3a083a7c956cff73040c1a140c88f). These lines show that the executor exited because of one of the running tasks:

    2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] - Lost task 0.0 in stage 89.0 (TID 173, 10.0.0.42): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
    2016-04-27T16:34:13,723 - WARN [dispatcher-event-loop-1:Logging$class@70] - Lost task 1.0 in stage 89.0 (TID 174, 10.0.0.42): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated.

Is it possible for an executor to die when the jobs in the SparkContext are cancelled? Apart from https://issues.apache.org/jira/browse/SPARK-14234, I could not find any JIRAs that report this error.

Sometimes we also see a scenario where the executor dies and the driver does not request a new one, which causes the jobs to hang indefinitely. We are using dynamic allocation for our jobs (relevant settings in the P.S. below).

Thanks,
Kiran Chitturi
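P.S. For reference, the dynamic-allocation-related parts of our spark-defaults.conf look roughly like this (the min/max executor counts are illustrative, not our exact numbers):

    spark.dynamicAllocation.enabled        true
    spark.shuffle.service.enabled          true
    spark.dynamicAllocation.minExecutors   1
    spark.dynamicAllocation.maxExecutors   10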