My Spark Streaming job was running on two worker nodes when an error occurred on one of them. The job still showed as running, but it was making no progress and not processing any new messages. Based on the driver log files, I see the errors below.
I would expect the stream read to be retried and processing of new messages to continue. Is there any configuration that I could be missing? For reference, here is how the job sets its Spark properties:

    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "com.snc.sinet.streaming.StreamAggregatorKryoRegistrator")
    System.setProperty("spark.local.dir", Configuration.streamingConfig.localDir)
    System.setProperty("spark.ui.port", Configuration.streamingConfig.uiPort.toString)

Driver log excerpts:

2014-04-05 18:22:26,507 ERROR akka.remote.EndpointWriter spark-akka.actor.default-dispatcher-3 - AssociationError [akka.tcp://sp...@hclient01.sea1.service-now.com:49048] <- [akka.tcp://sp...@hclient02.sea1.service-now.com:50888]: Error [Shut down address: akka.tcp://sp...@hclient02.sea1.service-now.com:50888] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://sp...@hclient02.sea1.service-now.com:50888
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]

akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkexecu...@hclient02.sea1.service-now.com:47512]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hclient02.sea1.service-now.com/10.196.32.78:47512
]

2014-04-05 18:21:52,893 WARN o.a.spark.scheduler.TaskSetManager - Loss was due to java.lang.IllegalStateException
java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2418)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1379)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1988)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1912)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:64)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1834)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1793)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1347)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
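As a side note, the same properties can be set on a SparkConf, which replaced System.setProperty as the recommended configuration mechanism in Spark 0.9. The sketch below assumes that API; the app name, the 10-second batch interval, and the spark.task.maxFailures value (the per-task retry count, default 4) are my own placeholders, not settings taken from the job above:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Same settings as above, moved onto a SparkConf so they travel with
    // the application instead of relying on JVM system properties.
    val conf = new SparkConf()
      .setAppName("StreamAggregator") // placeholder name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "com.snc.sinet.streaming.StreamAggregatorKryoRegistrator")
      .set("spark.local.dir", Configuration.streamingConfig.localDir)
      .set("spark.ui.port", Configuration.streamingConfig.uiPort.toString)
      // Raise the per-task retry limit (default 4) so transient executor
      // failures are retried more times before the stage is failed.
      .set("spark.task.maxFailures", "8")

    // Placeholder batch interval; the StreamingContext picks up the conf above.
    val ssc = new StreamingContext(conf, Seconds(10))

This does not by itself explain the stalled KafkaInputDStream, but it rules out one source of trouble: properties set via System.setProperty after the context is created are silently ignored, whereas a SparkConf is read when the context starts.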