Hi Rafeeq,

I think this situation usually occurs when your Spark Streaming application is running abnormally. Would you mind checking your job processing time in the web UI or in the logs: is the total latency (job processing time + job scheduling delay) larger than the batch duration? If your Spark Streaming application is in that situation, you will hit this exception.
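If it helps, here is a minimal sketch (Scala) of how you could log those numbers per batch with a StreamingListener and compare them against the batch duration. The application name, local master, 2-second batch interval, and the socket placeholder input are just illustrative; replace them with your Kafka DStream and real settings.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

object BatchDelayCheck {
  def main(args: Array[String]): Unit = {
    // Illustrative settings only: local master and a 2-second batch interval.
    val batchDuration = Seconds(2)
    val conf = new SparkConf().setAppName("BatchDelayCheck").setMaster("local[2]")
    val ssc = new StreamingContext(conf, batchDuration)

    // Log scheduling delay + processing time for every completed batch so they
    // can be compared against the batch duration.
    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        val info = batchCompleted.batchInfo
        println(s"batch ${info.batchTime}: " +
          s"scheduling delay=${info.schedulingDelay.getOrElse(-1L)} ms, " +
          s"processing time=${info.processingDelay.getOrElse(-1L)} ms, " +
          s"total delay=${info.totalDelay.getOrElse(-1L)} ms " +
          s"(batch duration=${batchDuration.milliseconds} ms)")
      }
    })

    // Placeholder input and output so the sketch runs on its own; replace with
    // your Kafka DStream and real processing.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}

If the total delay keeps growing batch after batch, the application is not keeping up with the input rate.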
Normally the reason this happens is that Spark Streaming jobs are processed one at a time by default. If one job is blocked for a long time, the next job has to wait until the previous one finishes, but the input blocks are deleted after a timeout, so when the delayed job finally starts it cannot find the right block and throws this exception. You may also hit this exception in other abnormal situations. In any case, this exception means your application is behaving abnormally, and you should pay attention to your job execution time. You can check the "Monitoring Applications" section of the Spark Streaming doc<http://spark.apache.org/docs/latest/streaming-programming-guide.html> for details.

Thanks
Jerry

From: Rafeeq S [mailto:rafeeq.ec...@gmail.com]
Sent: Thursday, September 18, 2014 2:43 PM
To: u...@spark.incubator.apache.org
Subject: Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

Hi,

I am testing a Kafka-Spark Streaming application which throws the error below after a few seconds. The following configuration is used for the Spark Streaming test environment:

kafka version - 0.8.1
spark version - 1.0.1

SPARK_MASTER_MEMORY="1G"
SPARK_DRIVER_MEMORY="1G"
SPARK_WORKER_INSTANCES="1"
SPARK_EXECUTOR_INSTANCES="1"
SPARK_WORKER_MEMORY="1G"
SPARK_EXECUTOR_MEMORY="1G"
SPARK_WORKER_CORES="2"
SPARK_EXECUTOR_CORES="1"

ERROR:

14/09/12 17:30:23 WARN TaskSetManager: Loss was due to java.lang.Exception
java.lang.Exception: Could not compute split, block input-4-1410542878200 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:33)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:74)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Please suggest a solution.

Regards,
Rafeeq S
("What you do is what matters, not what you think or say or plan.")