I am using Spark 1.1.0 to run a streaming job that uses updateStateByKey and then (after a series of maps/flatMaps) calls foreachRDD to save the data in each RDD by making HTTP calls. The issue is that every time I attempt to save an RDD (via foreach on the RDD), it fails with the following exception:
org.apache.spark.SparkException: Checkpoint RDD CheckpointRDD[467] at apply at List.scala:318(0) has different number of partitions than original RDD MapPartitionsRDD[461] at mapPartitions at StateDStream.scala:71(56)

I read through the code but couldn't understand why this exception occurs. Any help would be appreciated!

Thanks,
Aniket
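
For reference, the job shape described above can be sketched roughly as follows. This is a minimal illustration, not the actual code from the post: the input source, state function, checkpoint path, and the `postToService` HTTP helper are all hypothetical assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal sketch of the pipeline shape described above, assuming a
// text stream of "key value" records; all names here are illustrative.
val conf = new SparkConf().setAppName("StateStreamSketch")
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/checkpoints") // required by updateStateByKey

val lines = ssc.socketTextStream("localhost", 9999)
val pairs = lines.map(_.split(" ")).map(a => (a(0), a(1).toLong))

// Running total per key, carried across batches via the checkpointed state.
val state = pairs.updateStateByKey[Long] { (values: Seq[Long], prev: Option[Long]) =>
  Some(values.sum + prev.getOrElse(0L))
}

// After further maps/flatMaps, save each batch by making HTTP calls.
state.map { case (k, v) => s"$k=$v" }.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach(record => postToService(record)) // hypothetical HTTP helper
  }
}

ssc.start()
ssc.awaitTermination()
```

The exception above is thrown when a checkpointed state RDD is read back with a different partition count than the RDD it replaced, so comparing the partitioning of the state stream against the checkpoint directory contents may be a useful starting point.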