I am getting strange behavior with RDDs. All I want is to persist the RDD contents in a single file.
saveAsTextFile() writes one part file per partition, so I tried rdd.coalesce(1, true).saveAsTextFile(). That fails with this exception:

    org.apache.spark.SparkException: Job aborted: Task 75.0:0 failed 1 times
    (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)

Then I tried collecting the RDD contents into an array and writing the array to the file manually. That fails too: collect returns empty arrays, even though the data is there.

    // The line below saves the data in multiple text files, so the data definitely exists.
    rdd.saveAsTextFile(resultDirectory)

    // The line below prints size 0 for every RDD in the stream. Why?
    val arr = rdd.collect
    println("SIZE of RDD " + rdd.id + " " + arr.size)

Kindly help! I am clueless on how to proceed.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Collect-returns-empty-arrays-tp3242.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
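For reference, the manual write step I am attempting looks roughly like the sketch below. It uses plain Scala I/O with no Spark, so the array is just a stand-in for what rdd.collect should return; the file name is a placeholder, not my real path.

```scala
import java.io.{File, PrintWriter}

// Sketch of the manual write step: once collect returns an
// Array[String] on the driver, write every element to one file.
def writeLines(lines: Seq[String], path: String): Unit = {
  val writer = new PrintWriter(new File(path))
  try lines.foreach(writer.println)   // one record per line
  finally writer.close()              // always release the file handle
}

// Stand-in for the array that rdd.collect should produce.
val collected = Array("record1", "record2", "record3")
writeLines(collected, "resultFile.txt")
```

This part works fine on its own; the problem is only that the array coming back from collect is empty.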