I'm trying to flatten an RDD of RDDs. The straightforward approach:

    a: RDD[RDD[Int]]
    a flatMap { _.collect }
throws:

    java.lang.NullPointerException
        at org.apache.spark.rdd.RDD.collect(RDD.scala:602)

In a more complex scenario I also got:

    Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

So I guess this may be related to the SparkContext not being available inside the map closure. Are nested RDDs not supported?

Thanks,
Cosmin Radoi
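For what it's worth, keeping the inner RDDs in a driver-side collection and flattening with SparkContext.union does work for me. A minimal sketch (names and the local master are just for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object FlattenSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("flatten").setMaster("local[*]"))
        // Hold the inner RDDs on the driver instead of nesting them in an outer RDD
        val inner: Seq[RDD[Int]] = Seq(sc.parallelize(1 to 3), sc.parallelize(4 to 6))
        // Flatten on the driver; union only combines partitions, no shuffle
        val flat: RDD[Int] = sc.union(inner)
        println(flat.collect().toSeq)
        sc.stop()
      }
    }

But this only helps when the nesting can be built on the driver in the first place, which is why I'm asking about RDD[RDD[Int]] directly.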