Nope, nested RDDs aren't supported:
https://groups.google.com/d/msg/spark-users/_Efj40upvx4/DbHCixW7W7kJ
https://groups.google.com/d/msg/spark-users/KC1UJEmUeg8/N_qkTJ3nnxMJ
https://groups.google.com/d/msg/spark-users/rkVPXAiCiBk/CORV5jyeZpAJ
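Roughly why the quoted code fails: RDD transformations and actions can only be invoked from the driver, and the SparkContext an RDD depends on is not shipped to executors. Calling `collect` on an inner RDD inside `flatMap` therefore ends in a NullPointerException, or in a NotSerializableException when the context gets captured in a closure. Below is a minimal sketch of two common workarounds. It assumes either that the inner datasets can be kept in a driver-side `Seq` of RDDs (merged with `SparkContext.union`), or that each one is small enough to be stored as a local collection in an `RDD[Seq[T]]` (flattened with `flatMap`). The names `FlattenRdds`, `unionAll` and `flattenLocal` and the `local[2]` master are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object FlattenRdds {
  // Workaround 1 (hypothetical helper): keep the "inner" RDDs in a plain
  // driver-side Seq and merge them with SparkContext.union. This runs on the
  // driver, so no task closure captures the SparkContext or calls an action
  // on another RDD.
  def unionAll[T: ClassTag](sc: SparkContext, parts: Seq[RDD[T]]): RDD[T] =
    sc.union(parts)

  // Workaround 2 (hypothetical helper): if each inner dataset is small, store
  // it as a local collection (RDD[Seq[T]]) and flatten it on the executors.
  def flattenLocal[T: ClassTag](nested: RDD[Seq[T]]): RDD[T] =
    nested.flatMap(xs => xs)

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, purely for illustration.
    val conf = new SparkConf().setAppName("flatten-rdds").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Instead of an RDD[RDD[Int]], hold the pieces in a Scala Seq.
      val parts: Seq[RDD[Int]] = Seq(sc.parallelize(1 to 3), sc.parallelize(4 to 6))
      println(unionAll(sc, parts).collect().sorted.mkString(","))

      // Instead of inner RDDs, use inner local collections.
      val nested: RDD[Seq[Int]] = sc.parallelize(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6)))
      println(flattenLocal(nested).collect().sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Both printed lines should read 1,2,3,4,5,6. The first approach suits a modest number of RDDs built on the driver; the second avoids the driver-side bookkeeping but only works when each inner collection fits comfortably in a single task's memory.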
On Sun, Mar 2, 2014 at 5:37 PM, Cosmin Radoi <cosmin.ra...@gmail.com> wrote:
>
> I'm trying to flatten an RDD of RDDs. The straightforward approach:
>
> a: RDD[RDD[Int]]
> a flatMap { _.collect }
>
> throws a java.lang.NullPointerException at
> org.apache.spark.rdd.RDD.collect(RDD.scala:602)
>
> In a more complex scenario I also got:
> Task not serializable: java.io.NotSerializableException:
> org.apache.spark.SparkContext
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
>
> So I guess this may be related to the context not being available inside
> the map.
>
> Are nested RDDs not supported?
>
> Thanks,
>
> Cosmin Radoi