The RDD API has functions to join multiple RDDs, such as PairRDD.join or PairRDD.cogroup, that take another RDD as input, e.g. firstRDD.join(secondRDD).
I'm looking for ways to do the opposite: split an existing RDD. What is the right way to create derived RDDs from an existing RDD?

For example, imagine I have a collection of pairs as input:

    colRDD = (k1->v1)...(kx->vy)...

I could do:

    val byKey = colRDD.groupByKey() = (k1->(k1->v1... k1->vn)),...(kn->(kn->vy, ...))

Now, I'd like to create an RDD from the values, to get something like:

    val groupedRDDs = (k1->RDD(k1->v1,...k1->vn), kn -> RDD(kn->vy, ...))

In this example, there's an f(byKey) = groupedRDDs. What is that f(x)?

Would:

    byKey.map{case (k,v) => k->sc.makeRDD(v.toSeq)}

be the right/recommended way to do this? Any other options?

Thanks,
Gerard.