The RDD API has  functions to join multiple RDDs, such as PariRDD.join
or PariRDD.cogroup that take another RDD as input. e.g.
 firstRDD.join(secondRDD)

I'm looking for ways to do the opposite: split an existing RDD. What is the
right way to create derivate RDDs from an existing RDD?

e.g. imagine I've an  collection or pairs as input: colRDD =
 (k1->v1)...(kx->vy)...
I could do:
val byKey = colRDD.groupByKey() = (k1->(k1->v1... k1->vn)),...(kn->(kn->vy,
...))

Now, I'd like to create an RDD from the values to have something like:

val groupedRDDs = (k1->RDD(k1->v1,...k1->vn), kn -> RDD(kn->vy, ...))

in this example, there's an f(byKey) = groupedRDDs.  What's that f(x) ?

Would:  byKey.map{case (k,v) => k->sc.makeRDD(v.toSeq)}  the
right/recommended way to do this?  Any other options?

Thanks,

Gerard.

Reply via email to