This comes up so often that I wonder whether the documentation or the API could be changed to answer it directly.
The solution I found is from
http://stackoverflow.com/questions/23995040/write-to-multiple-outputs-by-key-spark-one-spark-job.
You basically write the items into two directories in a single pass.
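The gist of that answer, sketched locally in plain Scala (this stands in for the Spark output-format machinery in the linked post; the file names here are made up for illustration): one traversal over the data, each record routed to one of two output files.

```scala
import java.nio.file.Files

// Hypothetical local sketch: one pass, two sinks, chosen per record.
// (The linked Spark answer achieves the same by keying each record and
// saving with an output format that maps the key to a directory.)
val dir = Files.createTempDirectory("split-demo")
val evensOut = new java.io.PrintWriter(dir.resolve("evens.txt").toFile)
val oddsOut  = new java.io.PrintWriter(dir.resolve("odds.txt").toFile)

(1 to 6).foreach { x =>
  // Each element is written exactly once, in the same traversal.
  (if (x % 2 == 0) evensOut else oddsOut).println(x)
}
evensOut.close(); oddsOut.close()

val evens = scala.io.Source.fromFile(dir.resolve("evens.txt").toFile).getLines().toList
val odds  = scala.io.Source.fromFile(dir.resolve("odds.txt").toFile).getLines().toList
println(evens) // the even numbers, as strings
println(odds)  // the odd numbers, as strings
```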
Hi,
I have an RDD that I want to split into two disjoint RDDs using a boolean predicate f. I can do this with the following:
val rdd1 = rdd.filter(f)
val rdd2 = rdd.filter(x => !f(x))
I'm assuming that each of the above statements will traverse the RDD once,
thus resulting in two passes.
Is there a way of doing this in a single pass?
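For comparison, on a plain Scala collection `partition` does exactly this split in one traversal (an illustrative sketch only; the RDD API has no equivalent single-pass operator):

```scala
// partition traverses the collection once and returns both halves:
// elements satisfying the predicate, and the rest.
val (evens, odds) = (1 to 6).partition(_ % 2 == 0)

println(evens) // the elements where the predicate holds
println(odds)  // the remaining elements
```

On an RDD the usual workaround is to call `rdd.cache()` before the two filters, so the second filter reads the cached data rather than recomputing the upstream lineage.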