Hi Xiangrui,
I'm sorry, I overlooked your mail.
What I did is a workaround that only works for my special case.
It does not scale and only works for small data sets, but that is fine
for me so far.
Kind Regards,
Niklas
def securlyZipRdds[A, B: ClassTag](rdd1: RDD[A], rdd2: RDD[B]):
RDD[(A, B
"sample 2 * n tuples, split them into two parts, balance the sizes of
these parts by filtering some tuples out"
How do you guarantee that the two RDDs have the same size?
-Xiangrui
On Fri, Jan 9, 2015 at 3:40 AM, Niklas Wilcke
<1wil...@informatik.uni-hamburg.de> wrote:
> Hi Spark community,
>
>