Re: Zipping RDDs of equal size not possible

2015-02-05 Niklas Wilcke
Hi Xiangrui,

I'm sorry, I didn't notice your mail. What I did is a workaround that only works for my special case. It does not scale and only works for small data sets, but that is fine for me so far.

Kind Regards,
Niklas

def securlyZipRdds[A, B: ClassTag](rdd1: RDD[A], rdd2: RDD[B]): RDD[(A, B
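Niklas's own workaround is cut off above, so it is not reproduced here. One common way to pair two RDDs element by element without RDD.zip's partitioning requirements is to give both a global index with zipWithIndex and join on that index. The following is a minimal sketch, assuming Spark 1.3 or later (on 1.2 and earlier, add `import org.apache.spark.SparkContext._` so that join and sortByKey resolve); the names `ZipByIndex`, `zipByIndex` and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object ZipByIndex {

  // Hypothetical helper: pairs the i-th element of rdd1 with the i-th element of rdd2.
  // Unlike RDD.zip, it does not require equal partition counts or equal element
  // counts per partition. If the RDDs differ in length, the inner join silently
  // drops elements that have no partner at the same index.
  def zipByIndex[A: ClassTag, B: ClassTag](rdd1: RDD[A], rdd2: RDD[B]): RDD[(A, B)] = {
    // zipWithIndex assigns each element its global position 0, 1, 2, ...
    val left: RDD[(Long, A)] = rdd1.zipWithIndex().map { case (a, i) => (i, a) }
    val right: RDD[(Long, B)] = rdd2.zipWithIndex().map { case (b, i) => (i, b) }
    left
      .join(right) // shuffle that brings equal indices together
      .sortByKey() // restore the original element order
      .values      // drop the index, keep (A, B)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("zipByIndex").setMaster("local[2]"))
    try {
      // Deliberately mismatched partitioning (3 vs 4 slices): RDD.zip would reject this pair.
      val xs = sc.parallelize(1 to 10, numSlices = 3)
      val ys = sc.parallelize(('a' to 'j').map(_.toString), numSlices = 4)
      zipByIndex(xs, ys).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The join shuffles both inputs, so this costs more than a plain zip, but nothing is collected to the driver. zipWithIndex launches an extra Spark job when an RDD has more than one partition, and the indices only line up if each RDD yields its elements in the same order every time it is computed; caching both inputs before indexing is a cheap safeguard.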

Re: Zipping RDDs of equal size not possible

2015-01-09 Xiangrui Meng
"sample 2 * n tuples, split them into two parts, balance the sizes of these parts by filtering some tuples out"

How do you guarantee that the two RDDs have the same size?

-Xiangrui

On Fri, Jan 9, 2015 at 3:40 AM, Niklas Wilcke <1wil...@informatik.uni-hamburg.de> wrote:
> Hi Spark community,
>
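RDD.zip assumes that both RDDs have the same number of partitions and the same number of elements in each partition, so matching total sizes alone is not enough. A sampled-and-filtered split like the one quoted above can easily break the per-partition condition. The following is a hedged sketch of a pre-flight check; `ZipCheck`, `partitionSizes` and `canZip` are hypothetical names. It runs one Spark job per RDD, so caching both inputs first avoids computing them twice.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ZipCheck {

  // Counts the elements in every partition (one Spark job); the result has one
  // entry per partition, in partition order.
  def partitionSizes[T](rdd: RDD[T]): Seq[Int] =
    rdd.mapPartitions(it => Iterator(it.size)).collect().toSeq

  // True when rdd1.zip(rdd2) satisfies zip's layout requirements. The partition
  // count comparison is a cheap short-circuit that launches no job.
  def canZip[A, B](rdd1: RDD[A], rdd2: RDD[B]): Boolean =
    rdd1.partitions.length == rdd2.partitions.length &&
      partitionSizes(rdd1) == partitionSizes(rdd2)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("zipCheck").setMaster("local[2]"))
    try {
      val xs = sc.parallelize(1 to 10, 2)
      println(canZip(xs, xs.map(_ * 2)))          // map keeps every partition's size
      println(canZip(xs, xs.filter(_ % 2 == 0)))  // filter changes per-partition sizes
    } finally {
      sc.stop()
    }
  }
}
```

A check like this only tells you whether zip will succeed; it cannot make two independently sampled RDDs line up, which is the gap the question above points at.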