Re: Best way to randomly distribute elements

2015-06-19 Thread abellet
Thanks a lot for the suggestions! On 18/06/2015 15:02, Himanshu Mehra [via Apache Spark User List] wrote: Hi A bellet, you can try RDD.randomSplit(weights array), where the weights array holds the weight you want to give each consecutive partition. Example
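For reference, Spark's `randomSplit(weights, seed)` returns an array of separate RDDs, one per weight (the weights are normalized if they do not sum to 1), rather than moving elements between the partitions of a single RDD. The idea of a weighted random split can be sketched locally in plain Python. `weighted_random_split` is a hypothetical helper for illustration only, not Spark's implementation, which samples each partition independently.

```python
import bisect
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def weighted_random_split(items: Sequence[T], weights: Sequence[float], seed: int = 0) -> List[List[T]]:
    """Put each item into exactly one split, chosen with probability proportional to its weight.

    Local illustration of a weighted random split; not Spark's randomSplit code.
    """
    total = float(sum(weights))
    if not weights or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative and not all zero")
    # Cumulative upper bounds of each split's interval in [0, 1), e.g. [1, 3] -> [0.25, 1.0]
    bounds: List[float] = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    bounds[-1] = 1.0  # guard against floating-point round-off in the last bound
    rng = random.Random(seed)
    splits: List[List[T]] = [[] for _ in weights]
    for item in items:
        u = rng.random()  # uniform draw in [0, 1)
        # Index of the first split whose upper bound is strictly greater than u
        splits[bisect.bisect_right(bounds, u)].append(item)
    return splits
```

For example, `train, test = weighted_random_split(range(1000), [0.8, 0.2], seed=42)` puts roughly 80% of the items in `train`, and every item lands in exactly one split.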

Best way to randomly distribute elements

2015-06-18 Thread abellet
Hello, In the context of a machine learning algorithm, I need to be able to randomly distribute the elements of a large RDD across partitions (i.e., essentially assign each element to a random partition). How could I achieve this? I have tried to call repartition() with the current number of
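Assuming the goal is a uniform, independent assignment of each element to one of n partitions, one common pattern is to key every element with a random integer in [0, n) and shuffle by that key. In Scala this could look like `rdd.mapPartitionsWithIndex { (i, it) => val rnd = new scala.util.Random(seed + i); it.map(x => (rnd.nextInt(n), x)) }.partitionBy(new HashPartitioner(n)).values`, where `seed` and `n` are values you choose; seeding per source partition means a recomputed partition draws the same keys. The plain-Python sketch below simulates that pattern on lists. `random_partition` and its seeding scheme are assumptions for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def random_partition(partitions: Iterable[Iterable[T]], num_partitions: int, seed: int = 0) -> List[List[T]]:
    """Send every element to a uniformly random target partition.

    Simulates "key by a random int in [0, num_partitions), then shuffle by key".
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    out: List[List[T]] = [[] for _ in range(num_partitions)]
    for src_index, part in enumerate(partitions):
        # One RNG per source partition, derived from its index (like mapPartitionsWithIndex)
        rng = random.Random(seed * 1_000_003 + src_index)
        for x in part:
            out[rng.randrange(num_partitions)].append(x)
    return out
```

Unlike a round-robin style redistribution, each element's destination here is drawn independently, so partition sizes fluctuate around the mean rather than being exactly balanced.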

Random pairs / RDD order

2015-04-16 Thread abellet
Hi everyone, I have a large RDD and I am trying to create an RDD of a random sample of pairs of elements from this RDD. The elements composing a pair should come from the same partition for efficiency. The idea I've come up with is to take two random samples and then use zipPartitions to pair each
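The two-samples-then-zipPartitions idea works because `sample` keeps the parent's partitioning, so both samples have the same number of partitions, which `zipPartitions` requires. In Scala it would read roughly `a.zipPartitions(b)((it1, it2) => it1.zip(it2))` with `a = rdd.sample(false, f, s1)` and `b = rdd.sample(false, f, s2)`, where `f`, `s1`, `s2` are your own choices. The sketch below simulates this locally in Python; `sample_pairs_per_partition` is a hypothetical name, and the two seeds are an assumption.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_pairs_per_partition(
    partitions: Sequence[Sequence[T]], fraction: float, seed: int = 0
) -> List[List[Tuple[T, T]]]:
    """Zip two independent Bernoulli samples of each partition into pairs.

    Both elements of every pair come from the same partition.
    """
    rng_a = random.Random(seed)      # seed for the first sample
    rng_b = random.Random(seed + 1)  # different seed for the second sample
    result: List[List[Tuple[T, T]]] = []
    for part in partitions:
        a = [x for x in part if rng_a.random() < fraction]
        b = [x for x in part if rng_b.random() < fraction]
        # zip stops at the shorter sample, so the pair count varies per partition
        result.append(list(zip(a, b)))
    return result
```

Two caveats follow from the design: an element can be paired with itself when it appears in both samples, and because both samples are stored in the same order as the partition, the pairs are not uniformly random unless one sample is shuffled within the partition before zipping.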

Pairwise computations within partition

2015-04-09 Thread abellet
Hello everyone, I am a Spark novice facing a nontrivial problem to solve with Spark. I have an RDD consisting of many elements (say, 60K), where each element is a d-dimensional vector. I want to implement an iterative algorithm which does the following. At each iteration, I want to apply an
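Since the message is cut off before the per-pair operation is described, the sketch below assumes a hypothetical pairwise function (squared Euclidean distance) purely for illustration. It shows one way to evaluate a function on every unordered pair of vectors within a single partition; in PySpark such a generator could be passed to `rdd.mapPartitions`. The names `squared_distance` and `pairwise_within_partition` are assumptions.

```python
from itertools import combinations
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Hypothetical pairwise function: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pairwise_within_partition(
    part: Iterable[Vector],
    f: Callable[[Vector, Vector], float] = squared_distance,
) -> Iterator[Tuple[int, int, float]]:
    """Yield (i, j, f(u, v)) for every unordered pair i < j in one partition.

    The partition is materialized into a list because each element is visited
    many times; i and j are positions local to this partition.
    """
    items: List[Vector] = list(part)
    for (i, u), (j, v) in combinations(enumerate(items), 2):
        yield i, j, f(u, v)
```

The cost is quadratic in the partition size m, namely m(m-1)/2 evaluations. As a hypothetical example, splitting 60K vectors into 100 partitions of 600 gives 600 × 599 / 2 = 179,700 pairs per partition, or about 18 million in total, compared with 1,799,970,000 pairs over the whole RDD.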