Thanks a lot for the suggestions!
On 18/06/2015 at 15:02, Himanshu Mehra [via Apache Spark User List] wrote:
Hi A. Bellet,

You can try RDD.randomSplit(weights), where weights is the array of weights you want to assign to the consecutive splits. Example:
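No code example survived in the thread, so here is a plain-Python sketch (no Spark required, function name `random_split` is my own) of what weight-based random splitting does: each element lands in split i with probability weights[i] / sum(weights), which mirrors the semantics of RDD.randomSplit.

```python
import random

def random_split(elements, weights, seed=42):
    """Randomly assign each element to one of len(weights) splits,
    with probability proportional to the corresponding weight
    (mirrors the behavior of RDD.randomSplit)."""
    rng = random.Random(seed)
    total = float(sum(weights))
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)          # cumulative probability boundaries
    bounds[-1] = 1.0                # guard against float rounding
    splits = [[] for _ in weights]
    for x in elements:
        r = rng.random()
        for i, b in enumerate(bounds):
            if r < b:
                splits[i].append(x)
                break
    return splits

# e.g. an approximate 80/20 split of 1000 elements
train, test = random_split(range(1000), [0.8, 0.2])
```

In Spark itself this would simply be `rdd.randomSplit(Array(0.8, 0.2))`, which returns an array of RDDs rather than Python lists.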

Hello,

In the context of a machine learning algorithm, I need to be able to randomly distribute the elements of a large RDD across partitions (i.e., essentially assign each element to a random partition). How could I achieve this? I have tried to call repartition() with the current number of partitions
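One common approach (not spelled out in the thread, so treat it as an assumption) is to key each element with a uniformly random partition id and then partition by that key. The sketch below simulates this outside Spark, with plain Python lists standing in for partitions:

```python
import random

def scatter_randomly(elements, num_partitions, seed=0):
    """Assign each element to a uniformly random partition.
    Plain-Python stand-in for keying each element with a random
    partition id and then partitioning by that key."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(num_partitions)]
    for x in elements:
        partitions[rng.randrange(num_partitions)].append(x)
    return partitions

parts = scatter_randomly(range(100), 4)
```

In Spark the same idea would look roughly like `rdd.map(x => (rng.nextInt(n), x)).partitionBy(new HashPartitioner(n)).values` (Scala), at the cost of a full shuffle.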

Hi everyone,

I have a large RDD and I am trying to create an RDD of a random sample of pairs of elements from this RDD. The elements composing a pair should come from the same partition, for efficiency. The idea I've come up with is to take two random samples and then use zipPartitions to pair each
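To illustrate the zipPartitions idea outside Spark (my own sketch, with plain Python lists standing in for partitions): draw two independent random samples inside each partition and zip them element-wise, so both members of every pair come from the same partition.

```python
import random

def sample_pairs_per_partition(partitions, pairs_per_partition, seed=1):
    """For each partition, draw two independent samples of the same
    size and zip them element-wise, so the two elements of each pair
    share a partition (mirrors sampling twice and zipPartitions)."""
    rng = random.Random(seed)
    pairs = []
    for part in partitions:
        k = min(pairs_per_partition, len(part))
        left = rng.sample(part, k)   # first sample, without replacement
        right = rng.sample(part, k)  # second, independent sample
        pairs.extend(zip(left, right))
    return pairs

# two partitions of four elements each, two pairs drawn per partition
pairs = sample_pairs_per_partition([[1, 2, 3, 4], [5, 6, 7, 8]], 2)
```

Note that because the two samples are independent, a pair can occasionally contain the same element twice; filtering such pairs out afterwards is straightforward if that matters for the algorithm.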

Hello everyone,

I am a Spark novice facing a nontrivial problem to solve with Spark.
I have an RDD consisting of many elements (say, 60K), where each element
is a d-dimensional vector.
I want to implement an iterative algorithm which does the following. At each
iteration, I want to apply an