Hello everyone, I am a Spark novice facing a nontrivial problem.
I have an RDD consisting of many elements (say, 60K), where each element is a d-dimensional vector. I want to implement an iterative algorithm that does the following: at each iteration, apply an operation to *pairs* of elements (say, compute their dot product). The number of possible pairs is of course huge, but at each iteration I only need to consider a small random subset of them.

To minimize communication between nodes, I am willing to partition my RDD by key (assigning each element a random key) and to consider only pairs of elements that belong to the same partition (i.e., that share the same key). But I am not sure how to sample the pairs, how to apply the operation to them, and how to make sure that the computation for each pair is indeed done by the node holding the corresponding elements.

Any help would be greatly appreciated. Thanks a lot!

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pairwise-computations-within-partition-tp22436.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
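To make the question concrete, here is a rough sketch of what I have in mind, in PySpark. The per-partition pair sampling is a plain Python generator that could be handed to `mapPartitions`, so each node would only ever touch the vectors it already holds. All names and parameters here (`sample_pair_dot_products`, `num_pairs`, the number of partitions `P`) are my own assumptions, not an existing API:

```python
import random

def sample_pair_dot_products(partition, num_pairs, seed=None):
    """Given an iterator over (key, vector) elements of ONE partition,
    sample up to num_pairs random distinct pairs of vectors and yield
    ((i, j), dot_product). Everything happens locally on the node that
    holds the partition, so no vectors cross the network."""
    rng = random.Random(seed)
    # Materialize only this partition's vectors (fine if partitions are small).
    vectors = [v for _, v in partition]
    n = len(vectors)
    if n < 2:
        return  # not enough elements in this partition to form a pair
    for _ in range(num_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct local indices
        dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
        yield ((i, j), dot)

# Hypothetical Spark usage (illustrative, untested):
# P = 100  # number of partitions / random keys
# keyed = rdd.map(lambda v: (random.randrange(P), v)).partitionBy(P).cache()
# results = keyed.mapPartitions(
#     lambda part: sample_pair_dot_products(part, num_pairs=100))
```

My understanding is that after `partitionBy`, `mapPartitions` runs on each partition where it lives, which is what would keep the pairwise computation on the node holding the data, but I am not sure this is the right or idiomatic way to do it.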