Is there a way to prevent an RDD from shuffling in a join operation without repartitioning it?
I'm reading an RDD from sharded MongoDB, joining it with an RDD of incoming data (plus some additional calculations), and writing the resulting RDD back to MongoDB. It would make sense to shuffle only the incoming-data RDD, so that the joined RDD would already be partitioned correctly according to the MongoDB shard key.

I know I can prevent an RDD from shuffling in a join by partitioning it beforehand, but partitioning would itself shuffle the RDD. In addition, I'm only doing the join once per RDD read from MongoDB. Is there a way to tell Spark to shuffle only the incoming-data RDD?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Preventing-an-RDD-from-shuffling-tp25717.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
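For concreteness, here is a plain-Python sketch (not the Spark API, and all names are made up for illustration) of the layout I'd like the join to exploit: if both sides are hash-partitioned by the same key function into the same number of partitions, matching keys already sit in the same partition, so only the incoming side would need to be moved.

```python
# Conceptual sketch of co-partitioning, in plain Python rather than Spark.
# The point: once the big dataset is laid out by hash(key) % n (as the
# MongoDB shards effectively already are), only the incoming data needs
# to be redistributed before a partition-local join.

NUM_PARTITIONS = 4

def partition_by_key(records, n):
    """Hash-partition (key, value) pairs into n buckets."""
    parts = [[] for _ in range(n)]
    for k, v in records:
        parts[hash(k) % n].append((k, v))
    return parts

# Large dataset, already partitioned (standing in for the MongoDB RDD).
big = partition_by_key([(1, "a"), (2, "b"), (3, "c")], NUM_PARTITIONS)

# Incoming data: only this side gets "shuffled" into the same layout.
incoming = partition_by_key([(1, "x"), (3, "y")], NUM_PARTITIONS)

def join(parts_a, parts_b):
    """Partition-local join: no data crosses partition boundaries."""
    out = []
    for pa, pb in zip(parts_a, parts_b):
        lookup = dict(pb)
        out.extend((k, (v, lookup[k])) for k, v in pa if k in lookup)
    return sorted(out)

print(join(big, incoming))
```

In Spark terms this corresponds to joining against an RDD that already carries a partitioner; the open question above is how to declare that the MongoDB-backed RDD is already laid out this way without paying for a `partitionBy` shuffle first.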