Hi, I'm wondering whether Spark RDDs could have a partitionByKey function. The idea is to distribute an RDD according to a given partitioner and then cache it; any further join between RDDs that share the same partitioner could then skip the shuffle and run much faster. Currently we only have groupByKey, which produces a Seq of the desired type and is not very convenient for this use case.
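To make the request concrete, here is a rough sketch of the usage I have in mind. Note that `partitionByKey` is the hypothetical function being proposed and does not exist; `HashPartitioner`, `cache`, and `join` are existing Spark APIs, and the RDD names are made up for illustration:

```scala
import org.apache.spark.{SparkContext, HashPartitioner}

// Assume sc is an existing SparkContext, and userRdd / orderRdd are
// pair RDDs keyed by user id (names are hypothetical).
val partitioner = new HashPartitioner(8)

// Proposed API: redistribute each RDD by key using the chosen
// partitioner, then cache the partitioned result.
// val users  = userRdd.partitionByKey(partitioner).cache()
// val orders = orderRdd.partitionByKey(partitioner).cache()

// Since both sides are already co-partitioned by the same partitioner,
// this join could be performed without reshuffling either RDD:
// val joined = users.join(orders)
```

The point is that the partitioning cost is paid once and reused across many joins, instead of being re-incurred on every join.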
By the way, sorry for the earlier empty-body email; I mistakenly hit the send shortcut. Best Regards, Jiacheng Guo