Hi,
  I'm wondering whether Spark RDDs could have a partitionedByKey function. The
use of this function would be to distribute an RDD according to a certain
partitioner and cache it; subsequent joins between RDDs sharing the same
partitioner would then be greatly sped up. Currently we only have groupByKey,
which generates a Seq of the desired type, and that is not very convenient.
A rough sketch of the usage I have in mind follows below.
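
For concreteness, here is a minimal sketch of the pattern I'm asking for,
assuming a partitionBy(partitioner) method on pair RDDs plus a
HashPartitioner (the method name, partition count, and sample data are just
illustrative placeholders):

    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PartitionByKeySketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("partition-by-key").setMaster("local[*]"))

        // Hypothetical usage: distribute both RDDs with the same partitioner
        // and cache them, so matching keys end up co-located.
        val partitioner = new HashPartitioner(8)
        val left = sc.parallelize(Seq((1, "a"), (2, "b")))
          .partitionBy(partitioner)
          .cache()
        val right = sc.parallelize(Seq((1, 1.0), (2, 2.0)))
          .partitionBy(partitioner)
          .cache()

        // Because both sides share the same partitioner, this join should
        // not require a full shuffle of either input.
        left.join(right).collect().foreach(println)

        sc.stop()
      }
    }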

Btw, sorry for the last empty-body email; I mistakenly hit the send shortcut.


Best Regards,
Jiacheng Guo
