Hi,

is there any way to control how DataFrames are partitioned? I'm doing lots of joins and am seeing very large shuffle reads and writes in the Spark UI. With PairRDDs you can control how the data is partitioned across nodes with partitionBy, but there is no such method on DataFrames. Can I somehow partition the underlying RDD manually? I am currently using the Python API.

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
