What are you trying to accomplish? Internally, Spark SQL adds Exchange operators to ensure that data is partitioned correctly for joins and aggregations. If you are going to perform other RDD operations on the result of DataFrame operations and need to control the partitioning manually, call df.rdd and partition it as you normally would.
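A minimal sketch of that suggestion, assuming a DataFrame `df` whose shared column is named "id" (the column name and partition count are placeholders, not from the thread):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.Row

// Drop to the RDD level, key each Row by the shared column,
// then hash-partition explicitly, just as with any pair RDD.
val partitioned: org.apache.spark.rdd.RDD[(Int, Row)] =
  df.rdd
    .map(row => (row.getAs[Int]("id"), row))  // key by the common column
    .partitionBy(new HashPartitioner(100))    // explicit hash partitioning
```

Any subsequent RDD operations (e.g. joins against other RDDs keyed the same way) will then see the partitioning you chose, though converting back with toDF() goes through Spark SQL's own planner again.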
On Fri, May 8, 2015 at 2:47 PM, Daniel, Ronald (ELS-SDG) <r.dan...@elsevier.com> wrote:
> Hi,
>
> How can I ensure that a batch of DataFrames I make are all partitioned
> based on the value of one column common to them all?
> For RDDs I would partitionBy a HashPartitioner, but I don't see that in
> the DataFrame API.
> If I partition the RDDs that way, then do a toDF(), will the partitioning
> be preserved?
>
> Thanks,
> Ron