If you are using textFile() to read data in, it also takes a parameter for the minimum number of partitions to create. Would that not work for you?
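For example, a minimal sketch (the path and the partition count are just placeholders):

    // textFile(path, minPartitions): Spark will create at least
    // minPartitions partitions when reading the file
    val rdd = sc.textFile("hdfs:///data/input.txt", 64)
    println(rdd.partitions.length)  // prints a value >= 64

Note that minPartitions is only a lower bound; the underlying Hadoop input format may split the file into more partitions than that.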
On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:

> Hi all,
>
> I have been testing repartitioning to ensure that my algorithms get a
> similar amount of data.
>
> Noticed that repartitioning is very expensive. Is there a way to force
> Spark to create a certain number of partitions when the data is read in?
> How does it decide on the partition size initially?
>
> Thanks,