If you are using textFile() to read the data in, it also takes a parameter for
the minimum number of partitions to create. Would that not work for you?
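
For example, a minimal sketch (the path and the partition count here are
just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("partition-demo").setMaster("local[*]"))

    // Passing minPartitions up front splits the input at read time,
    // avoiding a separate (and expensive) repartition/shuffle later.
    val lines = sc.textFile("hdfs:///path/to/input", 100)

    // Spark treats this as a minimum, so you typically get at least
    // that many partitions (often more, depending on input splits).
    println(lines.partitions.length)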
On Oct 2, 2014 7:00 AM, "jamborta" <jambo...@gmail.com> wrote:

> Hi all,
>
> I have been testing repartitioning to ensure that my algorithms get similar
> amounts of data.
>
> I noticed that repartitioning is very expensive. Is there a way to force
> Spark to create a certain number of partitions when the data is read in?
> How does it decide on the partition size initially?
>
> Thanks,
>
>
>
