IMHO, you need to take the size of your data into account too.
If your cluster is relatively small, repartitioning to a number of partitions
derived from the core count can put memory pressure on your executors.

It is better to take the max of the initial number of partitions (assuming your
data is on HDFS with a 64 MB block size) and the number you get from your
formula, as in the sketch below.
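
Here is a rough sketch of what I mean, using Spark's Scala API; the executor
count, executor cores, the value of K, and the HDFS path are hypothetical
placeholders, not values taken from this thread:

import org.apache.spark.{SparkConf, SparkContext}

object PartitionSizing {

  // Smallest prime strictly greater than x (trial division is fine at this scale).
  def nextPrimeAbove(x: Int): Int = {
    def isPrime(n: Int): Boolean =
      n > 1 && (2 to math.sqrt(n).toInt).forall(n % _ != 0)
    Iterator.from(x + 1).find(isPrime).get
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-sizing"))

    val numExecutors  = 10 // hypothetical --num-executors
    val executorCores = 4  // hypothetical --executor-cores
    val k             = 2  // K from the formula below; start at 1 and increase

    // The formula from the quoted message: a prime just above K * total cores.
    val fromFormula = nextPrimeAbove(k * numExecutors * executorCores)

    // Initial partition count, e.g. one partition per HDFS block of the input.
    val rdd = sc.textFile("hdfs:///path/to/input") // hypothetical path
    val initialPartitions = rdd.partitions.length

    // Take the max so a small cluster does not squeeze large data into too few
    // partitions and put memory pressure on the executors.
    val targetPartitions = math.max(initialPartitions, fromFormula)
    val repartitioned = rdd.repartition(targetPartitions)

    println(s"repartitioning from $initialPartitions to $targetPartitions partitions")
    sc.stop()
  }
}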



On 29 July 2015 at 12:31, ponkin <alexey.pon...@ya.ru> wrote:

> Hi Rahul,
>
> Where did you see such a recommendation?
> I personally define the number of partitions with the following formula:
>
> partitions = nextPrimeNumberAbove( K * (--num-executors * --executor-cores) )
>
> where
> nextPrimeNumberAbove(x) - the smallest prime number greater than x
> K - a multiplier; start with 1 and increase it until join performance
> starts to degrade
>
>
>
>
