*First of all, sorry for my poor English.* I was reading the source code of Apache Spark 1.4.1 and got stuck on the logic of the `RangePartitioner.rangeBounds` method. The code is shown below.
So could anyone please explain:

1. What is the `3.0 *` for in the line `val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.size).toInt`? Why choose 3.0 rather than some other value?
2. Why does `fraction * n > sampleSizePerPartition` mean that a partition contains much more than the average number of items? Can you give an example where we would need to re-sample a partition?

Thanks a lot!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-s-the-logic-in-RangePartitioner-rangeBounds-method-of-Apache-Spark-tp24296.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
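P.S. To make the two quoted expressions concrete, here is a minimal numeric sketch (in Python, just to play with the arithmetic; the partition counts are made-up numbers, and `sample_size = 20.0 * num_partitions` follows the default used elsewhere in the same method). Note that, algebraically, `fraction * n > sampleSizePerPartition` reduces to `n > 3 * numItems / numPartitions`, i.e. the partition holds more than 3x the average number of items:

```python
import math

# Hypothetical setup mirroring the two lines quoted above.
num_partitions = 4
sample_size = 20.0 * num_partitions  # Spark 1.4.1 targets ~20 samples per partition
sample_size_per_partition = math.ceil(3.0 * sample_size / num_partitions)  # = 60

# Made-up per-partition item counts: the last partition is heavily skewed.
partition_counts = [100, 100, 100, 10_000]
num_items = sum(partition_counts)
fraction = min(sample_size / max(num_items, 1), 1.0)  # global sampling rate

for idx, n in enumerate(partition_counts):
    # fraction * n = how many samples this partition *should* contribute at the
    # global rate; if that exceeds the per-partition cap, the sketch
    # under-represents this partition and it gets re-sampled.
    needs_resample = fraction * n > sample_size_per_partition
    print(f"partition {idx}: n={n}, fraction*n={fraction * n:.1f}, "
          f"re-sample={needs_resample}")
```

With these numbers only the skewed partition (10,000 items, more than 3x the average of 2,575) trips the condition, which is exactly the "much more than average" case the second question asks about.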