*First of all, sorry for my poor English.* I was reading the source code of Apache Spark 1.4.1 and got stuck on the logic of the `RangePartitioner.rangeBounds` method. The code is shown below.
So could anyone please explain:

1. What is the `3.0 *` for in the line `val sampleSizePerPartition = math.ceil(3.0 * sampleSize / rdd.partitions.size).toInt`? Why choose 3.0 rather than some other value?
2. Why does `fraction * n > sampleSizePerPartition` mean that a partition contains much more than the average number of items? Can you give an example where we would need to re-sample a partition?

Thanks a lot!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-s-the-logic-in-RangePartitioner-rangeBounds-method-of-Apache-Spark-tp24296.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
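P.S. To make the two quoted expressions concrete, here is a minimal numeric sketch (in Python, just to play with the arithmetic; the partition counts are made-up numbers, and `sample_size = 20.0 * num_partitions` follows the default used elsewhere in the same method). Note that, algebraically, `fraction * n > sampleSizePerPartition` reduces to `n > 3 * numItems / numPartitions`, i.e. the partition holds more than 3x the average number of items:

```python
import math

# Hypothetical setup mirroring the two lines quoted above.
num_partitions = 4
sample_size = 20.0 * num_partitions  # Spark 1.4.1 targets ~20 samples per partition
sample_size_per_partition = math.ceil(3.0 * sample_size / num_partitions)  # = 60

# Made-up per-partition item counts: the last partition is heavily skewed.
partition_counts = [100, 100, 100, 10_000]
num_items = sum(partition_counts)
fraction = min(sample_size / max(num_items, 1), 1.0)  # global sampling rate

for idx, n in enumerate(partition_counts):
    # fraction * n = how many samples this partition *should* contribute at the
    # global rate; if that exceeds the per-partition cap, the sketch
    # under-represents this partition and it gets re-sampled.
    needs_resample = fraction * n > sample_size_per_partition
    print(f"partition {idx}: n={n}, fraction*n={fraction * n:.1f}, "
          f"re-sample={needs_resample}")
```

With these numbers only the skewed partition (10,000 items, more than 3x the average of 2,575) trips the condition, which is exactly the "much more than average" case the second question asks about.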