RE: defaultMinPartitions in textFile

2014-07-21 Thread Wang, Jensen
Yes, great. I thought it was math.max instead of math.min on that line. Thank you!

From: Ye Xianjin [mailto:advance...@gmail.com]
Sent: Tuesday, July 22, 2014 11:37 AM
To: user@spark.apache.org
Subject: Re: defaultMinPartitions in textFile

well, I think you miss this line of code in

Re: defaultMinPartitions in textFile

2014-07-21 Thread Ye Xianjin
Well, I think you missed this line of code in SparkContext.scala, lines 1242-1243 (master):

    /** Default min number of partitions for Hadoop RDDs when not given by user */
    def defaultMinPartitions: Int = math.min(defaultParallelism, 2)

So defaultMinPartitions will be 2 unless the defaultParalleli
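
To see how the quoted math.min expression behaves, here is a minimal standalone sketch. It is not taken from the Spark source and does not need Spark on the classpath: the object name DefaultMinPartitionsSketch and the sample parallelism values are assumptions made for illustration, and defaultParallelism is passed in as a plain Int rather than read from a SparkContext.

```scala
// Standalone sketch (hypothetical names); it only mirrors the quoted
// expression math.min(defaultParallelism, 2) and is not Spark's own code.
object DefaultMinPartitionsSketch {

  // defaultParallelism is supplied by the caller here; in Spark it would
  // come from the SparkContext.
  def defaultMinPartitions(defaultParallelism: Int): Int =
    math.min(defaultParallelism, 2)

  def main(args: Array[String]): Unit = {
    // Sample values chosen for illustration only.
    for (p <- Seq(1, 2, 8, 64)) {
      println(s"defaultParallelism=$p -> defaultMinPartitions=${defaultMinPartitions(p)}")
    }
  }
}
```

Because math.min takes the smaller of its two arguments, the result is capped at 2 however large the parallelism is, and it drops below 2 only when the parallelism itself is below 2. With math.max, which is what the original poster had assumed, the result would instead grow with the parallelism.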

defaultMinPartitions in textFile

2014-07-21 Thread Wang, Jensen
Hi, I started to use Spark on YARN recently and found a problem while tuning my program. When SparkContext is initialized as sc and is ready to read a text file from HDFS, the textFile(path, defaultMinPartitions) method is called. I traced down the second parameter in the Spark source code an