Hi, I am creating an RDD from a text file and specifying the number of partitions, but Spark gives me a different number of partitions than the one I specified.
scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 0)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[72] at textFile at <console>:27
scala> people.getNumPartitions
res47: Int = 1

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 1)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[50] at textFile at <console>:27
scala> people.getNumPartitions
res36: Int = 1

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 2)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[52] at textFile at <console>:27
scala> people.getNumPartitions
res37: Int = 2

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 3)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[54] at textFile at <console>:27
scala> people.getNumPartitions
res38: Int = 3

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 4)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:27
scala> people.getNumPartitions
res39: Int = 4

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 5)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[58] at textFile at <console>:27
scala> people.getNumPartitions
res40: Int = 6

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 6)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[60] at textFile at <console>:27
scala> people.getNumPartitions
res41: Int = 7

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 7)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[62] at textFile at <console>:27
scala> people.getNumPartitions
res42: Int = 8

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 8)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[64] at textFile at <console>:27
scala> people.getNumPartitions
res43: Int = 9

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 9)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[66] at textFile at <console>:27
scala> people.getNumPartitions
res44: Int = 11

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 10)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[68] at textFile at <console>:27
scala> people.getNumPartitions
res45: Int = 11

scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 11)
people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[70] at textFile at <console>:27
scala> people.getNumPartitions
res46: Int = 13

The contents of the file /home/pvikash/data/test.txt are:
" This is a test file. Will be used for rdd partition "

I am trying to understand why the number of partitions changes here, and, when the data is small enough to fit into a single partition, why Spark creates empty partitions. Any explanation would be appreciated.

--Vikash

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Number-Of-Partitions-in-RDD-tp28730.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
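For what it's worth, the second argument to sc.textFile is minPartitions, a hint rather than an exact count: the actual count comes from Hadoop's FileInputFormat.getSplits. The sketch below is not Spark or Hadoop source code, just an approximation of that split arithmetic, assuming the test file is about 57 bytes and the default local-filesystem block size; the object and function names are made up for illustration.

```scala
// Hypothetical sketch: approximates how Hadoop's FileInputFormat.getSplits
// derives the split count that sc.textFile uses as its partition count.
object SplitCount {
  val SPLIT_SLOP = 1.1                 // the last split may be up to 10% larger
  val blockSize  = 32L * 1024 * 1024   // assumed local-FS block size (32 MB)

  def numSplits(totalSize: Long, minPartitions: Int): Int = {
    // goalSize is integer division, so the per-split size is rounded down...
    val goalSize  = totalSize / math.max(minPartitions, 1)
    val splitSize = math.max(1L, math.min(goalSize, blockSize))
    var remaining = totalSize
    var splits    = 0
    // ...which often leaves a remainder that spills into an extra split.
    while (remaining.toDouble / splitSize > SPLIT_SLOP) {
      remaining -= splitSize
      splits += 1
    }
    if (remaining > 0) splits += 1     // tail split for the leftover bytes
    splits
  }

  def main(args: Array[String]): Unit = {
    // With totalSize = 57 and minPartitions = 5: goalSize = 57/5 = 11,
    // and 57 bytes do not fit in five 11-byte splits, so we get 6 splits.
    for (n <- 1 to 8) println(s"requested $n -> ${numSplits(57, n)} splits")
  }
}
```

For larger minPartitions values on such a tiny file, the actual counts diverge further from this model, because LineRecordReader reads to the end of the line that straddles a split boundary; a split whose bytes were already consumed by the previous reader then yields an empty partition, which may be why empty partitions appear here.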