Hi! It is because of "spark.sql.shuffle.partitions": its default value is 200, and you can see that 200 in the physical plan at the rangepartitioning step:
scala> val df = sc.parallelize(1 to 1000, 10).toDF("v").sort("v")
df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [v: int]

scala> df.explain()
== Physical Plan ==
*(2) Sort [v#300 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(v#300 ASC NULLS FIRST, 200), true, [id=#334]
   +- *(1) Project [value#297 AS v#300]
      +- *(1) SerializeFromObject [input[0, int, false] AS value#297]
         +- Scan[obj#296]

scala> df.rdd.getNumPartitions
res13: Int = 200

Best Regards,
Attila
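P.S. If you want fewer output partitions, you can lower the setting before running the sort (a sketch from a spark-shell session; the value 10 is just an example, and this assumes a Spark 2.x shell without adaptive query execution, which would coalesce partitions on its own):

scala> // lower the shuffle parallelism for subsequent shuffles
scala> spark.conf.set("spark.sql.shuffle.partitions", "10")

scala> val df2 = sc.parallelize(1 to 1000, 10).toDF("v").sort("v")
df2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [v: int]

scala> df2.rdd.getNumPartitions
res14: Int = 10

Alternatively, df.coalesce(n) after the sort reduces the partition count without another shuffle.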