wangyum commented on a change in pull request #33172: URL: https://github.com/apache/spark/pull/33172#discussion_r662161772
########## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ########## @@ -525,18 +525,34 @@ object SQLConf { .booleanConf .createWithDefault(true) + private val MIN_PARTITION_NUM_KEY = "spark.sql.adaptive.coalescePartitions.minPartitionNum" + private val MIN_PARTITION_SIZE_KEY = "spark.sql.adaptive.coalescePartitions.minPartitionSize" + val COALESCE_PARTITIONS_MIN_PARTITION_NUM = - buildConf("spark.sql.adaptive.coalescePartitions.minPartitionNum") + buildConf(MIN_PARTITION_NUM_KEY) .doc("The suggested (not guaranteed) minimum number of shuffle partitions after " + "coalescing. If not set, the default value is the default parallelism of the " + - "Spark cluster. This configuration only has an effect when " + - s"'${ADAPTIVE_EXECUTION_ENABLED.key}' and " + + s"Spark cluster, w.r.t. the minimum partition size set by '$MIN_PARTITION_SIZE_KEY'. " + + s"This configuration only has an effect when '${ADAPTIVE_EXECUTION_ENABLED.key}' and " + s"'${COALESCE_PARTITIONS_ENABLED.key}' are both true.") .version("3.0.0") .intConf .checkValue(_ > 0, "The minimum number of partitions must be positive.") .createOptional + val COALESCE_PARTITIONS_MIN_PARTITION_SIZE = + buildConf(MIN_PARTITION_SIZE_KEY) + .doc(s"The minimum size of shuffle partitions after coalescing if '$MIN_PARTITION_NUM_KEY' " + + s"is not set. When '$MIN_PARTITION_NUM_KEY' is not set, the minimum number of " + + "shuffle partitions after coalescing will fall back to the default parallelism " + + "of the Spark cluster, sometimes causing partition sizes to be too small. This config " + + "aims to prevent this situation. Note that this config does not take effect if " + + s"'${COALESCE_PARTITIONS_MIN_PARTITION_NUM.key}' is set by the user.") + .version("3.2.0") + .bytesConf(ByteUnit.BYTE) + .checkValue(_ > 0, "The minimum partition size must be positive.") + .createWithDefaultString("1MB") Review comment: Yes. I also think 1MB is enough. In our cluster, there are many cases less than 1MB. ![image](https://user-images.githubusercontent.com/5399861/124106786-23c59400-da97-11eb-8cd6-ca24fda494a3.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org