guihuawen created SPARK-45894:
---------------------------------

             Summary: hive table level setting hadoop.mapred.max.split.size
                 Key: SPARK-45894
                 URL: https://issues.apache.org/jira/browse/SPARK-45894
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: guihuawen
             Fix For: 3.5.0
When scanning a Hive table, configuring the hadoop.mapred.max.split.size parameter increases the parallelism of the scan stage and thereby reduces running time. However, when a large table and a small table appear in the same query, a single global hadoop.mapred.max.split.size value means that some stages run a very large number of tasks while others run very few. To keep stages balanced, it should be possible to set hadoop.mapred.max.split.size separately for each Hive table.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
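A minimal sketch of the contrast. The first statement shows how the split size is set today (one session-wide value for every table scanned); the second shows one way a per-table override could look. The per-table mechanism is exactly what this issue proposes, so the TBLPROPERTIES syntax and table names below are illustrative assumptions only, not an implemented API:

```sql
-- Current behavior: one session-wide value applies to every Hive table scan
SET hadoop.mapred.max.split.size=268435456;  -- 256 MB splits for all tables

-- Hypothetical per-table override (illustrative syntax; not implemented)
ALTER TABLE big_fact_table
  SET TBLPROPERTIES ('hadoop.mapred.max.split.size' = '67108864');   -- 64 MB: more tasks
ALTER TABLE small_dim_table
  SET TBLPROPERTIES ('hadoop.mapred.max.split.size' = '1073741824'); -- 1 GB: fewer tasks
```

With a per-table setting, a query joining big_fact_table and small_dim_table could scan the large table with many small splits (high parallelism) while the small table keeps a few large splits, instead of one global value skewing task counts across stages.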