guihuawen created SPARK-45894:
---------------------------------

             Summary: hive table level setting hadoop.mapred.max.split.size
                 Key: SPARK-45894
                 URL: https://issues.apache.org/jira/browse/SPARK-45894
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: guihuawen
             Fix For: 3.5.0


When scanning a hive table, configuring the hadoop.mapred.max.split.size 
parameter can increase the parallelism of the scan stage and thereby reduce 
the running time.


However, when a large table and a small table appear in the same query, a 
single global hadoop.mapred.max.split.size value means some stages run a very 
large number of tasks while others run very few. To keep the stages balanced, 
it should be possible to set the hadoop.mapred.max.split.size parameter 
separately for each hive table.
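For illustration, today the split size can only be set once per session, so it applies to every table scanned in the query; the sketch below contrasts that with a hypothetical per-table override of the kind this issue proposes (the TBLPROPERTIES syntax shown is illustrative, not an existing feature, and the table names are made up):

```sql
-- Current behavior: one split size applies to every scan in the query
SET mapred.max.split.size=134217728;  -- 128 MB for both big_table and small_table
SELECT * FROM big_table b JOIN small_table s ON b.id = s.id;

-- Hypothetical per-table override sketching the proposed improvement:
-- a smaller max split size on the large table alone would raise the
-- parallelism of its scan stage without over-splitting the small table
ALTER TABLE big_table SET TBLPROPERTIES ('mapred.max.split.size'='67108864');
```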



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
