[ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anton Okolnychyi updated SPARK-42779:
-------------------------------------
    Description: Data sources may request a particular distribution and 
ordering of data for V2 writes. If AQE is enabled, the default session advisory 
partition size (64MB) will be used as guidance. Unfortunately, this default 
value can still lead to small files because the written data can be compressed 
nicely using columnar file formats. Spark should allow data sources to indicate 
the advisory shuffle partition size, just like it lets data sources request a 
particular number of partitions.  (was: Data sources may request a particular 
distribution and ordering of data during V2 writes. If AQE is enabled, the 
default session advisory partition size (64MB) will be used as guidance. 
Unfortunately, this default value can still lead to small files because the 
written data can be compressed nicely using columnar file formats. Spark should 
allow data sources to indicate the advisory shuffle partition size, just like 
it lets data sources request a particular number of partitions.)

> Allow V2 writes to indicate advisory partition size
> ---------------------------------------------------
>
>                 Key: SPARK-42779
>                 URL: https://issues.apache.org/jira/browse/SPARK-42779
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.
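
To make the small-file concern above concrete, here is a rough back-of-the-envelope sketch. It is not part of the issue or of Spark. The compression ratio of 8x and the 128MB target file size are assumptions chosen only for illustration, and the helper names are hypothetical. It estimates how large a written file might be when one shuffle partition of the advisory size is compressed by a columnar format, and what advisory size a data source might want to request instead.

```python
MB = 1024 * 1024


def estimated_file_size(shuffle_partition_bytes: int, compression_ratio: float) -> float:
    """Approximate on-disk size of the file written from one shuffle partition.

    compression_ratio is shuffle bytes divided by file bytes (hypothetical value).
    """
    return shuffle_partition_bytes / compression_ratio


def advisory_size_for_target(target_file_bytes: int, compression_ratio: float) -> int:
    """Advisory shuffle partition size that would roughly yield the target file size."""
    return int(target_file_bytes * compression_ratio)


default_advisory = 64 * MB   # default session advisory partition size from the issue
ratio = 8.0                  # assumed compression ratio, for illustration only
target_file = 128 * MB       # assumed desired file size, for illustration only

# File size in MB produced by a 64MB shuffle partition under the assumed ratio.
print(estimated_file_size(default_advisory, ratio) / MB)

# Advisory size in MB a data source might request to reach the target file size.
print(advisory_size_for_target(target_file, ratio) // MB)
```

Under these assumed numbers a 64MB shuffle partition would become a file of about 8MB, which is why a per-write advisory size, rather than the session default, could help a data source avoid small files.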



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

