[ 
https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31314.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28072
[https://github.com/apache/spark/pull/28072]

> Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31314
>                 URL: https://issues.apache.org/jira/browse/SPARK-31314
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Yuanjian Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In SPARK-29285, we changed shuffle temporary files to be created eagerly. This
> helps avoid failing the entire task when an occasional disk failure occurs.
> However, for applications in which many tasks don't actually create shuffle
> files, it adds overhead. See the benchmark below:
> Env: Spark local-cluster[2, 4, 19968]; each query was run for 5 rounds, 5
> times per round.
> Data: TPC-DS scale=99, generated by spark-tpcds-datagen
> Results:
> ||Query||Base||Revert||
> |Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463|
> |Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276|
> |Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108|
> |Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502|
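
The medians in the benchmark table above can be rechecked with a few lines of Python and turned into a relative speedup. The timings are copied verbatim from the table. The issue does not state their unit, and `relative_change` is an illustrative helper name, not part of spark-tpcds-datagen or any Spark benchmark tool.

```python
from statistics import median

# Per-round timings copied verbatim from the table (unit not stated in the issue).
BASE = {
    "Q20": [4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579],
    "Q33": [5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818],
    "Q52": [3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664],
    "Q56": [6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227],
}
REVERT = {
    "Q20": [3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274],
    "Q33": [5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024],
    "Q52": [4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423],
    "Q56": [6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982],
}


def relative_change(base, revert):
    """Fraction by which the median drops after the revert (positive means faster)."""
    return (median(base) - median(revert)) / median(base)


for q in BASE:
    print(f"{q}: base median {median(BASE[q])}, revert median {median(REVERT[q])}, "
          f"{relative_change(BASE[q], REVERT[q]):.1%} lower")
```

For these four queries, the median drops by about 0.9% (Q52) and by as much as about 10.6% (Q33) after the revert. The median is used here because, in every vector, the first run is the slowest.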

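The trade-off in the issue description can be illustrated with a toy model. The sketch below is a hypothetical Python writer. `TempShuffleWriter` and `count_files_for_empty_tasks` are illustrative names, not Spark classes. The writer either creates its temporary file when it is constructed (eager, as in SPARK-29285) or when the first record is written (lazy, the behaviour this revert restores). The issue only says that eager creation helps with occasional disk failures. Modelling that benefit as a fallback to the next local directory is an assumption, not a description of Spark's code.

```python
import os
import tempfile


class TempShuffleWriter:
    """Hypothetical writer (not Spark's API) showing eager vs lazy temp-file creation."""

    def __init__(self, local_dirs, eager):
        self.local_dirs = list(local_dirs)
        self.path = None
        self._fh = None
        if eager:
            # Eager: create the file up front, so a bad disk surfaces before any
            # record is written (assumed benefit, modeled by the fallback in _open).
            self._open()

    def _open(self):
        last_error = None
        for d in self.local_dirs:
            try:
                fd, path = tempfile.mkstemp(prefix="temp_shuffle_", dir=d)
            except OSError as e:
                last_error = e  # this local dir is unusable; try the next one
                continue
            self._fh = os.fdopen(fd, "wb")
            self.path = path
            return
        raise last_error or OSError("no usable local dir")

    def write(self, record: bytes):
        if self._fh is None:
            # Lazy: the file is only created when the first record arrives.
            self._open()
        self._fh.write(record)

    def close(self):
        if self._fh is not None:
            self._fh.close()


def count_files_for_empty_tasks(eager, num_tasks=100):
    """Simulate tasks that produce no shuffle output and count files left behind."""
    with tempfile.TemporaryDirectory() as d:
        for _ in range(num_tasks):
            w = TempShuffleWriter([d], eager=eager)
            w.close()  # the task wrote nothing
        return len(os.listdir(d))


print(count_files_for_empty_tasks(eager=True))   # 100
print(count_files_for_empty_tasks(eager=False))  # 0
```

When many tasks never write a record, the eager variant still creates one file per task, and the lazy variant creates none. This mirrors the overhead that motivated the revert.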


--
This message was sent by Atlassian Jira
(v8.3.4#803005)