[ https://issues.apache.org/jira/browse/SPARK-31314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-31314.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 28072
[https://github.com/apache/spark/pull/28072]

> Revert SPARK-29285 to fix shuffle regression caused by creating temporary file eagerly
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31314
>                 URL: https://issues.apache.org/jira/browse/SPARK-31314
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Yuanjian Li
>            Assignee: Yuanjian Li
>            Priority: Major
>             Fix For: 3.0.0
>
>
> In SPARK-29285, we changed to creating shuffle temporary files eagerly. This
> helps avoid failing the entire task in the case of an occasional disk
> failure.
> But for applications in which many tasks don't actually create shuffle files,
> it causes overhead. See the benchmark below:
> Env: Spark local-cluster[2, 4, 19968]; each query was run for 5 rounds, 5
> times per round.
> Data: TPC-DS scale=99, generated by spark-tpcds-datagen
> Results:
> || ||Base||Revert||
> |Q20|Vector(4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579) Median 2.722007606|Vector(3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274) Median 2.586498463|
> |Q33|Vector(5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818) Median 4.568787136|Vector(5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024) Median 4.082311276|
> |Q52|Vector(3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664) Median 3.225437871|Vector(4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423) Median 3.196025108|
> |Q56|Vector(6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227) Median 4.609965579|Vector(6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982) Median 4.195202502|
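
For readers who want to compare the two columns of the Results table, the following is a minimal sketch that recomputes the reported medians and derives the relative change between Base and Revert. It is not part of the original issue: the names `RESULTS`, `relative_improvement`, and `summarize` are hypothetical, the timings are copied from the table, and the report does not state their unit.

```python
from statistics import median

# Per-round timings copied from the Results table above.
# The report does not state the unit, so none is assumed here.
RESULTS = {
    "Q20": {
        "base": [4.096865667, 2.76231748, 2.722007606, 2.514433591, 2.400373579],
        "revert": [3.763185446, 2.586498463, 2.593472842, 2.320522846, 2.224627274],
    },
    "Q33": {
        "base": [5.872176321, 4.854397586, 4.568787136, 4.393378146, 4.423996818],
        "revert": [5.38746785, 4.361236877, 4.082311276, 3.867206824, 3.783188024],
    },
    "Q52": {
        "base": [3.978870321, 3.225437871, 3.282411608, 2.869674887, 2.644490664],
        "revert": [4.000381522, 3.196025108, 3.248787619, 2.767444508, 2.606163423],
    },
    "Q56": {
        "base": [6.238045133, 4.820535173, 4.609965579, 4.313509894, 4.221256227],
        "revert": [6.241611339, 4.225592467, 4.195202502, 3.757085755, 3.657525982],
    },
}


def relative_improvement(base, revert):
    """Fractional reduction of the median timing when going from base to revert."""
    base_median = median(base)
    return (base_median - median(revert)) / base_median


def summarize(results):
    """Return the median of each column and the relative improvement per query."""
    return {
        query: {
            "base_median": median(row["base"]),
            "revert_median": median(row["revert"]),
            "improvement": relative_improvement(row["base"], row["revert"]),
        }
        for query, row in results.items()
    }


if __name__ == "__main__":
    for query, stats in summarize(RESULTS).items():
        print(
            f"{query}: base={stats['base_median']:.3f} "
            f"revert={stats['revert_median']:.3f} "
            f"improvement={stats['improvement']:.1%}"
        )
```

Under this computation the Revert median is lower than the Base median for all four queries, ranging from roughly 1% (Q52) to roughly 11% (Q33). With only five rounds per query, these figures are indicative rather than statistically rigorous.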