Ye Zhou created SPARK-37023:
-------------------------------

             Summary: Avoid fetching merge status when shuffleMergeEnabled is 
false for a shuffleDependency during retry
                 Key: SPARK-37023
                 URL: https://issues.apache.org/jira/browse/SPARK-37023
             Project: Spark
          Issue Type: Sub-task
          Components: Shuffle
    Affects Versions: 3.2.0
            Reporter: Ye Zhou


The assertion below in MapOutputTracker.getMapSizesByExecutorId is not 
guaranteed to hold:
{code:java}
assert(mapSizesByExecutorId.enableBatchFetch == true){code}
The reason is that in some stage retry cases 
shuffleDependency.shuffleMergeEnabled is set to false, but mergeStatus entries 
still exist because the Driver has already collected the merged status for 
that shuffle dependency. In this case, the current implementation sets 
enableBatchFetch to false because mergeStatus is present, so the assertion fails.
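
To make the retry case concrete, here is a minimal, simplified sketch of the decision described above. It is not the actual Spark code: FetchPlan, MergeStatusSketch, planCurrent and planProposed are hypothetical names used only for illustration, and planProposed shows just one possible way to address the problem.
{code:scala}
// Hypothetical, simplified model of the decision; NOT the real MapOutputTracker code.
final case class FetchPlan(enableBatchFetch: Boolean, useMergedBlocks: Boolean)

object MergeStatusSketch {
  // Behavior as described in this issue: the presence of any collected
  // merge status disables batch fetch, regardless of shuffleMergeEnabled.
  def planCurrent(hasMergeStatuses: Boolean): FetchPlan =
    FetchPlan(enableBatchFetch = !hasMergeStatuses, useMergedBlocks = hasMergeStatuses)

  // One possible improvement (an assumption, not the committed fix):
  // consult merge statuses only when merge is enabled for the dependency.
  def planProposed(shuffleMergeEnabled: Boolean, hasMergeStatuses: Boolean): FetchPlan = {
    val useMerged = shuffleMergeEnabled && hasMergeStatuses
    FetchPlan(enableBatchFetch = !useMerged, useMergedBlocks = useMerged)
  }
}
{code}
In the retry case (shuffleMergeEnabled is false but merge statuses exist), planCurrent(hasMergeStatuses = true) yields enableBatchFetch == false, which is what breaks the assertion, while planProposed(shuffleMergeEnabled = false, hasMergeStatuses = true) keeps enableBatchFetch == true.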

Details can be found here:

[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L1492]

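We should improve the implementation here so that the merge status is not 
fetched when shuffleMergeEnabled is false for the shuffleDependency.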



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org