GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117385321
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus(
}
GitHub user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117293425
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus {
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/18031
Record accurate size of blocks in MapStatus when it's above threshold.
## What changes were proposed in this pull request?
Currently, when the number of reducers is above 2000, HighlyCompressedMapStatus is used to track block sizes, and only an average block size is kept, so the sizes of large (skewed) blocks are underestimated. This pull request proposes recording the accurate size of a block in MapStatus when it is above a threshold.
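The idea can be sketched as follows. This is a minimal illustration, not Spark's actual implementation: the object name `SketchMapStatus`, the method names, and the `accurateThreshold` value are all hypothetical, chosen only to show how keeping exact sizes for oversized blocks avoids the underestimation that a pure average causes.

```scala
// Minimal sketch of the proposed scheme (hypothetical names, not Spark code):
// store the average size for all blocks, but additionally keep the accurate
// size of any block larger than a threshold, so skewed blocks are not
// underestimated when a reducer decides how to fetch them.
object SketchMapStatus {
  // Illustrative threshold; in practice this would be configurable.
  val accurateThreshold: Long = 100L * 1024 * 1024 // 100 MB

  // Returns (average size of all blocks,
  //          map of reduceId -> accurate size for blocks above the threshold).
  def compress(sizes: Array[Long]): (Long, Map[Int, Long]) = {
    val avg = if (sizes.isEmpty) 0L else sizes.sum / sizes.length
    val huge = sizes.zipWithIndex.collect {
      case (size, reduceId) if size > accurateThreshold => reduceId -> size
    }.toMap
    (avg, huge)
  }

  // Exact size for tracked huge blocks, the average for everything else.
  def getSizeForBlock(avg: Long, huge: Map[Int, Long], reduceId: Int): Long =
    huge.getOrElse(reduceId, avg)
}
```

With sizes like `Array(10L, 10L, 10L, 300L * 1024 * 1024)`, the plain average is roughly 75 MB, badly underestimating the 300 MB block; keeping that block's accurate size lets the reducer know its true cost while the small blocks still share one cheap average.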