[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18031#discussion_r117385321

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus(
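For context on the class this hunk touches: CompressedMapStatus stores each block size as a single byte using the log-base-1.1 encoding in MapStatus.compressSize / decompressSize, trading roughly 10% size accuracy for one byte per block. A minimal self-contained sketch of that scheme (the SizeCodec wrapper name is ours, not Spark's):

```scala
object SizeCodec {
  // MapStatus encodes a block size on a log-1.1 scale: 255 byte values
  // cover sizes up to roughly 35 GB, with about 10% relative error.
  private val LOG_BASE = 1.1

  def compressSize(size: Long): Byte = {
    if (size == 0) 0
    else if (size <= 1L) 1
    else math.min(255, math.ceil(math.log(size) / math.log(LOG_BASE)).toInt).toByte
  }

  def decompressSize(compressedSize: Byte): Long = {
    if (compressedSize == 0) 0
    else math.pow(LOG_BASE, compressedSize & 0xFF).toLong
  }
}
```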

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18031#discussion_r117293425

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
    @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus {
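The hunk above lands in HighlyCompressedMapStatus's factory, which makes a single pass over the uncompressed block sizes. A sketch of the approach the PR proposes, with illustrative names (buildStatus, threshold) rather than the merged code:

```scala
import org.roaringbitmap.RoaringBitmap
import scala.collection.mutable

// Sketch: keep accurate (byte-encoded) sizes only for blocks at or above
// `threshold`; represent the remaining non-empty blocks by their average.
def buildStatus(
    uncompressedSizes: Array[Long],
    threshold: Long): (RoaringBitmap, Long, Map[Int, Byte]) = {
  // Same log-1.1 byte encoding as MapStatus.compressSize.
  def compress(size: Long): Byte =
    if (size == 0) 0
    else if (size <= 1L) 1
    else math.min(255, math.ceil(math.log(size) / math.log(1.1)).toInt).toByte

  val emptyBlocks = new RoaringBitmap()             // ids of empty blocks
  val hugeBlockSizes = mutable.Map.empty[Int, Byte] // accurate sizes for big blocks
  var smallTotal = 0L
  var smallCount = 0
  var i = 0
  while (i < uncompressedSizes.length) {
    val size = uncompressedSizes(i)
    if (size == 0) {
      emptyBlocks.add(i)
    } else if (size >= threshold) {
      hugeBlockSizes(i) = compress(size)
    } else {
      smallTotal += size
      smallCount += 1
    }
    i += 1
  }
  val avgSize = if (smallCount > 0) smallTotal / smallCount else 0L
  (emptyBlocks, avgSize, hugeBlockSizes.toMap)
}
```

The actual change wires the threshold to a Spark configuration value; the sketch simply takes it as a parameter.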

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/18031

    Record accurate size of blocks in MapStatus when it's above threshold.

    ## What changes were proposed in this pull request?

    Currently, when the number of reduces is above 2000, HighlyCompressedMapStatus is used to store block sizes: only the average size of non-empty blocks is kept, which is not good for memory control when fetching shuffle blocks. This PR records the accurate size of a block when it is above a threshold.
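To see why the average misleads under skew (the numbers here are illustrative, not from the PR): with 2000 reducers where 1999 blocks are 1 MB and one block is 2048 MB, the average is (1999 + 2048) / 2000 ≈ 2 MB, so the reducer fetching the 2 GB block budgets memory for a 2 MB fetch. Recording accurate sizes only for blocks above a threshold corrects the skewed case while keeping the constant per-block memory footprint for the common case.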