[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove redundant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @vanzin Thanks a lot for reviewing this. I have refined it according to your comments. Please take another look when you have time :) --- If your project is set up for it, you can reply

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove redundant characters in O...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120809215 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,47 @@ private

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove redundant characters in O...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r120808962 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,47 @@ private

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove redundant characters in OpenBloc...

2017-06-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @srowen Thanks a lot for looking into this :) For example, blockId="shuffle_20_1000_2000" is stored as a `String`, which costs more than 20 bytes. In this change, it will c
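
A minimal sketch of the idea, assuming every shuffle block ID follows the `shuffle_<shuffleId>_<mapId>_<reduceId>` pattern: the three numeric fields are kept as `int`s (12 bytes of payload) and the string form is rebuilt only on demand. `ShuffleBlockIdCodec`, `encode` and `decode` are hypothetical names for illustration, not the API introduced by the PR.

```java
/** Hypothetical codec that packs "shuffle_<shuffleId>_<mapId>_<reduceId>" into three ints. */
final class ShuffleBlockIdCodec {
  private ShuffleBlockIdCodec() {}

  /** Parses a shuffle block ID into {shuffleId, mapId, reduceId}. */
  static int[] encode(String blockId) {
    String[] parts = blockId.split("_");
    if (parts.length != 4 || !parts[0].equals("shuffle")) {
      throw new IllegalArgumentException("Not a shuffle block ID: " + blockId);
    }
    return new int[] {
      Integer.parseInt(parts[1]), Integer.parseInt(parts[2]), Integer.parseInt(parts[3])
    };
  }

  /** Rebuilds the string form only when a caller actually needs it. */
  static String decode(int[] ids) {
    return "shuffle_" + ids[0] + "_" + ids[1] + "_" + ids[2];
  }
}
```

Keeping the numbers instead of the `String` avoids a separate object and character array per block ID, which adds up when one shuffle service holds many open block lists; the exact per-entry saving depends on the JVM and its string representation.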

[GitHub] spark pull request #18231: [WIP][SPARK-20994] Remove redundant characters in O...

2017-06-07 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18231 [WIP][SPARK-20994] Remove redundant characters in OpenBlocks to save memory for shuffle service. ## What changes were proposed in this pull request? In the current code, blockIds

[GitHub] spark issue #18231: [WIP][SPARK-20994] Remove redundant characters in OpenBloc...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 In my cluster, we are suffering from OOM in the shuffle service. We found that many executors are fetching blocks from a single shuffle service. Analyzing the memory, we found

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-07 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/18211

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 @vanzin Thanks a lot for the comment. I will close this PR and think about whether there is another solution.

[GitHub] spark issue #18211: [SPARK-20994] Alleviate memory pressure in StreamManager

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 Jenkins, retest this please.

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-06 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18211#discussion_r120388659 --- Diff: common/network-common/src/test/java/org/apache/spark/network/server/OneForOneStreamManagerSuite.java --- @@ -1,50 +0,0

[GitHub] spark issue #18204: [SPARK-20985] Stop SparkContext using LocalSparkContext....

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18204 Thanks for merging

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In this PR: 1. Instead of by `chunkIndex`, chunks are fetched by `String chunkId`, so the server doesn't cache the block list. 2. In `OpenBlocks`, only the metadata (e.g. appId, executorId) of the stream
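
The two points above can be sketched as follows, assuming a chunk ID of the form `<streamId>_<blockId>` (the real format used in the PR is not shown here). The server keeps only the stream metadata registered at `OpenBlocks` time and derives each block from the chunk ID on request, so no per-stream block list stays in memory. All class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stream manager that caches per-stream metadata but no block list. */
final class MetadataOnlyStreamManager {
  /** Metadata registered by OpenBlocks. */
  static final class StreamMeta {
    final String appId;
    final String executorId;

    StreamMeta(String appId, String executorId) {
      this.appId = appId;
      this.executorId = executorId;
    }
  }

  private final Map<Long, StreamMeta> streams = new ConcurrentHashMap<>();
  private long nextStreamId = 0;

  /** Registers a stream and returns its ID; only appId and executorId are kept. */
  synchronized long registerStream(String appId, String executorId) {
    long id = nextStreamId++;
    streams.put(id, new StreamMeta(appId, executorId));
    return id;
  }

  /** Builds a chunk ID in the assumed "<streamId>_<blockId>" format. */
  static String chunkId(long streamId, String blockId) {
    return streamId + "_" + blockId;
  }

  /** Resolves a chunk ID to a lookup key for the block, using only stream metadata. */
  String resolve(String chunkId) {
    int sep = chunkId.indexOf('_');
    long streamId = Long.parseLong(chunkId.substring(0, sep));
    String blockId = chunkId.substring(sep + 1);
    StreamMeta meta = streams.get(streamId);
    if (meta == null) {
      throw new IllegalStateException("Unknown stream " + streamId);
    }
    // A real handler would open the block's file segment here.
    return meta.appId + "/" + meta.executorId + "/" + blockId;
  }
}
```

The trade-off is that every chunk request carries a longer ID, in exchange for the server no longer holding each stream's full block list while the stream is open.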

[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...

2017-06-06 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In my cluster, we are suffering from OOM in the shuffle service. We found that many executors are fetching blocks from a single shuffle service. Analyzing the memory, we found

[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...

2017-06-06 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18211 [WIP][SPARK-20994] Alleviate memory pressure in StreamManager ## What changes were proposed in this pull request? In the current code, chunks are fetched from the shuffle service in two steps

[GitHub] spark issue #18204: [SPARK-20985] Stop SparkContext using LocalSparkContext....

2017-06-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18204 @srowen Thanks for approving! :-)

[GitHub] spark pull request #18204: [SPARK-20985] sc.stop should be encapsulated in f...

2017-06-05 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18204 [SPARK-20985] sc.stop should be encapsulated in finally ## What changes were proposed in this pull request? Stop `SparkContext` in a `finally` block, so that other tests won't complain that there's
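
The pattern can be sketched in Java with a hypothetical `TestContext` standing in for `SparkContext` and imitating its one-active-context-per-JVM check: `stop()` sits in `finally`, so a failing test body cannot leave a running context behind for the next test.

```java
/** Hypothetical stand-in for SparkContext: only one instance may be active per JVM. */
final class TestContext {
  private static TestContext active;

  TestContext() {
    if (active != null) {
      throw new IllegalStateException("Only one context may be running in this JVM");
    }
    active = this;
  }

  void stop() {
    if (active == this) {
      active = null;
    }
  }

  static boolean isActive() {
    return active != null;
  }
}

final class StopInFinallyExample {
  /** Runs a test body and always stops the context, even if the body throws. */
  static void runTest(Runnable body) {
    TestContext sc = new TestContext();
    try {
      body.run();
    } finally {
      sc.stop(); // without finally, a failing body would leak the context into later tests
    }
  }
}
```

Moving this into a shared test fixture, as the later title "Stop SparkContext using LocalSparkContext" suggests, avoids repeating the `try`/`finally` in every test.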

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-06-02 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17533

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-06-02 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @HyukjinKwon Sorry, I will close this for now and open another PR if there's progress.

[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...

2017-05-30 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17603 Jenkins, retest this please.

[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...

2017-05-30 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17603 @squito Thank you so much :-) :-)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-05-30 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 Sorry, I will close it for now.

[GitHub] spark pull request #17312: [SPARK-19973] Display num of executors for the st...

2017-05-30 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17312

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-05-30 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @squito Thanks a lot for merging :)

[GitHub] spark issue #18117: [SPARK-19659][CORE][FOLLOW-UP] Fetch big blocks to disk ...

2017-05-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18117 @cloud-fan Thanks a lot for the notification. I think it's a really good change 👍

[GitHub] spark pull request #18117: [SPARK-19659][CORE][FOLLOW-UP] Fetch big blocks t...

2017-05-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18117#discussion_r118731980 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -443,34 +445,34 @@ class

[GitHub] spark pull request #18117: [SPARK-19659][CORE][FOLLOW-UP] Fetch big blocks t...

2017-05-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18117#discussion_r118731891 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -214,11 +214,12 @@ final class

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan @JoshRosen @mridulm @squito @viirya Thanks a lot for taking so much time to review this patch! Sorry for the stupid mistakes I made. I will be more careful next time

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118482160 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +413,64 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118424375 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +411,61 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118424377 --- Diff: docs/configuration.md --- @@ -520,6 +520,14 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 Yes, it certainly doesn't fail. I just think it's fairly straightforward that the partitioner should be compatible with the number of the child RDD's partitions. I find no reason the number of partitions
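
For context, a sketch of hash partitioning modelled on Spark's `HashPartitioner.getPartition` and `Utils.nonNegativeMod`: every key maps into `[0, numPartitions)`, so the partitioner only lines up with the data when its count equals the child RDD's number of partitions. `HashPartitionSketch` and `isCompatible` are hypothetical names.

```java
/** Hash partitioning in the style of Spark's HashPartitioner (simplified sketch). */
final class HashPartitionSketch {
  /** Maps x into [0, mod), even when x (a hash code) is negative. */
  static int nonNegativeMod(int x, int mod) {
    int rawMod = x % mod;
    return rawMod + (rawMod < 0 ? mod : 0);
  }

  /** Null keys go to partition 0; all other keys are spread by hash code. */
  static int getPartition(Object key, int numPartitions) {
    return key == null ? 0 : nonNegativeMod(key.hashCode(), numPartitions);
  }

  /** Hypothetical check: the partitioner fits only if the partition counts match. */
  static boolean isCompatible(int partitionerNumPartitions, int childNumPartitions) {
    return partitionerNumPartitions == childNumPartitions;
  }
}
```

With a mismatch, some keys would be routed to partition IDs the child does not have (if it has fewer partitions), or some child partitions would never receive any key (if it has more).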

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272761 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,49 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272605 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +170,11 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272653 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,49 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272532 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -287,4 +287,10 @@ package object config { .bytesConf

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current change: 1) the partially written file is removed on failure; 2) all shuffle files are removed in `cleanup()` (which is registered as a task completion callback)

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan In the current change, the shuffle files are deleted twice: 1) after `ManagedBuffer.release`; 2) in `cleanup()`, which is already registered as a task
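
Deleting in both places is safe as long as deletion is idempotent. A minimal sketch, assuming fetched blocks are plain files tracked per task (`FetchedFileTracker` is a hypothetical name, not a Spark class): `release` removes one file, and `cleanup`, meant to run as the task completion callback, removes whatever is left, for example a partially written file after a failed fetch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-task tracker for shuffle blocks fetched to disk. */
final class FetchedFileTracker {
  private final Set<Path> files = ConcurrentHashMap.newKeySet();

  void register(Path file) {
    files.add(file);
  }

  /** Called when the buffer backed by this file is released. */
  void release(Path file) {
    delete(file);
    files.remove(file);
  }

  /** Task-completion callback: removes every file that is still tracked. */
  void cleanup() {
    for (Path file : files) {
      delete(file);
    }
    files.clear();
  }

  private static void delete(Path file) {
    try {
      Files.deleteIfExists(file); // a second delete of the same file is a no-op
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```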

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @jiangxb1987 Thank you so much for taking the time to look into this. Yes, it is not failing in the existing code. But I think it's quite straightforward that the partitioner should be compatible

[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current change: 1. `ShuffleBlockFetcherIterator` is not a `MemoryConsumer`. 2. The shuffle file name becomes `${context.taskAttemptId()}-remote-$bId`. 3. Try to delete all shuffle files
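
A hypothetical helper for the naming scheme in point 2. Prefixing the block ID with the task attempt ID presumably keeps names unique when different task attempts fetch the same remote block, and lets cleanup recognise the files a given attempt owns. This is an illustration only, not code from the PR.

```java
/** Hypothetical helper for the "<taskAttemptId>-remote-<blockId>" naming scheme. */
final class RemoteShuffleFileName {
  private RemoteShuffleFileName() {}

  static String of(long taskAttemptId, String blockId) {
    return taskAttemptId + "-remote-" + blockId;
  }

  /** True if the file name was produced for the given task attempt. */
  static boolean ownedBy(String fileName, long taskAttemptId) {
    // The "-remote-" suffix in the prefix stops attempt 42 from matching attempt 421.
    return fileName.startsWith(taskAttemptId + "-remote-");
  }
}
```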

[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @cloud-fan Thanks for merging! @mridulm @JoshRosen @viirya @HyukjinKwon @wzhfy Thanks a lot for taking the time to review this PR!

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Yes, thanks a lot for merging #18031. I will update soon!

[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 In the current change: 1. There's only one config: spark.shuffle.accurateBlockThreshold. 2. I removed the huge blocks from the numerator in the average-size calculation.
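
A sketch of the averaging described in point 2, simplified from the idea behind `HighlyCompressedMapStatus`. The `accurateBlockThreshold` parameter plays the role of `spark.shuffle.accurateBlockThreshold`, and unlike Spark this keeps exact sizes instead of log-compressed ones. Blocks at or above the threshold are recorded individually and, in this sketch, left out of both the sum and the count of the average, so one huge block cannot inflate the estimate for all the small ones. `MapStatusSizeSummary` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical size summary for one map output: huge blocks exact, small blocks averaged. */
final class MapStatusSizeSummary {
  private final Set<Integer> emptyBlocks = new HashSet<>();
  private final Map<Integer, Long> hugeBlockSizes = new HashMap<>();
  private final long avgSmallBlockSize;

  MapStatusSizeSummary(long[] uncompressedSizes, long accurateBlockThreshold) {
    long totalSmallBlockSize = 0;
    int numSmallBlocks = 0;
    for (int i = 0; i < uncompressedSizes.length; i++) {
      long size = uncompressedSizes[i];
      if (size <= 0) {
        emptyBlocks.add(i);
      } else if (size < accurateBlockThreshold) {
        totalSmallBlockSize += size;
        numSmallBlocks++;
      } else {
        hugeBlockSizes.put(i, size); // recorded individually, excluded from the average
      }
    }
    avgSmallBlockSize = numSmallBlocks > 0 ? totalSmallBlockSize / numSmallBlocks : 0;
  }

  /** Size estimate a reducer would see for the given reduce partition. */
  long getSizeForBlock(int reduceId) {
    if (emptyBlocks.contains(reduceId)) {
      return 0;
    }
    Long huge = hugeBlockSizes.get(reduceId);
    return huge != null ? huge : avgSmallBlockSize;
  }
}
```

For sizes `{10, 20, 0, 1000}` with a threshold of 100, the small blocks are estimated at 15 and block 3 at 1000; averaging block 3 in as well would have put every small non-empty block at 343.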

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117620188 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117610423 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117610285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117510051 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117461089 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117460170 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @HyukjinKwon Thank you so much! Really helpful 👍

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Gentle ping to @JoshRosen @cloud-fan @mridulm

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Jenkins, retest this please

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117293425 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 I'm trying to give users a way to control memory strictly so that no blocks are underestimated (by setting spark.shuffle.accurateBlockThreshold=0 and spark.shuffle.accurateBlockThresholdByTimesAverage=1

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 To resolve the comments in https://github.com/apache/spark/pull/16989 : >minimum size before we consider something a large block : if average is 10kb, and some blocks are > 20kb, spillin

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18031 Record accurate size of blocks in MapStatus when it's above threshold. ## What changes were proposed in this pull request? Currently, when the number of reducers is above 2000
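
A sketch of the behaviour being described, modelled on Spark 2.x `MapStatus`: with more than 2000 reduce partitions the compact `HighlyCompressedMapStatus` is chosen; otherwise each block size is stored as one byte on a log base 1.1 scale (roughly 10% precision, up to about 35 GB). The class name is hypothetical and the status types are returned as strings for brevity.

```java
/** Simplified sketch of MapStatus type selection and per-block size encoding. */
final class MapStatusChoice {
  static final int HIGHLY_COMPRESSED_THRESHOLD = 2000; // hard-coded in the code under discussion
  private static final double LOG_BASE = 1.1;

  private MapStatusChoice() {}

  static String statusTypeFor(int numReducers) {
    return numReducers > HIGHLY_COMPRESSED_THRESHOLD
        ? "HighlyCompressedMapStatus"
        : "CompressedMapStatus";
  }

  /** Encodes a size as ceil(log_1.1(size)), capped at 255 so it fits in one unsigned byte. */
  static byte compressSize(long size) {
    if (size == 0) {
      return 0;
    }
    if (size <= 1L) {
      return 1;
    }
    return (byte) Math.min(255, (int) Math.ceil(Math.log(size) / Math.log(LOG_BASE)));
  }

  /** Decodes a byte back to an approximate size (never smaller than the original). */
  static long decompressSize(byte compressed) {
    if (compressed == 0) {
      return 0;
    }
    return (long) Math.pow(LOG_BASE, compressed & 0xFF);
  }
}
```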

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @JoshRosen Thanks a lot for taking the time to look into this PR. I'm reading your comments carefully. Yes, I think it's good to integrate with the memory manager later. I will break

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Checking the code: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59 `SparkConfigProvider` just checks whether the key

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 It seems like `SparkConfigProvider` is not checking alternatives in `SparkConf`. That's why spark.memory.offHeap.enabled is not set (it still has the default value), though we've already set
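
The expected behaviour, falling back to alternative (for example deprecated) keys when the primary key is unset, can be sketched as a simple ordered lookup. This is not Spark's `ConfigProvider` API; the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup that tries the primary key, then each alternative key in order. */
final class AlternativeKeyLookup {
  private AlternativeKeyLookup() {}

  static Optional<String> get(Map<String, String> settings, String key, List<String> alternatives) {
    String value = settings.get(key);
    if (value != null) {
      return Optional.of(value);
    }
    for (String alt : alternatives) {
      value = settings.get(alt);
      if (value != null) {
        return Optional.of(value); // first alternative that is set wins
      }
    }
    return Optional.empty();
  }
}
```

A provider that only consulted the primary key would return empty here even though the user set a value under an alternative name, which matches the symptom described above.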

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r117152091 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,39 @@ package object config

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, retest this please.

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thanks, I will refine the documentation.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116929126 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,39 @@ package object config

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116929076 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116919856 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116911964 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116911537 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907421 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,139 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907437 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,10 @@ package object config

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907401 --- Diff: docs/configuration.md --- @@ -954,16 +970,16 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907199 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +138,27 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thanks a lot. I will refine :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116674324 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +216,21 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current code, `spark.memory.offHeap.enabled` is used to decide `tungstenMemoryMode`. `spark.memory.offHeap.enabled` doesn't decide whether remote blocks are shuffled to onHeap or offHeap
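
A sketch of the selection being described, modelled on Spark 2.x `MemoryManager.tungstenMemoryMode`: the flag only chooses where Tungsten pages live, which is why it says nothing by itself about where fetched remote blocks go. Simplifications (assumptions): configuration is a plain `Map`, and `spark.memory.offHeap.size` is given in bytes rather than parsed from strings such as `1g`.

```java
import java.util.Map;

/** Simplified sketch of how the off-heap flag selects the Tungsten memory mode. */
final class TungstenModeSketch {
  enum MemoryMode { ON_HEAP, OFF_HEAP }

  private TungstenModeSketch() {}

  static MemoryMode tungstenMemoryMode(Map<String, String> conf) {
    boolean offHeapEnabled =
        Boolean.parseBoolean(conf.getOrDefault("spark.memory.offHeap.enabled", "false"));
    if (!offHeapEnabled) {
      return MemoryMode.ON_HEAP;
    }
    long offHeapSize = Long.parseLong(conf.getOrDefault("spark.memory.offHeap.size", "0"));
    if (offHeapSize <= 0) {
      throw new IllegalArgumentException(
          "spark.memory.offHeap.size must be > 0 when spark.memory.offHeap.enabled == true");
    }
    return MemoryMode.OFF_HEAP;
  }
}
```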

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116521474 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116520507 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Very gentle ping to @cloud-fan and @mridulm. What do you think about the current change :)?

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan @mridulm I think it's a good idea to make 2000 configurable. But checking the code, I'm a little hesitant to do that in this PR. I think it's a bigger change and some related code

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115933206 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +193,54 @@ final class

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Yes, I think it's a good idea to make `2000` configurable. I will refine.

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 As @mridulm mentioned, in `HighlyCompressedMapStatus` it can be configured in two respects: >1. minimum size before we consider something a large block. >2. The fraction '2' shoul

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-10 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115884613 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +150,50 @@ private void

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Yes, I will refine :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115638943 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @mridulm Thank you very much for taking the time to look into this PR; it was really helpful. I refined it according to your comments. Please take another look when you have time and give more comments

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115525769 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -36,7 +36,7

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115525399 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523971 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -95,6 +95,14 @@ public ManagedBuffer

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523324 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +149,38 @@ private void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523103 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -154,15 +164,24 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522973 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -154,15 +164,24 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522902 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -137,6 +146,7 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522781 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115519863 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115519964 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115518541 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +151,39 @@ private void

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, test this please

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thank you very much for reviewing this thus far :) > How about we always fetch to disk if the block size is over maxBytesInFlight? I fully agree with this. It's to
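A minimal sketch of the rule proposed in the quoted question: any block larger than `maxBytesInFlight` is fetched to disk, and everything else is fetched into memory. The function `plan_fetches` and its inputs are hypothetical. In Spark, `maxBytesInFlight` is derived from `spark.reducer.maxSizeInFlight`, while the fetch-to-disk path is what this pull request adds:

```python
# Hypothetical sketch of the "fetch to disk if larger than maxBytesInFlight" rule.
def plan_fetches(block_sizes, max_bytes_in_flight):
    """Split block ids into (fetch_to_memory, fetch_to_disk) lists.

    block_sizes maps a block id to its size in bytes.
    """
    to_memory, to_disk = [], []
    for block_id, size in block_sizes.items():
        if size > max_bytes_in_flight:
            # Too big to hold within the in-flight budget: stream it to a file.
            to_disk.append(block_id)
        else:
            to_memory.append(block_id)
    return to_memory, to_disk
```

In this sketch a block exactly at the limit stays in memory; whether the comparison should be strict is a detail the thread does not settle.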

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115230064 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115229995 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,45 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115229968 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,22 @@ class MapStatusSuite extends SparkFunSuite
