[GitHub] spark issue #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp "newPar...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18239 @cloud-fan Thanks a lot for the reply. Yes, I'm also hesitant to backport to branch-1.6, but I think this bug is too obvious -- with `spark.sql.adaptive.enabled=true`, any rerunni

[GitHub] spark pull request #18231: [SPARK-20994] Remove redundant characters in Open...

2017-06-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18231#discussion_r122367121 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -209,4 +190,51 @@ private

[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark pull request #18327: [SPARK-21047] Add test suites for complicated cas...

2017-06-16 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18327 [SPARK-21047] Add test suites for complicated cases in ColumnarBatchSuite ## What changes were proposed in this pull request? Current ColumnarBatchSuite has very simple test cases for `Array

[GitHub] spark issue #18327: [WIP][SPARK-21047] Add test suites for complicated cases...

2017-06-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18327 @kiszk Would you mind if I give this JIRA a try?

[GitHub] spark issue #18231: [SPARK-20994] Remove redundant characters in OpenBlocks ...

2017-06-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18231 @cloud-fan Thanks for merging!

[GitHub] spark issue #18343: [SPARK-21133][CORE] Fix HighlyCompressedMapStatus#writeE...

2017-06-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18343 Thanks for the ping. If I understand correctly, `HighlyCompressedMapStatus` is initialized in 2 situations: 1. Creating `MapStatus` during shuffle-write when the number of reduce partitions is over 2000

[GitHub] spark issue #14085: [SPARK-16408][SQL] SparkSQL Added file get Exception: is...

2017-06-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/14085 @zenglinxi0615 This PR is about adding all files in a directory recursively, so there is no need to enumerate all the filenames? I think this can be pretty useful, especially in production env

[GitHub] spark issue #18327: [SPARK-21047] Add test suites for complicated cases in C...

2017-06-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18327 @kiszk Thank you so much! I will read your comments carefully and refine this PR :)

[GitHub] spark issue #18249: [SPARK-19937] Collect metrics for remote bytes read to d...

2017-06-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18249 @vanzin Would you mind giving more comments when you have time? Then I can continue working on this :)

[GitHub] spark pull request #18239: [SPARK-19462] fix bug in Exchange--pass in a tmp ...

2017-06-21 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/18239

[GitHub] spark pull request #18327: [SPARK-21047] Add test suites for complicated cas...

2017-06-21 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18327#discussion_r123258919 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -739,6 +739,123 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-06-22 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18388 [SPARK-21175] Reject OpenBlocks when memory shortage on shuffle service. ## What changes were proposed in this pull request? A shuffle service can serve blocks from multiple apps/tasks

[GitHub] spark issue #18327: [SPARK-21047] Add test suites for complicated cases in C...

2017-06-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18327 @kiszk I tried to add a test `Nest Array(containing null) in Array.`. Please take a look when you have time and I will continue working on this :)

[GitHub] spark pull request #18327: [SPARK-21047] Add test suites for complicated cas...

2017-06-22 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18327#discussion_r123473623 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --- @@ -241,7 +241,40 @@ public MapData getMap(int ordinal

[GitHub] spark pull request #18327: [SPARK-21047] Add test suites for complicated cas...

2017-06-22 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18327#discussion_r123499668 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -739,6 +739,157 @@ class ColumnarBatchSuite

[GitHub] spark issue #18327: [SPARK-21047] Add test suites for complicated cases in C...

2017-06-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18327 @cloud-fan Thanks a lot for taking the time to review this :)

[GitHub] spark pull request #18327: [SPARK-21047] Add test suites for complicated cas...

2017-06-23 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18327#discussion_r123690164 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -739,6 +739,157 @@ class ColumnarBatchSuite

[GitHub] spark issue #18327: [SPARK-21047] Add test suites for complicated cases in C...

2017-06-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18327 @cloud-fan Thanks for merging :)

[GitHub] spark pull request #18405: [SPARK-21194][SQL][WIP] Fail the putNullmethod wh...

2017-06-23 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18405 [SPARK-21194][SQL][WIP] Fail the putNullmethod when containsNull=false. ## What changes were proposed in this pull request? Currently there's no check for putting null into a `Arra

[GitHub] spark issue #18405: [SPARK-21194][SQL][WIP] Fail the putNullmethod when cont...

2017-06-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 cc @kiszk

[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 Jenkins, retest this please.

[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 Jenkins, retest this please.

[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 Jenkins, retest this please.

[GitHub] spark issue #18405: [SPARK-21194][SQL] Fail the putNullmethod when containsN...

2017-06-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18405 @kiszk Thanks a lot for taking the time to review this. I have no idea why the test failed :(

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Any more comments on this? :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115219258 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115220025 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115229968 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,22 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115229995 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,45 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115230064 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thank you very much for reviewing this thus far :) >How about we always fetch to disk if the block size is over maxBytesInFlight? I super agree with this. It'

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, test this please

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115518541 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +151,39 @@ private void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115519964 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115519863 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522781 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522902 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -137,6 +146,7 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115522973 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -154,15 +164,24 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523103 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -154,15 +164,24 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523324 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +149,38 @@ private void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115523971 --- Diff: common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java --- @@ -95,6 +95,14 @@ public ManagedBuffer

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115525399 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115525769 --- Diff: common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java --- @@ -36,7 +36,7 @@ /** * A

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @mridulm Thank you very much for taking the time to look into this PR; it was really helpful. I refined it according to your comments. Please take another look when you have time and give more comments

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115638943 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Yes, I will refine :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-10 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115884613 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +150,50 @@ private void

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 As @mridulm mentioned, in `HighlyCompressedMapStatus` it can be configured in two respects: >1. minimum size before we consider something a large block. >2. The fraction '2&#

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Yes, I think it's a good idea to make `2000` configurable. I will refine.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115933206 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +193,54 @@ final class

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan @mridulm I think it's a good idea to make 2000 configurable. But checking the code, I'm a little bit hesitant to do that in this PR. I think it's a bigger change and so

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Very gentle ping to @cloud-fan and @mridulm. What do you think about the current change? :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116520507 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116521474 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current code, `spark.memory.offHeap.enabled` is used when deciding `tungstenMemoryMode`. `spark.memory.offHeap.enabled` doesn't decide whether remote blocks are shuffled to onHeap or of

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116674324 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +216,21 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thanks a lot. I will refine :)

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907199 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +138,27 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907401 --- Diff: docs/configuration.md --- @@ -954,16 +970,16 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907437 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,10 @@ package object config

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116907421 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,139 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116911537 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116911964 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116919856 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116929076 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +429,146 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r116929126 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,39 @@ package object config

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Thanks, I will refine the documents.

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, retest this please.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r117152091 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,39 @@ package object config

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 It seems like `SparkConfigProvider` is not checking alternatives in `SparkConf`. That's why `spark.memory.offHeap.enabled` is not set (still the default value), though we've a

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Checking the code: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59 `SparkConfigProvider` just checks whether the key is

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @JoshRosen Thanks a lot for taking the time to look into this PR. I'm reading your comments carefully. Yes, I think it's good to integrate with memory manager later. I will

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18031 Record accurate size of blocks in MapStatus when it's above threshold. ## What changes were proposed in this pull request? Currently, when the number of reduces is above

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 To resolve the comments in https://github.com/apache/spark/pull/16989 : >minimum size before we consider something a large block : if average is 10kb, and some blocks are > 20kb, spillin

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 I tried to give users a way to control memory strictly so that no blocks are underestimated (setting spark.shuffle.accurateBlockThreshold=0 and spark.shuffle.accurateBlockThresholdByTimesAverage=1

[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117293425 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Jenkins, retest this please

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 Gentle ping to @JoshRosen @cloud-fan @mridulm

[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...

2017-05-18 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @HyukjinKwon Thank you so much! Really helpful 👍

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117460170 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117461089 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-19 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117510051 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117610285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117610423 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #18031: [SPARK-20801] Record accurate size of blocks in M...

2017-05-20 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117620188 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 In the current change: 1. There is only one config: `spark.shuffle.accurateBlockThreshold`. 2. I removed the huge blocks from the numerator in the calculation of the average size.
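The averaging idea in the comment above can be sketched as follows. This is a minimal Python illustration, not the code from the pull request: the function name `summarize_block_sizes` and its return shape are hypothetical, and dividing by the count of small blocks only (rather than all non-empty blocks) is an assumption, since the comment mentions only the numerator.

```python
from typing import Dict, List, Tuple


def summarize_block_sizes(
    sizes: List[int], accurate_block_threshold: int
) -> Tuple[int, Dict[int, int]]:
    """Return (average size of small blocks, exact sizes of huge blocks).

    Hypothetical helper: blocks at or above the threshold are kept out of
    the average and recorded individually by their index.
    """
    total_small = 0
    num_small = 0
    huge_blocks: Dict[int, int] = {}
    for index, size in enumerate(sizes):
        if size <= 0:
            continue  # empty blocks contribute nothing
        if size < accurate_block_threshold:
            total_small += size
            num_small += 1
        else:
            huge_blocks[index] = size  # recorded accurately, not averaged
    # Assumption: the denominator counts only the small non-empty blocks.
    average = total_small // num_small if num_small > 0 else 0
    return average, huge_blocks
```

With this shape, a single oversized block no longer inflates the average reported for the remaining blocks.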

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Yes, thanks a lot for merging #18031. I will update soon!

[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...

2017-05-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @cloud-fan Thanks for merging! @mridulm @JoshRosen @viirya @HyukjinKwon @wzhfy Thanks a lot for taking the time to review this PR!

[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current change: 1. `ShuffleBlockFetcherIterator` is not a `MemoryConsumer`. 2. The name of the shuffle file becomes `${context.taskAttemptId()}-remote-$bId`. 3. Try to delete all shuffle files

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @jiangxb1987 Thank you so much for taking the time to look into this. Yes, it is not failing in the existing code. But I think it's quite straightforward that the partitioner should be compatible

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan In the current change, the shuffle files are deleted twice: 1) after `ManagedBuffer.release`; 2) in `cleanup()`, which is already registered as a task

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 In the current change: 1) remove the partially written file on failure; 2) remove all shuffle files in `cleanup()` (which is registered as a task completion callback)
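The two cleanup rules in the comment above can be sketched roughly as below. This is a plain Python illustration under stated assumptions, not Spark's implementation: the class `DiskBlockFetcher` and its methods are hypothetical names, and the caller is assumed to invoke `cleanup()` itself where Spark would register it as a task completion callback.

```python
import os
from typing import Iterable, List


class DiskBlockFetcher:
    """Hypothetical sketch: writes fetched blocks to disk and tracks them."""

    def __init__(self, directory: str) -> None:
        self._directory = directory
        self._files: List[str] = []

    def fetch_to_disk(self, name: str, chunks: Iterable[bytes]) -> str:
        path = os.path.join(self._directory, name)
        try:
            with open(path, "wb") as out:
                for chunk in chunks:
                    out.write(chunk)
        except Exception:
            # Rule 1: drop the partially written file when the fetch fails.
            if os.path.exists(path):
                os.remove(path)
            raise
        self._files.append(path)
        return path

    def cleanup(self) -> None:
        # Rule 2: remove every file written so far; safe to call repeatedly.
        for path in self._files:
            if os.path.exists(path):
                os.remove(path)
        self._files.clear()
```

Checking `os.path.exists` before each removal keeps `cleanup()` harmless if a file was already deleted earlier, for example after its buffer was released.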

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272532 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -287,4 +287,10 @@ package object config { .bytesConf

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272653 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,49 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272605 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +170,11 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118272761 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +187,49 @@ final class

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 Yes, it doesn't fail for sure. I just think it's fairly straightforward that the partitioner should be compatible with the number of the child RDD's partitions. I find no reason the number of

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118424377 --- Diff: docs/configuration.md --- @@ -520,6 +520,14 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-24 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r118424375 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +411,61 @@ class
