[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115220025 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115219258 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan More comments on this ? :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115021013 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r115018506 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,23 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @cloud-fan Many thanks for reviewing this PR :). I've refined it according to your comments. Please take another look when you have time.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114967741 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965988 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +424,74 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965950 --- Diff: core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala --- @@ -401,4 +424,74 @@ class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965477 --- Diff: core/src/test/scala/org/apache/spark/shuffle/BlockStoreShuffleReaderSuite.scala --- @@ -126,11 +131,21 @@ class BlockStoreShuffleReaderSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114965327 --- Diff: core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala --- @@ -128,4 +130,23 @@ class MapStatusSuite extends SparkFunSuite

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114961196 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -175,33 +181,41 @@ final class

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114960285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114960041 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114959211 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +151,39 @@ private void

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114943530 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, retest this please.

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114696480 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 Thanks again for helping review this PR. Currently I'm not seeing memory issues on my NodeManagers. I'll report to the community if there are new findings :)

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 @tgravescs Thanks a lot for merging. I proposed to resolve this by "Lazy initialization of FileSegmentManagedBuffer" and simplify the change. But after checking the code, could

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114504511 --- Diff: core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala --- @@ -42,6 +46,12 @@ private[spark] class BlockStoreShuffleReader

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503627 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -133,36 +135,53 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503557 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -126,4 +147,38 @@ private void

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-05-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r114503489 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java --- @@ -100,7 +114,14 @@ public void

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-04-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @kayousterhout @mridulm Does this PR make sense? Could you please take a look at this when you have time :)

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17744#discussion_r113356306 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,14 +92,25 @@ protected void

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17744 Spark jobs are running on a YARN cluster in my warehouse. We enabled the external shuffle service (--conf spark.shuffle.service.enabled=true). Recently the NodeManager has been running out of memory now and then. Dumping

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-24 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17744 [SPARK-20426] Lazy initialization of FileSegmentManagedBuffer for shuffle service. ## What changes were proposed in this pull request? When application contains large amount of shuffle

[GitHub] spark pull request #16989: [WIP][SPARK-19659] Fetch big blocks to disk when ...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r111734780 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -133,36 +135,53 @@ private[spark] class HighlyCompressedMapStatus private

[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-04-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Jenkins, test this please

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 I think the failed unit test can be fixed in https://github.com/apache/spark/pull/17634 and https://github.com/apache/spark/pull/17603

[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17603 @squito Could you help comment on this? :)

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 @squito @srowen Could you help comment on this :)

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @squito Thank you so much for reviewing thus far and sorry for the complexity I bring in. I tried to simplify the code according to your comment and please take another look when tests

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111546462 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -168,6 +169,8 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111545406 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1080,6 +1122,25 @@ class DAGScheduler

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111545327 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1080,6 +1122,25 @@ class DAGScheduler

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111545285 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -472,6 +472,47 @@ class DAGScheduler

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r111545019 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -472,6 +472,47 @@ class DAGScheduler

[GitHub] spark issue #17634: [SPARK-20333] HashPartitioner should be compatible with ...

2017-04-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17634 I found this when doing https://github.com/apache/spark/pull/17533

[GitHub] spark pull request #17634: [SPARK-20333] HashPartitioner should be compatibl...

2017-04-13 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17634 [SPARK-20333] HashPartitioner should be compatible with num of child RDD's partitions. ## What changes were proposed in this pull request? Fix test "don't submit stage unti

[GitHub] spark issue #17603: [SPARK-20288] Avoid generating the MapStatus by stageId ...

2017-04-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17603 I found this while testing https://github.com/apache/spark/pull/17533. It failed now and then when trying to get size of reduce from `MapStatus`. I'm not sure how to make it better: Modify

[GitHub] spark pull request #17603: [SPARK-20288] Avoid generating the MapStatus by s...

2017-04-11 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17603 [SPARK-20288] Avoid generating the MapStatus by stageId in BasicSchedulerIntegrationSuite ## What changes were proposed in this pull request? ShuffleId is determined before job

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-04-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @squito Thank you so much for taking a look into this. > we don't want the TSM requesting info from the DAGScheduler Sorry I missed this point for the previous change. Now I p

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-04-07 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 @kayousterhout Thanks a lot for the comment and sorry for the late reply. I replied to your comment on JIRA. Please take a look when you have time :)

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109930532 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109901087 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109900096 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -438,6 +443,11 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109900019 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109896244 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109893630 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -168,6 +169,10 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109877754 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -138,7 +139,7 @@ private[spark] class TaskSetManager( private

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 Yes, I did the test in my cluster. In a highly skewed stage, the time cost can be reduced significantly. Tasks are scheduled with locality preference. But in the current code, input size of tasks

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-04-04 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17533 [SPARK-20219] Schedule tasks based on size of input from ScheduledRDD ## What changes were proposed in this pull request? When data is highly skewed on `ShuffledRDD`, it makes sense

[GitHub] spark pull request #17112: [WIP] Measurement for SPARK-16929.

2017-04-03 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17112

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-04-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Sorry for the late reply. I opened the PR for SPARK-19659 (https://github.com/apache/spark/pull/16989) and made these two PRs independent. Basically this PR is to evaluate

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108061417 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Thanks a lot for taking time looking into this and thanks for comments :) 1) I changed the size of underestimated blocks to be `partitionLengths.filter(_ > hc.getAvgSize).

[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-03-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Thanks a lot for taking the time to look into this PR. I updated the PR. Currently it just adds two metrics: a) the total size of underestimated blocks, b) the size of blocks shuffled

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @squito @mridulm Thanks for reviewing this!

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout Thanks a lot for comments. I refined accordingly :)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin Yes, I'm so confused by the second screenshot I posted. The only reason I can find is that the `stageData` in `ExecutorTable` is not thread safe. Size(2 executors) returned; maybe

[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 You are such a kind person.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin because I killed executor1 and it is not active during this stage.

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Oh, I'm sorry if this is disturbing. I will mark it as WIP.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @harounemohammedi Thanks a lot for commenting on this. I'm hesitant to include the `total time` in this PR.

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Thanks a lot for your comments and I will think and do the test carefully :)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout more comments?

[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16389 @zhaorongsheng I think it's better to just not reset `numRunningTasks` to 0. If we got some `ExecutorLostFailure`, the stage should not be marked as finished.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 I want to show the number of executors once active during the stage. `StageUIData` gets updated when receiving the heartbeat from an executor.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin @jerryshao @srowen I've refined the description and uploaded a screenshot of the latest version. Please take another look.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 https://cloud.githubusercontent.com/assets/4058918/24134191/8392c5ea-0e3d-11e7-8a53-f164acf04764.png

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 Sure, that would be cool :) Thanks again for helping review this.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin @jerryshao I uploaded another screenshot and gave a short description there. Now it is (2 executors supplied).

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 ![screenshot2](https://cloud.githubusercontent.com/assets/4058918/24127926/5e0e7294-0e13-11e7-8af0-434b05e2815a.png)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @jerryshao Thanks a lot for helping review, I really appreciate it. I will give a description soon.

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Thanks :) already refined.

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @mridulm Thanks a lot for helping review this :) I really appreciate it.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin Thanks a lot. I added a number after `Aggregated Metrics by Executor`

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 ![screenshot](https://cloud.githubusercontent.com/assets/4058918/24069386/0f556622-0be2-11e7-9f48-cc096cdd7d9b.png)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Sure. I did test for 100k tasks. The results are as below:
| | time cost |
| -- | -- |
| insert | 135ms, 122ms, 119ms, 120ms, 163ms

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Would you mind helping comment on this when you have time? :)

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106453502 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106453205 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106433273 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 The executor metrics are updated in `StageUIData` when an executor heartbeat is received. Yes, the lifetime of the executor may not cover the whole stage, but it was once active during the stage

[GitHub] spark pull request #17312: [SPARK-19973] Display num of executors for the st...

2017-03-16 Thread jinxing64
GitHub user jinxing64 reopened a pull request: https://github.com/apache/spark/pull/17312 [SPARK-19973] Display num of executors for the stage. ## What changes were proposed in this pull request? In `StagePage` the total number of executors is not displayed. Since executorId

[GitHub] spark pull request #17312: [SPARK-19973] Display num of executors for the st...

2017-03-16 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17312

[GitHub] spark issue #17312: Display num of executors for the stage.

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @srowen Thanks a lot for the quick reply. When we investigate why a stage ran much longer today than yesterday, we want to know how many executors were supplied. We don't want to count

[GitHub] spark pull request #17312: Display num of executors for the stage.

2017-03-16 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17312 Display num of executors for the stage. ## What changes were proposed in this pull request? In `StagePage` the total number of executors is not displayed. Since executorId may

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout Thanks a lot for the comments :) very helpful. I've refined, please take another look when you have time.

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340513 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -893,6 +893,7 @@ class TaskSetManagerSuite extends

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340321 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @mridulm More comments on this? :)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Thanks a lot for the comments. I've refined :)

[GitHub] spark pull request #17276: [WIP][SPARK-19937] Collect metrics of block sizes...

2017-03-13 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17276 [WIP][SPARK-19937] Collect metrics of block sizes when shuffle. ## What changes were proposed in this pull request? Metrics of block sizes (during shuffle) should be collected for later

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Sorry, it seems something went wrong when I merged and tried to resolve the conflict. I squashed the commits and rebased. It seems OK now.

[GitHub] spark issue #17133: [SPARK-19793] Use clock.getTimeMillis when mark task as ...

2017-03-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17133 @vanzin @srowen I refined according to the comments, please take a look when you have time :)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @squito @mridulm I refined according to the comments. Please take a look when you have time :)

[GitHub] spark issue #17208: [SPARK-19868] conflict TasksetManager lead to spark stop...

2017-03-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17208 @squito Thanks for the notification :) This is not in my PR.
