[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-02-08 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16867 [SPARK-16929] Improve performance when check speculatable tasks. ## What changes were proposed in this pull request? When checking speculatable tasks in `TaskSetManager`, the current code scans
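
Later revisions of this PR track the running median of successful task durations with a `MedianHeap` (its test suite appears in the diffs further down this thread). The following is only a minimal two-heap sketch of that idea with illustrative names, not the PR's actual code:

```scala
import scala.collection.mutable

// Minimal two-heap median sketch: insert is O(log N) and reading the median is
// O(1), versus sorting all successful task durations on every speculation
// check. Illustrative only; this is not the PR's MedianHeap.
class MedianHeapSketch {
  // Max-heap holding the smaller half, min-heap holding the larger half.
  private val smaller = mutable.PriorityQueue.empty[Long]
  private val larger = mutable.PriorityQueue.empty[Long](Ordering[Long].reverse)

  def insert(durationMs: Long): Unit = {
    if (smaller.isEmpty || durationMs <= smaller.head) smaller.enqueue(durationMs)
    else larger.enqueue(durationMs)
    // Rebalance so the halves never differ in size by more than one.
    if (smaller.size > larger.size + 1) larger.enqueue(smaller.dequeue())
    else if (larger.size > smaller.size + 1) smaller.enqueue(larger.dequeue())
  }

  def median: Double = {
    require(smaller.nonEmpty || larger.nonEmpty, "median of an empty heap")
    if (smaller.size > larger.size) smaller.head.toDouble
    else if (larger.size > smaller.size) larger.head.toDouble
    else (smaller.head + larger.head) / 2.0
  }
}
```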

[GitHub] spark issue #16876: [SPARK-19537] Move pendingPartitions to ShuffleMapStage.

2017-02-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16876 It's great to have pendingPartitions in ShuffleMapStage.

[GitHub] spark issue #16831: [SPARK-19263] Fix race in SchedulerIntegrationSuite.

2017-02-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16831 @kayousterhout Thanks a lot. Sorry for this and I'll be careful in the future.

[GitHub] spark issue #16876: [SPARK-19537] Move pendingPartitions to ShuffleMapStage.

2017-02-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16876 @kayousterhout It's great to give a definition of `pendingPartitions` in `ShuffleMapStage`. May I ask a question to make my understanding of `pendingPartitions` clear? It

[GitHub] spark pull request #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-12 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16901 [SPARK-19565] Improve DAGScheduler tests. ## What changes were proposed in this pull request? This is related to #16620. When a fetch fails, the stage will be resubmitted. There can be

[GitHub] spark issue #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-12 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16901 @kayousterhout @squito @markhamstra As mentioned in #16620, I think it might make sense to make this pr. Please take a look. If you think it is too trivial, I will close it.

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-12 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout Thanks a lot for the clear explanation. It makes great sense to me and helps me understand the logic a lot. Also, I think the way of testing is very good and makes the code very

[GitHub] spark pull request #16620: [SPARK-19263] DAGScheduler should avoid sending c...

2017-02-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16620#discussion_r100953546 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2161,6 +2161,58 @@ class DAGSchedulerSuite extends SparkFunSuite

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout I've refined accordingly, please take another look : )

[GitHub] spark issue #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16901 @kayousterhout I've refined accordingly. Sorry for the stupid mistake I made. Please take another look at this : )

[GitHub] spark pull request #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-13 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16901#discussion_r100968529 --- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala --- @@ -2161,6 +2161,48 @@ class DAGSchedulerSuite extends SparkFunSuite

[GitHub] spark issue #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16901 @squito Thanks a lot for your comments. I've refined the comment.

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @squito Thanks a lot. I've refined the comment, please take another look.

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 Yes, refined : )

[GitHub] spark issue #16620: [SPARK-19263] DAGScheduler should avoid sending conflict...

2017-02-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16620 @kayousterhout @squito @markhamstra Thanks for all of your work on this patch. Really appreciate your help : )

[GitHub] spark issue #16690: [SPARK-19347] ReceiverSupervisorImpl can add block to Re...

2017-02-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16690 @srowen What do you think about https://github.com/apache/spark/pull/16790?

[GitHub] spark issue #16790: [SPARK-19450] Replace askWithRetry with askSync.

2017-02-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16790 https://github.com/apache/spark/pull/16690#discussion_r101616883 causes the build to produce lots of deprecation warnings. @srowen @vanzin What do you think about this?

[GitHub] spark issue #16790: [SPARK-19450] Replace askWithRetry with askSync.

2017-02-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16790 Both `askSync` and `askWithRetry` are blocking; the only difference is the "retry" (3 times by default) when the RPC fails. Callers of this method do not necessarily rely on t
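
As a rough sketch of the distinction being made here (the `ask` below is a stand-in, not Spark's `RpcEndpointRef` API): a blocking, non-retrying call can be built by awaiting the Future returned by an asynchronous ask, which is effectively what callers that never relied on the retry behavior need.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object AskSketch {
  // Stand-in for an RPC that answers asynchronously; not Spark's API.
  def ask[T](message: Any)(handler: Any => T): Future[T] = Future(handler(message))

  // Blocking variant in the spirit of askSync: wait for the single reply,
  // fail fast on timeout, and do not retry.
  def askBlocking[T](message: Any, timeout: FiniteDuration)(handler: Any => T): T =
    Await.result(ask(message)(handler), timeout)

  def main(args: Array[String]): Unit = {
    val reply = askBlocking[String]("ping", 5.seconds) { _ => "pong" }
    println(reply)  // pong
  }
}
```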

[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-02-19 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16989 [SPARK-19659] Fetch big blocks to disk when shuffle-read. ## What changes were proposed in this pull request? Currently the whole block is fetched into memory (off-heap by default) when
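
As a rough illustration of the proposal (the threshold, names, and types below are assumptions, not the PR's code): blocks above a configurable size would be streamed to a local temporary file instead of being buffered entirely in memory.

```scala
import java.io.{ByteArrayOutputStream, File, FileOutputStream, InputStream, OutputStream}
import java.nio.file.Files

object FetchToDiskSketch {
  // Hypothetical threshold; a real implementation would read it from configuration.
  val maxInMemoryBlockBytes: Long = 64L * 1024 * 1024

  sealed trait FetchedBlock
  case class InMemoryBlock(bytes: Array[Byte]) extends FetchedBlock
  case class OnDiskBlock(file: File) extends FetchedBlock

  private def copy(in: InputStream, out: OutputStream): Unit = {
    val buf = new Array[Byte](64 * 1024)
    var n = in.read(buf)
    while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
  }

  // Small blocks are buffered in memory; big blocks are streamed to a temp file.
  def fetch(blockSize: Long, data: InputStream): FetchedBlock =
    if (blockSize <= maxInMemoryBlockBytes) {
      val out = new ByteArrayOutputStream()
      copy(data, out)
      InMemoryBlock(out.toByteArray)
    } else {
      val file = Files.createTempFile("shuffle-block-", ".data").toFile
      val out = new FileOutputStream(file)
      try copy(data, out) finally out.close()
      OnDiskBlock(file)
    }
}
```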

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-02-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @vanzin @squito Would you mind taking a look at this when you have time?

[GitHub] spark issue #16790: [SPARK-19450] Replace askWithRetry with askSync.

2017-02-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16790 @srowen @vanzin Thanks a lot for the work on this ~

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @squito Would you mind taking a look at this when you have time?

[GitHub] spark issue #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16901 @kayousterhout I'll close since this functionality is already tested.

[GitHub] spark pull request #16901: [SPARK-19565] Improve DAGScheduler tests.

2017-02-20 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/16901

[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-02-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @squito Thanks a lot for your comments : ) Yes, there should be a design doc for discussion. I will prepare and post a PDF to the JIRA.

[GitHub] spark issue #16867: [WIP][SPARK-16929] Improve performance when check specul...

2017-02-27 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Thanks a lot for your comments : ) >When check speculatable tasks in TaskSetManager, current code scan all task infos and sort durations of successful tasks in O(NlogN) t

[GitHub] spark pull request #16503: [SPARK-18113] Method canCommit should return the ...

2017-01-08 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/16503 [SPARK-18113] Method canCommit should return the same value when called by the same attempt multiple times. ## What changes were proposed in this pull request? Method

[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-08 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @mccheah @JoshRosen @ash211 Could you please take a look at this?

[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @zsxwing @kayousterhout @andrewor14 Could you please help take a look at this?

[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @zsxwing @vanzin Maybe using `ask` in method `canCommit` is not suitable (I think), because `ask` returns a Future, but it should be a blocking process to get the result of

[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin Thanks a lot for your comment. It's very helpful. I'll change it to `ask`. I think it makes sense to keep the receiver idempotent when handling `AskPermissionToCommitOut
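
A minimal sketch of what an idempotent receiver for this permission message could look like (illustrative only, not Spark's `OutputCommitCoordinator`): the first attempt to ask for a (stage, partition) wins, and any repeated ask from the same attempt gets the same answer, which is what the PR title requires.

```scala
import scala.collection.mutable

// Minimal sketch of an idempotent commit coordinator: repeated asks from the
// same attempt always get the same answer. Illustrative names only.
class CommitCoordinatorSketch {
  // (stageId, partitionId) -> attempt number that was authorized to commit.
  private val authorized = mutable.Map.empty[(Int, Int), Int]

  def canCommit(stageId: Int, partitionId: Int, attemptNumber: Int): Boolean =
    synchronized {
      authorized.get((stageId, partitionId)) match {
        case Some(winner) => winner == attemptNumber  // same answer on retry
        case None =>
          authorized((stageId, partitionId)) = attemptNumber
          true
      }
    }
}
```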

[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-10 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 > If we can remove uses of askWithRetry as we find these issues, we can, at some point, finally get rid of the API altogether. What do you think about providing a *"blocking"

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-11 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 ping @zsxwing @vanzin Could you take another look at this, please?

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @ash211 Thank you so much for your comment. I've changed it accordingly. Could you please take another look?

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-12 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @ash211 Thanks a lot for your comment. I've already fixed the failing Scala style tests; running `./dev/scalastyle` passed. Could you take another look?

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-13 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin @zsxwing Thanks a lot for your comment. I will file another jira to add a blocking version of ask. What else can I do for this pr? : )

[GitHub] spark pull request #16503: [SPARK-18113] Use ask to replace askWithRetry in ...

2017-01-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16503#discussion_r96120047 --- Diff: core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala --- @@ -221,6 +232,17 @@ private case class

[GitHub] spark pull request #16503: [SPARK-18113] Use ask to replace askWithRetry in ...

2017-01-14 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16503#discussion_r96127359 --- Diff: core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala --- @@ -221,6 +229,22 @@ private case class

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-14 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin @ash211 Thanks a lot for your comments; I've changed it accordingly. Please take another look at this~~

[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 ping

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340321 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106340513 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -893,6 +893,7 @@ class TaskSetManagerSuite extends

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-15 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout Thanks a lot for the comments :) very helpful. I've refined, please take another look when you have time.

[GitHub] spark pull request #17312: Display num of executors for the stage.

2017-03-16 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17312 Display num of executors for the stage. ## What changes were proposed in this pull request? In `StagePage` the total number of executors is not displayed. Since executorId may not be

[GitHub] spark issue #17312: Display num of executors for the stage.

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @srowen Thanks a lot for the quick reply. When we check why a stage ran much longer today than yesterday, we want to know how many executors were supplied. We don't want to coun

[GitHub] spark pull request #17312: [SPARK-19973] Display num of executors for the st...

2017-03-16 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17312

[GitHub] spark pull request #17312: [SPARK-19973] Display num of executors for the st...

2017-03-16 Thread jinxing64
GitHub user jinxing64 reopened a pull request: https://github.com/apache/spark/pull/17312 [SPARK-19973] Display num of executors for the stage. ## What changes were proposed in this pull request? In `StagePage` the total number of executors is not displayed. Since executorId

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 The executor metrics are updated in `StageUIData` when an executor heartbeat is received. Yes, the lifetime of the executor may not cover the whole stage, but it was once active during the stage

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106433273 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -172,7 +172,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106453205 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r106453502 --- Diff: core/src/test/scala/org/apache/spark/util/collection/MedianHeapSuite.scala --- @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-16 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Would you mind commenting on this when you have time? :)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Sure. I did a test with 100k tasks. The results are as below:

| | time cost |
| -- | -- |
| insert | 135ms, 122ms, 119ms, 120ms, 163ms

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 ![screenshot](https://cloud.githubusercontent.com/assets/4058918/24069386/0f556622-0be2-11e7-9f48-cc096cdd7d9b.png)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-17 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin Thanks a lot. I added a number after `Aggregated Metrics by Executor`

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @mridulm Thanks a lot for helping review this : ) Really appreciate it.

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-19 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @squito Thanks :) already refined.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @jerryshao Thanks a lot for helping review; I really appreciate it. I will give a description soon.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 ![screenshot2](https://cloud.githubusercontent.com/assets/4058918/24127926/5e0e7294-0e13-11e7-8af0-434b05e2815a.png)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin @jerryshao I uploaded another screenshot and gave a short description there. Now it reads (2 executors supplied).

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 Sure, that would be cool :) Thanks again for helping review this.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 https://cloud.githubusercontent.com/assets/4058918/24134191/8392c5ea-0e3d-11e7-8a53-f164acf04764.png

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin @jerryshao @srowen I've refined the description and uploaded a screenshot of the latest version. Please take another look.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 I want to show the number of executors that were active at some point during the stage. `StageUIData` gets updated when receiving heartbeats from executors.

[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16389 @zhaorongsheng I think it's better to just not reset `numRunningTasks` to 0. If we get an `ExecutorLostFailure`, the stage should not be marked as finished.

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout more comments?

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Thanks a lot for your comments; I will think it over and do the test carefully :)

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-21 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @harounemohammedi Thanks a lot for commenting on this. I'm hesitant to include the `total time` in this pr.

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Oh, I'm sorry if this is disruptive. I will mark it as WIP.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin Because I killed executor1 and it was not active during this stage.

[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-03-22 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 You are such a kind person.

[GitHub] spark issue #17312: [SPARK-19973] Display num of executors for the stage.

2017-03-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17312 @rxin Yes, I'm so confused by the second screenshot I posted. The only reason I can find is that the `stageData` in `ExecutorTable` is not thread-safe. Size (2 executors) returned;

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-23 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout Thanks a lot for the comments. I refined it accordingly :)

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-24 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @squito @mridulm Thanks for reviewing this !

[GitHub] spark issue #17276: [WIP][SPARK-19937] Collect metrics of block sizes when s...

2017-03-25 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @squito Thanks a lot for taking the time to look into this pr. I updated the pr. Currently it just adds two metrics: a) the total size of underestimated blocks, b) the size of blocks shuffled

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Thanks a lot for taking the time to look into this, and thanks for the comments :) 1) I changed the size of underestimated blocks to be `partitionLengths.filter(_ > hc.getAvgSize).
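
The exact expression is truncated in this archive; as one plausible reading (an assumption, not the PR's definition), a metric for underestimated shuffle data could be computed from the exact partition lengths and the average size a compressed map status reports:

```scala
object BlockSizeMetricsSketch {
  // Blocks larger than the reported average are the ones whose sizes are
  // underestimated; sum how much larger they are. Illustrative metric only.
  def underestimatedBytes(partitionLengths: Array[Long], avgSize: Long): Long =
    partitionLengths.filter(_ > avgSize).map(_ - avgSize).sum

  def main(args: Array[String]): Unit = {
    val lengths = Array(10L, 10L, 1000L, 10L)
    val avg = lengths.sum / lengths.length  // 257
    println(underestimatedBytes(lengths, avg))  // 743
  }
}
```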

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108061417 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-04-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17276 @mridulm Sorry for the late reply. I opened the pr for SPARK-19659 (https://github.com/apache/spark/pull/16989) and made these two PRs independent. Basically this pr is to evaluate the

[GitHub] spark pull request #17112: [WIP] Measurement for SPARK-16929.

2017-04-03 Thread jinxing64
Github user jinxing64 closed the pull request at: https://github.com/apache/spark/pull/17112

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-04-04 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17533 [SPARK-20219] Schedule tasks based on size of input from ScheduledRDD ## What changes were proposed in this pull request? When data is highly skewed on `ShuffledRDD`, it makes sense to

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17533 Yes, I did the test on my cluster. In a highly skewed stage, the time cost can be reduced significantly. Tasks are scheduled with locality preference, but in the current code the input sizes of tasks are not
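
A hedged sketch of the general idea only (the PR itself also has to respect locality preferences and works inside Spark's scheduler): if per-partition input sizes are known, launching the largest-input tasks first shortens the tail of a skewed stage.

```scala
object SkewAwareOrderSketch {
  // Order partitions by estimated input bytes, descending, so the biggest
  // (slowest) tasks start as early as possible. Illustrative ordering only.
  def launchOrder(inputBytesByPartition: Map[Int, Long]): List[Int] =
    inputBytesByPartition.toList.sortBy { case (_, bytes) => -bytes }.map(_._1)

  def main(args: Array[String]): Unit = {
    val sizes = Map(0 -> 10L, 1 -> 5000L, 2 -> 30L)
    println(launchOrder(sizes))  // List(1, 2, 0)
  }
}
```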

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109877754 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -138,7 +139,7 @@ private[spark] class TaskSetManager( private

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109893630 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -168,6 +169,10 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109896244 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109900096 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -438,6 +443,11 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109900019 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [WIP][SPARK-20219] Schedule tasks based on size o...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109901087 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17533: [SPARK-20219] Schedule tasks based on size of inp...

2017-04-05 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17533#discussion_r109930532 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -512,6 +522,57 @@ private[spark] class TaskSetManager

[GitHub] spark issue #16989: [WIP][SPARK-19659] Fetch big blocks to disk when shuffle...

2017-02-27 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 @squito I've uploaded a design doc to the JIRA; please take a look when you have time :)

[GitHub] spark pull request #16867: [SPARK-16929] Improve performance when check spec...

2017-02-27 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16867#discussion_r103391138 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -911,14 +916,14 @@ private[spark] class TaskSetManager

[GitHub] spark pull request #17111: [SPARK-19777] Scan runningTasksSet when check spe...

2017-02-28 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17111 [SPARK-19777] Scan runningTasksSet when check speculatable tasks in TaskSetManager. ## What changes were proposed in this pull request? When checking speculatable tasks in

[GitHub] spark issue #17111: [SPARK-19777] Scan runningTasksSet when check speculatab...

2017-02-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17111 cc @kayousterhout @squito

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 @kayousterhout @squito It's great to open a new jira for this change. Please take a look at https://github.com/apache/spark/pull/17111.

[GitHub] spark issue #17111: [SPARK-19777] Scan runningTasksSet when check speculatab...

2017-02-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17111 @squito Thanks a lot :)

[GitHub] spark pull request #17112: Measurement for SPARK-16929.

2017-02-28 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17112 Measurement for SPARK-16929. ## What changes were proposed in this pull request? This pr isn't intended for merging. It's a measurement for https://github.com/apache/spark/

[GitHub] spark issue #17112: Measurement for SPARK-16929.

2017-02-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17112 The added unit test "Measurement for SPARK-16929." is the measurement. In TaskSetManagerSuite.scala line 1049, if `newAlgorithm=true`, `successfulTaskIdsSet` will be used to get

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-02-28 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16867 I added a measurement for this pr in #17112. Results are as below; newAlgorithm indicates whether we use `TreeSet` to get the median duration or not, and `time cost` is the time used when get

[GitHub] spark issue #17111: [SPARK-19777] Scan runningTasksSet when check speculatab...

2017-03-01 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17111 @kayousterhout Thanks for merging. (btw, I made some measurements for https://github.com/apache/spark/pull/16867 SPARK-16929, please take a look when you have time :) )

[GitHub] spark pull request #17133: [SPARK-19793] Use clock.getTimeMillis when mark t...

2017-03-02 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/17133 [SPARK-19793] Use clock.getTimeMillis when mark task as finished in TaskSetManager. ## What changes were proposed in this pull request? TaskSetManager is now using

[GitHub] spark issue #17133: [SPARK-19793] Use clock.getTimeMillis when mark task as ...

2017-03-02 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/17133 I found this while working on https://github.com/apache/spark/pull/17112, which is for measuring the approach I proposed in https://github.com/apache/spark/pull/16867.
