[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5635

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95707955 LGTM!

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95702642 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95702611 [Test build #30866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30866/consoleFull) for PR 5635 at commit [`ed90f75`](https://gith

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95678486 LGTM!

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95678076 [Test build #30866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30866/consoleFull) for PR 5635 at commit [`ed90f75`](https://githu

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95369152 One more nit: could you update the task deserialization time tooltip to explicitly say that it includes the time to read the broadcasted task? Other than th

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/5635#discussion_r28925663 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -87,12 +87,19 @@ private[spark] abstract class Task[T](val stageId: Int, var

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/5635#discussion_r28925620 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -221,8 +221,9 @@ private[spark] class Executor( val afterSeria

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95329451 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95329431 [Test build #30782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30782/consoleFull) for PR 5635 at commit [`4f52910`](https://gith

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95322484 [Test build #30777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30777/consoleFull) for PR 5635 at commit [`21f5b47`](https://gith

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95322500 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95313903 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95313864 [Test build #30771 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30771/consoleFull) for PR 5635 at commit [`1752f0e`](https://gith

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95309172 Exposing the time from Task seems like a better design; I've updated to incorporate this idea.

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95307866 [Test build #30782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30782/consoleFull) for PR 5635 at commit [`4f52910`](https://githu

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95302792 I think that the right way to unit test this would be to get the time via the `Clock` interface instead of calling `System.currentTimeMillis()` directly, create a stati
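
The clock-injection idea in the comment above can be sketched as follows. This is only an illustration of the general technique: `Clock`, `SystemClock`, `ManualClock`, and `Stopwatch` here are hypothetical stand-ins written for this example, not Spark's actual classes, and the test setup the comment goes on to describe is not reproduced.

```scala
// Hypothetical stand-in for a clock abstraction; not Spark's actual API.
trait Clock {
  def getTimeMillis(): Long
}

// Production clock: delegates to the wall clock.
class SystemClock extends Clock {
  override def getTimeMillis(): Long = System.currentTimeMillis()
}

// Test clock: time only moves when the test advances it explicitly.
class ManualClock(private var now: Long = 0L) extends Clock {
  override def getTimeMillis(): Long = now
  def advance(millis: Long): Unit = { now += millis }
}

// Measures how long `body` takes according to the injected clock,
// instead of calling System.currentTimeMillis() directly.
class Stopwatch(clock: Clock) {
  def time[A](body: => A): (A, Long) = {
    val start = clock.getTimeMillis()
    val result = body
    (result, clock.getTimeMillis() - start)
  }
}
```

With a `ManualClock`, a test can advance time inside the measured block and assert on the exact elapsed value, which is not possible when the code reads the wall clock directly.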

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95300589 Also, is it prohibitively difficult to write a unit test for this? I suspect the answer is yes...

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95299848 It makes me a little nervous that there's now a time gap between deserializeEndTime and when taskStartTime gets calculated. This *should* be very small (there's ju

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95295747 I've updated this patch to push the calculation of the task run time into the Task itself; this avoids double-counting of the deserialization time, which was breaking t
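
A rough arithmetic sketch of the double-counting concern, assuming the task itself reports how long it spent deserializing once it started running. All names below (`TaskTimeline`, `Metrics`, `TaskTiming`) are hypothetical and chosen for illustration; this is not the code from the patch.

```scala
// Hypothetical timestamps and durations (in ms) observed for one task.
final case class TaskTimeline(
    deserializeStart: Long,        // executor begins deserializing the task
    taskStart: Long,               // executor hands control to the task
    inTaskDeserializeMillis: Long, // deserialization done after the task started
    taskFinish: Long)              // task returns its result

final case class Metrics(deserializeMillis: Long, runMillis: Long)

object TaskTiming {
  // Naive accounting: everything after taskStart is reported as run time,
  // so deserialization done inside the task is hidden in the run time.
  def naive(t: TaskTimeline): Metrics =
    Metrics(t.taskStart - t.deserializeStart, t.taskFinish - t.taskStart)

  // Adjusted accounting: move the in-task deserialization from run time
  // to deserialization time, so each millisecond is counted exactly once.
  def adjusted(t: TaskTimeline): Metrics =
    Metrics(
      (t.taskStart - t.deserializeStart) + t.inTaskDeserializeMillis,
      (t.taskFinish - t.taskStart) - t.inTaskDeserializeMillis)
}
```

For a timeline of `TaskTimeline(0L, 10L, 40L, 110L)`, the naive split is 10 ms of deserialization and 100 ms of run time, while the adjusted split is 50 ms and 60 ms; the total stays at 110 ms in both cases.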

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95295669 [Test build #30777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30777/consoleFull) for PR 5635 at commit [`21f5b47`](https://githu

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95291847 Thanks for fixing this @JoshRosen! I've sometimes wondered if it would be helpful to specifically break out the broadcast time to help folks with debugging? In any

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5635#discussion_r28900482 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -47,10 +47,9 @@ class TaskMetrics extends Serializable { /**

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/5635#discussion_r28900417 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -47,10 +47,9 @@ class TaskMetrics extends Serializable { /**

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95288245 As written here, I guess that this double-counts some of the time spent in execution, so I probably need to move the setting of the task start time into Task. Let me m

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95287848 /cc @kayousterhout @rxin. I noticed this in some benchmarking work that I'm doing (more details on the JIRA: https://issues.apache.org/jira/browse/SPARK-7058).

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5635#issuecomment-95287283 [Test build #30771 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30771/consoleFull) for PR 5635 at commit [`1752f0e`](https://githu

[GitHub] spark pull request: [SPARK-7058] Include RDD deserialization time ...

2015-04-22 Thread JoshRosen
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/5635 [SPARK-7058] Include RDD deserialization time in "task deserialization time" metric The web UI's "task deserialization time" metric is slightly misleading because it does not capture the time tak