Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/5635
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95707955
LGTM!
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95702642
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95702611
[Test build #30866 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30866/consoleFull)
for PR 5635 at commit
[`ed90f75`](https://gith
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95678486
LGTM!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95678076
[Test build #30866 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30866/consoleFull)
for PR 5635 at commit
[`ed90f75`](https://githu
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95369152
One more nit: could you update the task deserialization time tooltip to
explicitly say that it includes the time to read the broadcasted task?
Other than th
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/5635#discussion_r28925663
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -87,12 +87,19 @@ private[spark] abstract class Task[T](val stageId: Int,
var
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/5635#discussion_r28925620
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -221,8 +221,9 @@ private[spark] class Executor(
val afterSeria
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95329451
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95329431
[Test build #30782 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30782/consoleFull)
for PR 5635 at commit
[`4f52910`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95322484
[Test build #30777 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30777/consoleFull)
for PR 5635 at commit
[`21f5b47`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95322500
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95313903
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95313864
[Test build #30771 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30771/consoleFull)
for PR 5635 at commit
[`1752f0e`](https://gith
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95309172
Exposing the time from Task seems like a better design; I've updated to
incorporate this idea.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95307866
[Test build #30782 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30782/consoleFull)
for PR 5635 at commit
[`4f52910`](https://githu
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95302792
I think that the right way to unit test this would be to get the time via
the `Clock` interface instead of calling `System.currentTimeMillis()` directly,
create a stati
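
A minimal sketch of the testing idea raised in this comment: read the time through a clock abstraction so that a test can control it, instead of calling `System.currentTimeMillis()` directly. The `Clock`, `SystemClock`, `ManualClock`, and `Timing.timed` definitions below are hypothetical stand-ins written for illustration; they are not taken from the patch and may differ from Spark's own utilities.

```scala
// Hypothetical stand-in for a clock abstraction (an assumption, not Spark's API).
trait Clock {
  def getTimeMillis(): Long
}

// Production clock: delegates to the system wall clock.
class SystemClock extends Clock {
  def getTimeMillis(): Long = System.currentTimeMillis()
}

// Test clock: time moves only when the test advances it explicitly.
class ManualClock(private var now: Long = 0L) extends Clock {
  def getTimeMillis(): Long = now
  def advance(ms: Long): Unit = { now += ms }
}

object Timing {
  // Hypothetical helper: runs `body` and reports the elapsed time
  // as seen by `clock`, in milliseconds.
  def timed[A](clock: Clock)(body: => A): (A, Long) = {
    val start = clock.getTimeMillis()
    val result = body
    (result, clock.getTimeMillis() - start)
  }
}
```

With a manual clock, a test can simulate a slow step by advancing the clock inside the timed body and then assert on the exact elapsed value, with no sleeping and no timing flakiness.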
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95300589
Also, is it prohibitively difficult to write a unit test for this? I suspect
the answer is yes...
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95299848
It makes me a little nervous that there's now a time gap between
deserializeEndTime and when taskStartTime gets calculated. This *should* be
very small (there's ju
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95295747
I've updated this patch to push the calculation of the task run time into
the Task itself; this avoids double-counting of the deserialization time, which
was breaking t
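
The accounting described in this comment can be illustrated with a small sketch. It assumes a simplified model in which the executor measures the time to deserialize the task wrapper, and the task itself measures the time it spends deserializing its payload before running user code. The names `Timings` and `Accounting.split` are hypothetical and are not taken from the patch.

```scala
// Hypothetical, simplified model of the timing split (an assumption,
// not the actual Spark classes or fields).
final case class Timings(deserializeMs: Long, runMs: Long)

object Accounting {
  // executorDeserializeMs: time the executor spent deserializing the task wrapper.
  // taskWallMs:            wall time from invoking the task until it returned.
  // inTaskDeserializeMs:   time spent inside the task deserializing its payload.
  def split(executorDeserializeMs: Long,
            taskWallMs: Long,
            inTaskDeserializeMs: Long): Timings = {
    require(inTaskDeserializeMs <= taskWallMs,
      "in-task deserialization cannot exceed the task's wall time")
    Timings(
      // Count the in-task deserialization as deserialization time ...
      deserializeMs = executorDeserializeMs + inTaskDeserializeMs,
      // ... and subtract it from the run time so it is not counted twice.
      runMs = taskWallMs - inTaskDeserializeMs)
  }
}
```

In this model the two reported values always sum to the total measured time, which is the property that double-counting would violate.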
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95295669
[Test build #30777 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30777/consoleFull)
for PR 5635 at commit
[`21f5b47`](https://githu
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95291847
Thanks for fixing this @JoshRosen! I've sometimes wondered if it would be
helpful to specifically break out the broadcast time to help folks with
debugging? In any
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/5635#discussion_r28900482
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -47,10 +47,9 @@ class TaskMetrics extends Serializable {
/**
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/5635#discussion_r28900417
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -47,10 +47,9 @@ class TaskMetrics extends Serializable {
/**
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95288245
As written here, I guess that this double-counts some of the time spent in
execution, so I probably need to move the setting of the task start time into
Task. Let me m
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95287848
/cc @kayousterhout @rxin.
I noticed this in some benchmarking work that I'm doing (more details on
the JIRA: https://issues.apache.org/jira/browse/SPARK-7058).
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/5635#issuecomment-95287283
[Test build #30771 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30771/consoleFull)
for PR 5635 at commit
[`1752f0e`](https://githu
GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/5635
[SPARK-7058] Include RDD deserialization time in "task deserialization
time" metric
The web UI's "task deserialization time" metric is slightly misleading
because it does not capture the time tak