Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-217780852
**[Test build #58121 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58121/consoleFull)**
for PR 12655 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-217780501
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-217780498
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-217780496
**[Test build #58119 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58119/consoleFull)**
for PR 12655 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-217780312
**[Test build #58119 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58119/consoleFull)**
for PR 12655 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216745078
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216745080
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216744662
**[Test build #57709 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57709/consoleFull)**
for PR 12655 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216732592
**[Test build #57709 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57709/consoleFull)**
for PR 12655 at commit
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216727287
@ueshin that would be great so we can merge this!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-216706049
I think we are moving to #8923, but no updates there.
Should I use #8923 way here?
---
If your project is set up for it, you can reply to this email and have your
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215560207
Sorry, on re-reading my own post I realize that I wasn't completely clear.
I'm saying that the two-line change in DAGScheduler.scala from
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215556785
I'm fine with the change in https://github.com/apache/spark/pull/8923 I'm
looking at refactoring of `newOrUsedShuffleStage` that may effectively turn it
into just
Github user suyanNone commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215329471
as I know, duplicate stage occurs:
stage 2 and stage 3 all depends on stage 1
stage 4 depends on stage 2 and stage 3
So, if we get
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215317033
I saw the PR #8427 now.
Both the #8427 approach and @markhamstra's approach (should we use
`getOrElseUpdate` instead of `getOrElse`?) seem like the simplest way to
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215250204
Yes, I'd also like to avoid trying to fix all of these related issues in
one big PR. There are overlapping concerns, but I agree that a simple fix to
the
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215240356
@markhamstra why is the newOrUsedShuffleStage approach better / different?
IMO the existing code is very convoluted at the moment, but we should first
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215237185
I'd also like to address the concerns of @squito 's
https://github.com/apache/spark/pull/8427 and @sitalkedia 's
https://github.com/apache/spark/pull/12436
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215235472
What I'm actually trying to look at right now is something very similar to
https://github.com/apache/spark/pull/8923, but pushing into
newOrUsedShuffleStage the
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215200108
Thanks for pointing that out @markhamstra. I'd be in favor of that solution
(but augmented with a more clear test case -- which might just mean adding a
nice
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-215155985
It looks like https://github.com/apache/spark/pull/8923 by @suyanNone fixes
at least the test that this PR adds to the DAGSchedulerSuite, and does so much
more
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214670763
@kayousterhout
I added a comment to
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214619200
I see -- now I understand the motivation for returning the shuffle
dependencies topologically sorted, because it limits the depth of the recursion
(it looks like
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214617772
@kayousterhout
I'll update the unit test and JIRA, PR descriptions to use the simpler
example.
As for the `createAncestorShuffleMapStages` way, I think it
Github user kayousterhout commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214455792
I just updated the
[JIRA](https://issues.apache.org/jira/browse/SPARK-13902) with what I
understand to be the issue. Can you take a look and let me know if
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214163952
@markhamstra Thank you for your comment.
I thought the non-optimal state in `DAGScheduler` was a kind of bug so I
used the words "incorrect" or "illegal" but now I
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214152067
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214152071
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214151363
**[Test build #56873 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56873/consoleFull)**
for PR 12655 at commit
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60862612
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -322,6 +322,59 @@ class DAGSchedulerSuite extends SparkFunSuite with
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214122523
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214122524
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214122275
**[Test build #56868 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56868/consoleFull)**
for PR 12655 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214118512
**[Test build #56873 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56873/consoleFull)**
for PR 12655 at commit
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214101076
I think some of the terminology used in this and related PRs is confusing
the issues. When @kayousterhout and I ask about "correctness", what we are
fundamentally
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60858779
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,32 +403,47 @@ class DAGScheduler(
parents.toList
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60858768
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,32 +403,47 @@ class DAGScheduler(
parents.toList
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60858715
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -322,6 +322,59 @@ class DAGSchedulerSuite extends
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60858660
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -322,6 +322,59 @@ class DAGSchedulerSuite extends
Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r60858650
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,32 +403,47 @@ class DAGScheduler(
parents.toList
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/12655#issuecomment-214097411
**[Test build #56868 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56868/consoleFull)**
for PR 12655 at commit
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/12655
[SPARK-13902][SCHEDULER] Make DAGScheduler.getAncestorShuffleDependencies()
return in topological order to ensure building ancestor stages first.
## What changes were proposed in this pull request?
Github user ueshin closed the pull request at:
https://github.com/apache/spark/pull/11720
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-203734197
I'm going to close this in favor of #12060.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-203350094
@markhamstra
Could you take a look at #12060 please?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57859542
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202722356
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202722352
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202721624
**[Test build #54407 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54407/consoleFull)**
for PR 11720 at commit
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57668558
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202689229
**[Test build #54407 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54407/consoleFull)**
for PR 11720 at commit
Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57667547
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57605206
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202484260
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202484262
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202483946
**[Test build #54319 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54319/consoleFull)**
for PR 11720 at commit
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57586861
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57586226
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202434831
The basic idea is a good one, but in addition to needing to spend some time
sorting out the logic around that `assert`, I'm a little concerned about the
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57581958
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57581491
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -286,6 +286,7 @@ class DAGScheduler(
case None =>
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202427366
**[Test build #54319 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54319/consoleFull)**
for PR 11720 at commit
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202427017
Jenkins, retest this please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202369148
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202369146
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202369046
**[Test build #54298 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54298/consoleFull)**
for PR 11720 at commit
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202320267
cc: @rxin
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202298857
**[Test build #54298 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54298/consoleFull)**
for PR 11720 at commit
Github user ueshin commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202298736
@maropu Thank you for your review.
I modified as you mentioned.
> do we get wrong answers from this kind of incorrect stage graphs?
I don't think
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-202258270
@ueshin Great work. I'm not 100% sure though, one question; do we get wrong
answers from this kind of incorrect stage graphs? If so, it'd be better to add
tests in
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57548138
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -322,6 +322,53 @@ class DAGSchedulerSuite extends SparkFunSuite with
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57548049
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,25 +403,37 @@ class DAGScheduler(
parents.toList
}
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57547815
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,25 +403,37 @@ class DAGScheduler(
parents.toList
}
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11720#discussion_r57547560
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -403,25 +403,37 @@ class DAGScheduler(
parents.toList
}
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197906450
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197378322
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197378325
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197377676
**[Test build #53318 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53318/consoleFull)**
for PR 11720 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197905769
**[Test build #53425 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53425/consoleFull)**
for PR 11720 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197906458
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197320498
**[Test build #53318 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53318/consoleFull)**
for PR 11720 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197855749
**[Test build #53425 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53425/consoleFull)**
for PR 11720 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197154174
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197154175
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197154079
**[Test build #53265 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53265/consoleFull)**
for PR 11720 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-197127125
**[Test build #53265 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53265/consoleFull)**
for PR 11720 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-196767388
Merged build finished. Test PASSed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-196767390
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-196767151
**[Test build #53181 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53181/consoleFull)**
for PR 11720 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11720#issuecomment-196728116
**[Test build #53181 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53181/consoleFull)**
for PR 11720 at commit
GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/11720
[SPARK-13902][SCHEDULER] Make DAGScheduler.getAncestorShuffleDependencies()
return in topological order to ensure building ancestor stages first.
## What changes were proposed in this pull request?
91 matches
Mail list logo