Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/3079
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-67382250
Do please reopen though once you have something that is passing tests :)
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-67382177
Hi @erikerlandson, thanks for working on this. It would be great to have a
solution to this long-running problem. Since it looks like there is still some
work to be do
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-63881800
For reference, this other issue has some overlap:
https://issues.apache.org/jira/browse/SPARK-4514
Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/3079#discussion_r20062337
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +117,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61719969
[Test build #22892 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull)
for PR 3079 at commit
[`2183325`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61719975
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61704937
[Test build #22892 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull)
for PR 3079 at commit
[`2183325`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675446
[Test build #22880 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull)
for PR 3079 at commit
[`0fc30fe`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675448
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61675397
[Test build #22880 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull)
for PR 3079 at commit
[`0fc30fe`](https://githu
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61565392
[Test build #22828 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull)
for PR 3079 at commit
[`019ac27`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61565401
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556754
@erikerlandson I think you also need -Phive for the tests to run. It is
possible some other things changed (or even that that test case changed with
the upgrade to hive
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556278
[Test build #22828 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull)
for PR 3079 at commit
[`019ac27`](https://githu
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61556289
@marmbrus, @scwf, FWIW, the `correlationoptimizer14` test appears to be
working for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true
./dev/run-tes
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/3079#issuecomment-61555496
Reboot of #1689
GitHub user erikerlandson opened a pull request:
https://github.com/apache/spark/pull/3079
[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
You can merge this pull request into a Git repository by running:
$ git pull htt
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-61508261
@marmbrus, FWIW, the `correlationoptimizer14` test appears to be working
for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true
./dev/run-tests > ~
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57110142
@rxin @marmbrus I will check it out
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57108427
I reverted this commit. @erikerlandson mind taking a look at this problem?
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57106886
Since this PR was merged the correlationoptimizer14 test has been hanging.
We might want to consider rolling back. You can reproduce the problem as
follows: `sbt -Dspa
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043930
Have either of you thought about how to coordinate this with Josh's work on
SPARK-3626? https://github.com/apache/spark/pull/2482
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043862
BTW one thing that would be great to add is a test that makes sure we don't
block the main dag scheduler thread. The reason I think we don't block is that
we call rdd.partit
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1689
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043822
@erikerlandson I'm going to merge this first. Maybe we can do the cleanup
later.
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122214
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var o
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122212
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var o
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r18122197
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
}
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-57043705
Actually I looked at it again. I don't think it would block the scheduler
because we compute partitions outside the scheduler thread. This approach looks
good to me!
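
A minimal sketch of the deferral idea named in the pull request title, assuming nothing about the actual patch: the bounds are sampled only when a partition lookup first needs them, not when the partitioner is constructed. `LazyRangeBounds`, `sampleKeys`, and `samplingRuns` are hypothetical names used for illustration; no Spark API is called here.

```scala
// Hypothetical sketch: this is NOT the code from the pull request.
// LazyRangeBounds, sampleKeys and samplingRuns are illustrative names only.
class LazyRangeBounds[K: Ordering](partitions: Int, sampleKeys: () => Seq[K]) {
  // Counts how often the sampling step actually runs (for demonstration).
  var samplingRuns = 0

  // `lazy val` postpones the sampling until the bounds are first needed.
  lazy val bounds: Seq[K] = {
    samplingRuns += 1
    val sorted = sampleKeys().sorted
    if (sorted.isEmpty || partitions <= 1) Seq.empty
    else (1 until partitions)
      .map(i => sorted(math.min(i * sorted.length / partitions, sorted.length - 1)))
      .distinct
  }

  // Index of the partition a key falls into; the first call triggers sampling.
  def getPartition(key: K): Int = {
    val ord = implicitly[Ordering[K]]
    bounds.indexWhere(b => ord.lteq(key, b)) match {
      case -1 => bounds.length // key is larger than every bound
      case i  => i
    }
  }
}
```

Constructing the object does no sampling work; the cost is paid once, by whichever caller first asks for a partition.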
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55805772
So far the best idea I have for (2) is to set some kind of time-out on the
evaluation. The bound computation uses subsampling that will (when all goes
well) cap t
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55797086
Yea I don't think we need to fully solve 3 here.
My main concern with this set of changes is 2, since a single badly
behaved RDD can potentially block the (unfortun
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55628401
Or, maybe just look into playing the same game with the cogrouped RDDs that
I did with sortByKey. Don't get into invoking `defaultPartitioner` until
somebody asks
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55627362
Hi @rxin,
1) SimpleFutureAction is still referred to in the submitJob method, but that
doesn't appear to be invoked anywhere. I was reluctant to get rid of it,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55464403
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull)
for PR 1689 at commit
[`50b6da6`](https://github.com/a
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55458226
@erikerlandson thanks for looking at this.
A few questions:
1. After this pull request, does anything still use SimpleFutureAction?
2. If I understand t
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55457077
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull)
for PR 1689 at commit
[`50b6da6`](https://github.com/ap
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-55456438
Jenkins, test this please.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-54694535
Can one of the admins verify this patch?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52400243
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull)
for PR 1689 at commit
[`f3448e4`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52397817
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull)
for PR 1689 at commit
[`f3448e4`](https://github.com/ap
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52342401
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull)
for PR 1689 at commit
[`09f0637`](https://github.com/a
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52339006
Excellent! I'll try to find some time to review this soon.
Github user erikerlandson commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52336202
Latest push updates RangePartitioner sampling job to be async, and updates
the async action functions so that they will properly enclose the sampling job
induced by c
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-52336221
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull)
for PR 1689 at commit
[`09f0637`](https://github.com/ap
Github user erikerlandson commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15931660
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
}
Github user erikerlandson commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15931609
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
priv
Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15920203
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
privat
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15919599
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var o
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15919352
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
}
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-51424177
QA results for PR 1689:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output:
https://amplab.c
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-51421389
QA tests have started for PR 1689. This patch merges cleanly. View
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18089/consoleFull
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/1689#discussion_r15900503
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,13 @@ class RangePartitioner[K : Ordering : ClassTag, V](
private var o
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50829158
QA results for PR 1689:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes
For more information see test output:
https://amplab.c
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50824621
QA tests have started for PR 1689. This patch merges cleanly. View
progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17611/consoleFull
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50824343
Jenkins, this is ok to test.
GitHub user erikerlandson opened a pull request:
https://github.com/apache/spark/pull/1689
[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
You can merge this pull request into a Git repository by running:
$ git pull htt
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1689#issuecomment-50765803
Can one of the admins verify this patch?