Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1381
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enab
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75493553
LGTM, so I'm going to merge this into `master` (1.4.0). Thanks!
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75447631
[Test build #27834 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27834/consoleFull)
for PR 1381 at commit
[`e30ade5`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75447636
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75443625
[Test build #27834 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27834/consoleFull)
for PR 1381 at commit
[`e30ade5`](https://githu
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75443581
Looks good to me and I assume it's OK with Josh since it's basically the
code he suggested. I'll wait a short while for more comments before merging, if
Josh doesn't first
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75443558
ok to test
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75443472
@srowen pushed a test with data in the middle of the partitions, retest
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75434473
Back to @JoshRosen for a final look; it looks good to my eyes. How about
another test for a case where the range begins and ends in the middle of a
partition? Right now it
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75398321
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75398318
[Test build #27827 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27827/consoleFull)
for PR 1381 at commit
[`cac337c`](https://gith
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75394839
@srowen re-pushed because I accidentally deleted some of the docs for
repartition and shuffle within partitions, don't know if you want to retest
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75394744
[Test build #27827 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27827/consoleFull)
for PR 1381 at commit
[`cac337c`](https://githu
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75392593
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75392589
[Test build #27824 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27824/consoleFull)
for PR 1381 at commit
[`7618c1d`](https://gith
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75388569
[Test build #27824 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27824/consoleFull)
for PR 1381 at commit
[`7618c1d`](https://githu
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75388519
Jenkins, retest this please
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75386833
[Test build #27819 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27819/consoleFull)
for PR 1381 at commit
[`7618c1d`](https://gith
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75386838
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75384018
[Test build #27819 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27819/consoleFull)
for PR 1381 at commit
[`7618c1d`](https://githu
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75383783
ok to test
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75383175
@srowen OK pushing with Josh's suggested changes
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75371858
@aaronjosephs OK although the patch now needs a rebase. Can you bring it up
to date and we can see what Jenkins thinks?
Github user aaronjosephs closed the pull request at:
https://github.com/apache/spark/pull/1381
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75180291
@andrewor14 at some point @JoshRosen had said he had finished and would
commit the next day so that is probably the solution we should go with
GitHub user aaronjosephs reopened a pull request:
https://github.com/apache/spark/pull/1381
[SPARK-911] allow efficient queries for a range if RDD is partitioned with RangePartitioner
You can merge this pull request into a Git repository by running:
$ git pull https://gi
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-75149789
@aaronjosephs @JoshRosen what is the status of this PR? Is it blocking on
more reviews? Although it's old it doesn't seem outdated since this touches a
fairly isolated
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-64317259
Can one of the admins verify this patch?
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55633411
@JoshRosen totally agree many functions would have to be changed to
indicate they preserve ordering. Also this would seem to be past the scope of
the original ticket
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55626881
@aaronjosephs The binary search is a good idea, although I think there are
a few subtleties involved in getting it to work generally. Imagine that I call
sortByKey() o
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55604431
@JoshRosen this isn't necessarily specified on the ticket but it's related.
Since most of the time something will be range partitioned because you called
sortByKey o
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55509267
Unless anyone has objection / review feedback, I'd like to commit my
updated version of this PR. I'll do it tomorrow to give folks a chance to
weigh in.
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55509179
Personal style preference, but I think this is a slightly clearer way to
express the calculation of the partition indices:
```diff
diff --git
a/core/src/ma
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-55509076
I came up with a more compact way to implement this, which doesn't need
casts and correctly preserves partitioning:
```diff
diff --git
a/core/src/main/sca
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r17514918
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,32 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r17514904
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,32 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-54700336
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19882/consoleFull)
for PR 1381 at commit
[`681c26e`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-54698941
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19882/consoleFull)
for PR 1381 at commit
[`681c26e`](https://github.com/ap
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-54697615
ok to test
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-54694603
Can one of the admins verify this patch?
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53919804
@JoshRosen everything handled except if you still think that cast should
not be necessary, thanks for the advice on simplifying the partitioner handling
logic. Also
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53914414
Place it below the other tests that you've added; I think it's clearer to
keep all of the tests for `filterByRange` grouped together in the same file.
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53914222
@JoshRosen where would be a good place to put that test, since it wouldn't
technically involve any sorting
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53913816
Maybe add a test for the case where there's no RangePartitioner?
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16913368
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,41 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16913312
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,41 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16913280
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,41 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53910513
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19484/consoleFull)
for PR 1381 at commit
[`7e6df04`](https://github.com/a
Github user aaronjosephs commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16911129
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16909294
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16909145
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16909075
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16909008
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53903741
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19484/consoleFull)
for PR 1381 at commit
[`7e6df04`](https://github.com/ap
Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1381#discussion_r16908892
--- Diff:
core/src/main/scala/org/apache/spark/rdd/OrderedRDDFunctions.scala ---
@@ -67,4 +67,39 @@ class OrderedRDDFunctions[K : Ordering : ClassTag,
Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53903047
Jenkins, retest this please.
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53425210
After taking a look at this again I realized I should actually be using
PartitionPruningRDD to avoid launching tasks on bad partitions
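As a rough illustration of the pruning idea in the comment above, here is a small model in plain Scala with no Spark dependency. It is a sketch under stated assumptions, not the code from this PR: `RangePruningSketch`, `partitionFor`, and `bounds` are hypothetical names, and the in-memory `Vector` of partitions only stands in for an RDD whose keys were split by range. The point it tries to show is that only partitions whose key range can overlap `[lower, upper]` are scanned at all.

```scala
object RangePruningSketch {
  // Hypothetical stand-in for a range partitioner: bounds(i) is the largest key
  // allowed in partition i; keys above the last bound go to the final partition.
  def partitionFor(bounds: Vector[Int], key: Int): Int = {
    val i = bounds.indexWhere(key <= _)
    if (i == -1) bounds.length else i
  }

  // Returns the pairs with lower <= key <= upper, plus the indices of the
  // partitions that were actually scanned (all others are skipped entirely).
  def filterByRange(
      partitions: Vector[Vector[(Int, String)]],
      bounds: Vector[Int],
      lower: Int,
      upper: Int): (Vector[(Int, String)], Range) = {
    val l = partitionFor(bounds, lower)
    val u = partitionFor(bounds, upper)
    val candidates = math.min(l, u) to math.max(l, u)
    val hits = candidates.toVector
      .flatMap(i => partitions(i))
      .filter { case (k, _) => k >= lower && k <= upper }
    (hits, candidates)
  }
}
```

A per-element filter is still needed inside the candidate partitions, because the first and last of them may hold keys outside the requested range.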
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53358932
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19164/consoleFull)
for PR 1381 at commit
[`5f78d5c`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53353374
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19164/consoleFull)
for PR 1381 at commit
[`5f78d5c`](https://github.com/ap
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53353149
No problem. Jenkins test this please
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53353129
test this please
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53352324
@andrewor14 sorry about that, else statement was missing braces
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53340586
[QA tests have
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19154/consoleFull)
for PR 1381 at commit
[`e12b51b`](https://github.com/a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53340495
[QA tests have
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19154/consoleFull)
for PR 1381 at commit
[`e12b51b`](https://github.com/ap
Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-53340269
test this please
Github user aaronjosephs commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-48824933
Since the RDD is sorted multiple range queries could be even more efficient
if it was `glommed` first and then binary search was used. Looking for some
input on this
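To make the binary-search suggestion above concrete, here is a minimal sketch in plain Scala. It assumes a partition has already been materialized as an array (roughly what `glom` would give) and that the array is sorted by key, which later comments in this thread point out is not guaranteed in general. `SortedSliceSketch`, `firstIndexWhere`, and `sliceByRange` are hypothetical names, not part of Spark or of this PR.

```scala
object SortedSliceSketch {
  // Binary search for the first index whose key satisfies `p`.
  // Assumes `p` is false for a prefix of the array and true for the rest.
  def firstIndexWhere(sorted: Array[(Int, String)], p: Int => Boolean): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (p(sorted(mid)._1)) hi = mid else lo = mid + 1
    }
    lo
  }

  // Returns the pairs with lower <= key <= upper using two binary searches
  // instead of testing every element.
  def sliceByRange(sorted: Array[(Int, String)], lower: Int, upper: Int): Array[(Int, String)] = {
    val from = firstIndexWhere(sorted, _ >= lower)
    val until = firstIndexWhere(sorted, _ > upper)
    sorted.slice(from, until)
  }
}
```

Under the sortedness assumption, each query costs two logarithmic searches plus the size of the result, rather than a full scan of the partition.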
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1381#issuecomment-48797181
Can one of the admins verify this patch?
GitHub user aaronjosephs opened a pull request:
https://github.com/apache/spark/pull/1381
[SPARK-911] allow efficient queries for a range if RDD is partitioned with RangePartitioner
You can merge this pull request into a Git repository by running:
$ git pull https://gith