Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
Strictly speaking, this doesn't just affect pair RDDs. It affects any RDD
on which a `for` expression involves a filter operation, which includes
explicit `if` clauses as well as pattern matching.
Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
The `(k, v) <- pairRDD` expression involves a pattern match, which the
compiler converts into a filter/withFilter call on items that match the pattern.
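A minimal, Spark-free sketch of that desugaring (plain Scala collections desugar the same way, so no cluster is needed to see it):

```scala
val pairs = Seq(("a", 1), ("b", 2), ("c", 3))

// Sugared form: the (k, v) generator is a pattern match, and the
// guard adds an explicit filter step.
val keys = for ((k, v) <- pairs if v > 1) yield k

// Roughly what the compiler emits (the pattern and the guard both
// become withFilter calls in the Scala versions of that era):
val desugared = pairs
  .withFilter { case (k, v) => v > 1 }
  .map { case (k, v) => k }

assert(keys == desugared) // both are Seq("b", "c")
```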
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
I don't get why you say that it "doesn't even work in general". Under what
circumstances doesn't it work? I've never run into any problems with it.
The "
Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
I disagree strongly. I've used RDDs in for comprehensions for almost two
years without issue. Being able to "extend for loops" in this way is a
major language feature of Scala.
Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
The only other weird case I've run into is trying to `flatMap` across
multiple RDDs, e.g., `for (x <- rdd1; y <- rdd2) yield x + y`, but it simply
won't compile because `RDD.
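A hedged sketch of what does and doesn't compile here (the method names are real RDD API, but the function itself is invented for illustration):

```scala
import org.apache.spark.rdd.RDD

// Not compiled here; shown for illustration. RDD.flatMap expects a
// function returning a local collection (TraversableOnce), and an RDD
// is not one, so the nested-generator form is rejected by the compiler.
def crossSums(rdd1: RDD[Int], rdd2: RDD[Int]): RDD[Int] = {
  // for (x <- rdd1; y <- rdd2) yield x + y   // does not compile
  // The supported spelling is an explicit cartesian product:
  rdd1.cartesian(rdd2).map { case (x, y) => x + y }
}
```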
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/15899#discussion_r88267555
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -70,6 +70,7 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext
Github user reggert commented on the issue:
https://github.com/apache/spark/pull/15899
Using RDD's in `for` comprehensions dates back _at least_ to the examples
given in the old 2010 AMPLab paper "Spark: Cluster Computing with Working
Sets". `for` comprehensions are in
GitHub user reggert opened a pull request:
https://github.com/apache/spark/pull/15899
[SPARK-18466] added withFilter method to RDD
## What changes were proposed in this pull request?
A `withFilter` method has been added to `RDD` as an alias for the `filter`
method. When
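The alias itself is a one-liner; a sketch of the definition and what it enables (assuming some `rdd: RDD[Int]`; since `RDD.filter` is already lazy, delegating loses nothing, unlike strict collections where `withFilter` exists to avoid building an intermediate result):

```scala
// Inside RDD[T], a sketch of the added method:
// def withFilter(f: T => Boolean): RDD[T] = filter(f)

// What it enables: a guard in a for comprehension now compiles,
// and is equivalent to an explicit filter + map.
val evensDoubled = for (x <- rdd if x % 2 == 0) yield x * 2
val same         = rdd.filter(x => x % 2 == 0).map(_ * 2)
```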
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-164967910
@andrewor14 I included a couple lines to propagate the local properties
(from the thread that created the `ComplexFutureAction`) to each of the spawned
jobs (all but
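A Spark-free sketch of the propagation idea (the real change touches SparkContext internals; the thread-local holder here is invented for illustration): capture the creating thread's properties once, then reapply them on whatever thread submits each spawned job.

```scala
import java.util.Properties

// Invented stand-in for the context's thread-local property map.
val localProps = new ThreadLocal[Properties] {
  override def initialValue() = new Properties()
}

// On the thread that creates the ComplexFutureAction: capture a copy.
localProps.get().setProperty("spark.jobGroup.id", "my-group")
val captured = localProps.get().clone().asInstanceOf[Properties]

// Later, on a pool thread that submits one of the spawned jobs:
val submitter = new Thread(() => {
  localProps.set(captured) // propagate before submitting the job
  assert(localProps.get().getProperty("spark.jobGroup.id") == "my-group")
})
submitter.start()
submitter.join()
```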
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r47446916
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r47322111
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r47117656
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r46909976
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,34 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-162326651
All I want for Christmas is a code review. :-)
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-161528329
@zsxwing @JoshRosen I just want to make sure that you guys haven't
forgotten about this. I haven't heard anything in a week and a half.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-160829564
Any feedback?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158800472
I've implemented a two-line fix for SPARK-4514.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158799901
Correction: This PR only fixes SPARK-4514 as long as the
`ComplexFutureAction` only spawns a single job. The job group information is
lost in subsequent jobs.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158798727
It appears that this PR may also fix SPARK-4514.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158783297
SparkQA seems to be happy now.
@JoshRosen What else needs to be done to move forward on this? The current
implementation on master requires a dedicated thread pool
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158680579
I've come up with a reusable way to use semaphores to control the
timing of tasks during unit tests. Please see the `Smuggle` class and let me
know what you think.
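The semaphore idea can be sketched without Spark (the actual `Smuggle` class additionally works around closure serialization, which this plain version ignores):

```scala
import java.util.concurrent.Semaphore
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val mayProceed = new Semaphore(0)
@volatile var taskFinished = false

// The "task" parks on the semaphore until the test releases a permit,
// so the test controls exactly when the task is allowed to finish.
val task = Future {
  mayProceed.acquire()
  taskFinished = true
}

assert(!taskFinished)   // deterministically false: no permit released yet
mayProceed.release()
Await.ready(task, 10.seconds)
assert(taskFinished)
```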
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-158609528
Can someone re-run the build?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-157972418
Any thoughts on the latest changes (besides the merge)?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-157295993
I most certainly did not add all those classes (other than `JobSubmitter`)!
@SparkQA is confused.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-157254101
I took a stab at implementing the above refactoring, and it came out
looking pretty nice, so I went ahead and committed it. Please take a look and
let me know if you
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-156868899
I'm not particularly happy about the (internal) API exposed by
`ComplexFutureAction`. Having callers instantiate the action and then mutate it
by calling the `run
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44879232
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -27,7 +27,7 @@ import org.scalatest.BeforeAndAfterAll
import
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-156844499
@srowen Okay. I was just worried I was going to get blamed for breaking
something. ;-)
What's left to do here?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-156769951
Unrelated to this PR (except that it may be partially responsible for the
most recent test failure), there appears to be a race condition here:
```scala
if
```
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-156768512
It sure would be nice if Spark's unit tests were consistent. I add
whitespace and suddenly the build fails?
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44865053
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864860
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,50 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864838
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864690
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,50 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864655
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,27 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864643
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -27,7 +27,7 @@ import org.scalatest.BeforeAndAfterAll
import
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864625
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,27 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864614
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864605
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -177,80 +150,50 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864566
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -276,10 +219,11 @@ class ComplexFutureAction[T] extends FutureAction[T
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864560
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -66,14 +65,22 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T])
extends
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44864546
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -95,19 +102,18 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T])
extends
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-156295760
Is there anything left to be done before merging this?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155995354
Build failed again, in a test that has nothing to do with this PR. Those
tests seem extremely flaky.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155967768
Well, the tests that randomly fail passed this time. Yay?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155945709
The Jenkins build has been failing due to `ForkError`s being thrown by random
tests that have nothing to do with anything I changed. The error itself has no
message
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44603713
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,26 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155894205
@andrewor14 Thanks. I've fixed the issues you pointed out and anything
else I could find. See my question above regarding `resultFunc`, though.
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44577065
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -229,28 +183,15 @@ class ComplexFutureAction[T] extends FutureAction[T
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44576577
--- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala ---
@@ -28,17 +32,15 @@ private[spark] class JobWaiter[T](
resultHandler
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44576601
--- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala ---
@@ -49,29 +51,18 @@ private[spark] class JobWaiter[T
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44576009
--- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala ---
@@ -49,29 +51,14 @@ private[spark] class JobWaiter[T
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44574415
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +120,26 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44573118
--- Diff: core/src/main/scala/org/apache/spark/scheduler/JobWaiter.scala ---
@@ -49,29 +51,14 @@ private[spark] class JobWaiter[T
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155877448
It seems like Jenkins is not in a happy place:
```
> git fetch --tags --progress https://github.com/apache/spark.git
+refs/pull/9264/*:refs/remotes/ori
```
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44570490
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -20,13 +20,16 @@ package org.apache.spark
import java.util.Collections
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44570514
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -95,19 +102,18 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T])
extends
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44570266
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -23,6 +23,7 @@ import java.util.concurrent.TimeUnit
import
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44570092
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,26 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44568927
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -116,57 +119,26 @@ class SimpleFutureAction[T] private[spark](jobWaiter:
JobWaiter
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44568209
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -66,14 +65,22 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T])
extends
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44564647
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,31 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44564235
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -20,13 +20,16 @@ package org.apache.spark
import java.util.Collections
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44536897
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -105,6 +108,7 @@ trait FutureAction[T] extends Future[T] {
* A [[FutureAction
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44534066
--- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala ---
@@ -105,6 +108,7 @@ trait FutureAction[T] extends Future[T] {
* A [[FutureAction
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155651893
I've added the `@DeveloperApi` annotation to `SimpleFutureAction` and
`ComplexFutureAction`, which should resolve the MiMa errors.
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155642041
Why do we care about binary compatibility for an internal API that is only
used in one place?
In any case, the implicit `ExecutionContext` being passed to
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155308047
```
[info] spark-mllib: found 0 potential binary incompatibilities (filtered 64)
java.lang.RuntimeException: spark-core: Binary compatibility check failed!
```
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155299515
Back up to date with master. All style issues should be resolved.
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44371154
--- Diff:
core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala ---
@@ -197,4 +197,30 @@ class AsyncRDDActionsSuite extends SparkFunSuite with
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-155293780
@squito I merged with master 2 days ago, and more conflicts were introduced
on master less than 24 hours later. I'll merge one more time in order to fix
the issue
Github user reggert commented on a diff in the pull request:
https://github.com/apache/spark/pull/9264#discussion_r44370757
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Utils.scala
---
@@ -1,393 +0,0 @@
-/*
- * Licensed to the
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-154877812
What else needs to be done to get this PR merged?
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-154751770
I've added some unit tests that verify that my changes do, indeed, stop
callbacks from consuming threads before the jobs have completed.
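A hedged, Spark-free sketch of the non-blocking design being tested (class and method names are invented; the real code lives in `JobWaiter`/`FutureAction`): the future is backed by a Promise completed from the scheduler's completion callback, so no thread blocks while the job runs.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Toy JobWaiter-like component: completion arrives via a callback,
// and the Promise bridges it to a Future without parking any thread.
class ToyJobWaiter[T] {
  private val promise = Promise[T]()
  def future: scala.concurrent.Future[T] = promise.future
  // Called by the "scheduler" when the job finishes.
  def jobFinished(result: T): Unit = promise.success(result)
}

val waiter = new ToyJobWaiter[Int]
assert(!waiter.future.isCompleted) // nothing is blocked, nothing done yet
waiter.jobFinished(42)             // scheduler callback completes the future
assert(Await.result(waiter.future, 1.second) == 42)
```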
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-150851994
It looks like our approaches are very similar, though my changes to
`JobWaiter` and `SimpleFutureAction` are a bit smaller, and I also included
fixes for
Github user reggert commented on the pull request:
https://github.com/apache/spark/pull/9264#issuecomment-150851578
I agree that the JIRA issue is a duplicate. Odd that the old one didn't
turn up when I searched for existing issues. I've linked the issues in
JIRA.
GitHub user reggert opened a pull request:
https://github.com/apache/spark/pull/9264
[SPARK-11296] Modifications to JobWaiter, FutureAction, and AsyncRDDActions
to support non-blocking operation
These changes rework the implementations of SimpleFutureAction,
ComplexFutureAction