The difference between this code and the code you showed is that this code
calls .reduce(), whose result is the size of a single element of the RDD,
while your code calls .collect(), whose result is the size of your entire
dataset.
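To make the contrast concrete, here is a rough, untested sketch (assuming a
hypothetical RDD[Double] named rdd):

// reduce: an action, but the driver receives only one element-sized value
val sum: Double = rdd.reduce(_ + _)

// collect: an action whose result is the whole dataset, shipped to the driver
val all: Array[Double] = rdd.collect()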
It would help if you provided a more complete example.
On Tue, F
Here is an example that is bundled with Spark (logistic regression):

for (i <- 1 to ITERATIONS) {
  println("On iteration " + i)
  // one action (reduce) per iteration; its result is a single gradient vector
  val gradient = points.map { p =>
    (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
  }.reduce(_ + _)
  w -= gradient
}
As you can see, an action is called on every iteration.
I am not sure! Maybe Mark can correct me. You may try AsyncRDDActions (check
the API docs for details). My feeling is that it can submit many jobs and
then receive the results asynchronously.
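For example, a rough, untested sketch (assuming an RDD[Int] named rdd; the
implicit conversion to AsyncRDDActions comes from importing SparkContext._):

import org.apache.spark.SparkContext._ // implicit conversion to AsyncRDDActions
import scala.concurrent.Await
import scala.concurrent.duration.Duration

// submit several jobs without blocking on each result
val futures = (1 to 10).map(i => rdd.map(_ * i).collectAsync())

// FutureAction extends scala.concurrent.Future, so the results can be awaited together
val results = futures.map(f => Await.result(f, Duration.Inf))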
On Tue, Feb 18, 2014 at 1:14 PM, Guillaume Pitel wrote:
Whatever you want to do, if you really have to do it that way, don't use
Spark. And the answer to your question is: Spark automatically "interleaves"
stages that can be interleaved.
Now, I do not believe that you really want to do that. You probably
Is there a way I can queue several stages at once?
On Mon, Feb 17, 2014 at 12:08 PM, Mark Hamstra wrote:
With so little information about what your code is actually doing, what you
have shared looks likely to be an anti-pattern to me. Doing many collect
actions is something to be avoided if at all possible, since this forces a
lot of network communication to materialize the results back within the
driver.
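One way to avoid that, when it fits the problem, is to evaluate all of the
queries in a single job and collect once. A rough, untested sketch (the
predicates in queries are made up for illustration, and rdd is assumed to be
an RDD[Int]):

import org.apache.spark.SparkContext._ // implicit conversion for reduceByKey

// evaluate every query in one pass instead of running one job per query
val queries: Seq[Int => Boolean] = Seq(_ > 0, _ % 2 == 0) // hypothetical
val counts = rdd.flatMap { x =>
  for ((q, i) <- queries.zipWithIndex if q(x)) yield (i, 1L)
}.reduceByKey(_ + _).collect()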
I have a Spark application with the following structure:

while (...) { // 10-100k iterations
  rdd.map(...).collect()
}
Basically, I have an RDD and I need to query it multiple times.
Now when I run this, for each iteration, Spark creates a new stage (each
stage having multiple tasks). What I find