Re: Interleaving stages

2014-02-18 Thread Evan R. Sparks
The difference between this code and the code you showed is that here the .reduce() produces a result the size of ONE element of the RDD, while your code calls .collect(), whose result is the size of your entire dataset. It would help if you provided a more complete example.
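The contrast can be sketched as follows (a minimal illustration, not code from the thread; `sc` is assumed to be an existing SparkContext):

```scala
val rdd = sc.parallelize(1 to 1000000)

// reduce: partial results are combined on the executors, so the driver
// receives exactly one Int, no matter how large the RDD is.
val total: Int = rdd.reduce(_ + _)

// collect: every partition ships all of its rows back, so the driver
// must hold the entire million-element dataset in memory at once.
val everything: Array[Int] = rdd.collect()
```

This is why a reduce-per-iteration loop (as in the gradient example below) scales, while a collect-per-iteration loop concentrates the whole dataset on the driver every pass.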

Re: Interleaving stages

2014-02-18 Thread David Thomas
Here is example code that is bundled with Spark:

    for (i <- 1 to ITERATIONS) {
      println("On iteration " + i)
      val gradient = points.map { p =>
        (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
      }.reduce(_ + _)
      w -= gradient
    }

As you can see, an action is called

Re: Interleaving stages

2014-02-17 Thread Prashant Sharma
I am not sure! Maybe Mark can correct me. You may try the AsyncRDDFunctions (check the API docs for details). My feeling is that it can submit many tasks and then receive the results asynchronously.
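A sketch of what this could look like, assuming Prashant means Spark's AsyncRDDActions (available on any RDD via an implicit conversion; the multipliers and timeout here are illustrative only):

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // implicit conversion to AsyncRDDActions

val sc = new SparkContext("local[*]", "async-sketch")
val rdd = sc.parallelize(1 to 100)

// Each collectAsync() submits a job immediately and returns a FutureAction,
// so several jobs can be in flight before any result is consumed.
val futures = (1 to 5).map(i => rdd.map(_ * i).collectAsync())

// Block only when the results are actually needed.
val results = futures.map(f => Await.result(f, 1.minute))
```

Note that this overlaps job submission and result retrieval; it does not change how the scheduler pipelines the stages themselves.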

Re: Interleaving stages

2014-02-17 Thread Guillaume Pitel
Whatever you want to do, if you really have to do it that way, don't use Spark. And the answer to your question is: Spark automatically "interleaves" stages that can be interleaved. Now, I do not believe that you really want to do that. You probably

Re: Interleaving stages

2014-02-17 Thread David Thomas
Is there a way I can queue several stages at once?

Re: Interleaving stages

2014-02-17 Thread Mark Hamstra
With so little information about what your code is actually doing, what you have shared looks likely to be an anti-pattern to me. Doing many collect actions is something to be avoided if at all possible, since this forces a lot of network communication to materialize the results back within the driver.

Interleaving stages

2014-02-17 Thread David Thomas
I have a Spark application with the following structure:

    while (...) { // 10-100k iterations
      rdd.map(...).collect()
    }

Basically, I have an RDD and I need to query it multiple times. Now when I run this, for each iteration, Spark creates a new stage (each stage having multiple tasks). What I find
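One standard idiom for this access pattern (a general Spark suggestion, not something proposed in the thread; `condition` and `f` are hypothetical placeholders for the loop test and per-element query) is to cache the RDD so the repeated jobs at least avoid recomputing it from its lineage every iteration:

```scala
rdd.cache() // materialized in executor memory on the first action

while (condition) {
  // Still one job (and one stage) per iteration, but the cached
  // partitions are reused instead of being recomputed each time.
  val result = rdd.map(f).collect()
  // ... use result on the driver ...
}
```

This does not remove the per-iteration scheduling and collect overhead that the replies above warn about; it only removes recomputation of the RDD itself.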