From the OP:

(1) val lines = Import full dataset using sc.textFile
(2) val ABonly = Filter out all rows from "lines" that are not of type A or B
(3) val processA = Process only the A rows from ABonly
(4) val processB = Process only the B rows from ABonly
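For concreteness, here is a rough sketch of how the four steps quoted above might look in real Scala. Everything specific in it is my assumption and not from the OP: the row format (first character is the row type), count() standing in for whatever "process" means, the object and method names, and the local[2] master. It treats 3 and 4 as actions, caches ABonly, and submits the two actions from separate threads via Futures.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ABPipeline {
  // Hypothetical row format: the first character of each line is its type.
  def isAorB(line: String): Boolean =
    line.startsWith("A") || line.startsWith("B")

  def run(sc: SparkContext, path: String): (Long, Long) = {
    // (1) Lazy: nothing is read yet.
    val lines: RDD[String] = sc.textFile(path)

    // (2) Also lazy. cache() only marks the RDD for in-memory persistence;
    // partitions are stored the first time an action computes them.
    val abOnly: RDD[String] = lines.filter(isAorB).cache()

    // (3) and (4): count() is an action and blocks the thread that calls it,
    // so each one is wrapped in a Future to submit both jobs concurrently.
    val countA = Future { abOnly.filter(_.startsWith("A")).count() }
    val countB = Future { abOnly.filter(_.startsWith("B")).count() }

    // Wait in the driver until both jobs have finished.
    (Await.result(countA, Duration.Inf), Await.result(countB, Duration.Inf))
  }

  def main(args: Array[String]): Unit = {
    // local[2] is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("ab-pipeline").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (a, b) = run(sc, args(0))
      println(s"A rows: $a, B rows: $b")
    } finally {
      sc.stop()
    }
  }
}
```

One caveat: if both jobs start at the same moment, each may still compute steps 1 and 2 for itself, because the cache is only filled as partitions are computed. Running one action first, then the other, should let the second reuse the cached ABonly, at the cost of running them one after the other.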
I assume that 3 and 4 are actions; otherwise nothing is computed here at all, since transformations are lazy. When 3 is invoked, Spark computes 1, then 2, then 3. 4 runs after 3, and may even cause 1 and 2 to be computed again if nothing is persisted. You can invoke 3 and 4 in parallel on the driver if you like. That's fine. But an action blocks the driver thread that calls it, so running them in parallel means invoking them from separate threads.

On Mon, Jan 19, 2015 at 8:21 AM, davidkl <davidkl...@hotmail.com> wrote:
> Hi Jon, I am looking for an answer for a similar question in the doc now, so
> far no clue.
>
> I would need to know what is spark behaviour in a situation like the example
> you provided, but taking into account also that there are multiple
> partitions/workers.
>
> I could imagine it's possible that different spark workers are not
> synchronized in terms of waiting for each other to progress to the next
> step/stage for the partitions of data they get assigned, while I believe in
> streaming they would wait for the current batch to complete before they
> start working on a new one.
>
> In the code I am working on, I need to make sure a particular step is
> completed (in all workers, for all partitions) before next transformation is
> applied.
>
> Would be great if someone could clarify or point to these issues in the doc!
> :-)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-automatically-run-different-stages-concurrently-when-possible-tp21075p21227.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org