Your point #1 is a bit misleading:

>> (1) The mappers are not executed in parallel when processing independently
>> the same RDD.
To clarify, I'd say: within one stage of execution, when pipelining occurs,
the mappers are not executed in parallel when they independently process the
same RDD partition.

On Thu, Apr 9, 2015 at 11:19 AM, spark_user_2015 <li...@adobe.com> wrote:
> That was helpful!
>
> The conclusion:
> (1) The mappers are not executed in parallel when processing independently
> the same RDD.
> (2) The best way seems to be (if enough memory is available and an action
> is applied to d1 and d2 later on):
>
>   val d1 = data.map((x, y, z) => (x, y)).cache
>   val d2 = d1.map((x, y) => (y, x))
>
> This avoids pipelining the "d1" mapper and "d2" mapper when computing d2.
>
> This is important for writing efficient code; toDebugString helps a lot.
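To make the pipelining point concrete, here is a minimal sketch in plain Scala (no Spark required, input data and mapper names are hypothetical): within one stage, Spark fuses consecutive narrow transformations and applies the composed function element by element in a single pass over each partition, so the intermediate d1 records are never materialized unless d1 is cached.

```scala
// A hypothetical input partition of (x, y, z) records.
val partition = Seq((1, "a", true), (2, "b", false))

val d1Mapper = (t: (Int, String, Boolean)) => (t._1, t._2) // data -> d1
val d2Mapper = (t: (Int, String)) => (t._2, t._1)          // d1 -> d2

// Without caching: Spark effectively composes the two mappers and runs
// them sequentially per element -- they do not run in parallel with
// each other, and no intermediate d1 collection is built.
val pipelined = d1Mapper.andThen(d2Mapper)
val d2 = partition.map(pipelined)

// With d1 cached: the intermediate result is materialized once (this
// stands in for d1.cache), and computing d2 later reads it back
// instead of re-running the d1 mapper.
val d1Cached = partition.map(d1Mapper)
val d2FromCache = d1Cached.map(d2Mapper)

// Both routes produce the same records; only the execution plan differs.
assert(d2 == d2FromCache)
```

The trade-off is exactly the one stated above: caching d1 costs memory but avoids re-running the "d1" mapper whenever d1 feeds more than one downstream computation, which is what toDebugString lets you verify in the lineage.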