Your point #1 is a bit misleading.

>> (1) The mappers are not executed in parallel when processing
independently the same RDD.

To clarify, I'd say: within a single stage, when pipelining occurs, the
mappers are not executed in parallel when independently processing the same
RDD partition.
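To make the caching point concrete, here is a minimal sketch of the pattern from the quoted message below. It assumes a running SparkContext `sc`; the sample data and variable names are illustrative, not taken from the original thread.

```scala
// Assumes `sc` is an existing SparkContext; the triples are made-up sample data.
val data = sc.parallelize(Seq((1, "a", 3.0), (2, "b", 4.0)))

// Caching d1 materializes its partitions, so later computations of d2
// read the cached results instead of pipelining (re-running) d1's mapper.
val d1 = data.map { case (x, y, z) => (x, y) }.cache()
val d2 = d1.map { case (x, y) => (y, x) }

d1.count()    // action: materializes d1 in memory
d2.collect()  // action: reuses cached d1 partitions; d1's mapper does not rerun
```

Without the `.cache()`, an action on `d2` would pipeline both mappers over each partition of `data`, recomputing `d1`'s transformation.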

On Thu, Apr 9, 2015 at 11:19 AM, spark_user_2015 <li...@adobe.com> wrote:

> That was helpful!
>
> The conclusion:
> (1) The mappers are not executed in parallel when processing independently
> the same RDD.
> (2) The best way seems to be (if enough memory is available and an action
> is
> applied to d1 and d2 later on)
>        val d1 = data.map { case (x, y, z) => (x, y) }.cache
>        val d2 = d1.map { case (x, y) => (y, x) }
>      -  This avoids pipelining the "d1" mapper and "d2" mapper when
> computing d2
>
> This is important to write efficient code, toDebugString helps a lot.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Caching-and-Actions-tp22418p22444.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
