Yes, Raymond is right. You can always run two jobs on the same cached RDD,
and they can run in parallel (assuming you launch the two jobs from two
different threads). However, with only one copy of each RDD partition, the
tasks of the two jobs will experience some slot contention. So if you
replicate the RDD, you can eke out a bit more parallelism, as there will be
less contention. Even so, it is not possible to guarantee that there will be
no contention at all: there may simply be many times more tasks than slots
available in the cluster, in which case the jobs will contend for slots even
if data locality is ignored.
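
For concreteness, here is a minimal, untested sketch of the two-threads
approach (the input path is hypothetical, and MEMORY_ONLY_2 is just one of
the replicated storage levels):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setAppName("two-jobs-one-rdd"))

// One logical RDD; MEMORY_ONLY_2 keeps two copies of each cached
// partition, which reduces (but cannot eliminate) slot contention.
val data = sc.textFile("hdfs:///some/input")  // hypothetical path
  .map(_.length)
  .persist(StorageLevel.MEMORY_ONLY_2)

// Jobs submitted from different threads can run concurrently,
// subject to the slots available and the scheduler mode.
val t1 = new Thread(new Runnable {
  def run(): Unit = println("sum = " + data.sum())
})
val t2 = new Thread(new Runnable {
  def run(): Unit = println("count = " + data.count())
})
t1.start(); t2.start()
t1.join(); t2.join()

Note that with the default FIFO scheduler the second job may still queue
behind the first; setting spark.scheduler.mode to FAIR lets concurrent jobs
share the available slots more evenly.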


On Wed, Sep 3, 2014 at 11:08 PM, Liu, Raymond <raymond....@intel.com> wrote:

> Actually, a replicated RDD and a parallel job on the same RDD are two
> unrelated concepts.
> A replicated RDD just stores its data on multiple nodes; this helps with HA
> and provides a better chance of data locality. It is still one RDD, not two
> separate RDDs.
> As for running two jobs on the same RDD, it doesn't matter whether the
> RDD is replicated or not. You can always do it if you wish to.
>
>
> Best Regards,
> Raymond Liu
>
> -----Original Message-----
> From: Kartheek.R [mailto:kartheek.m...@gmail.com]
> Sent: Thursday, September 04, 2014 1:24 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: RDDs
>
> Thank you, Raymond and Tobias.
> Yeah, I am very clear about what I was asking. I was talking about a
> "replicated" RDD only. Now that I've had my understanding of jobs and
> applications validated, I wanted to know whether we can replicate an RDD
> and run two jobs (that need the same RDD) of an application in parallel.
>
> -Kartheek
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
