Fwd: RDDs

rapelly kartheek Wed, 03 Sep 2014 23:25:57 -0700

---------- Forwarded message ----------
From: rapelly kartheek <kartheek.m...@gmail.com>
Date: Thu, Sep 4, 2014 at 11:49 AM
Subject: Re: RDDs
To: "Liu, Raymond" <raymond....@intel.com>



Thank you Raymond.
That is clearer to me now. So, if an RDD is replicated across multiple nodes
(say, on two sets of nodes, since an RDD is a collection of chunks), can we run
two jobs concurrently and separately on these two sets of nodes?


On Thu, Sep 4, 2014 at 11:38 AM, Liu, Raymond <raymond....@intel.com> wrote:

> Actually, a replicated RDD and a parallel job on the same RDD are two
> concepts that are not related at all.
> A replicated RDD just stores its data on multiple nodes; this helps with HA
> and gives a better chance of data locality. It is still one RDD, not two
> separate RDDs.
> As for running two jobs on the same RDD, it doesn't matter whether the
> RDD is replicated or not. You can always do it if you wish to.
>
>
> Best Regards,
> Raymond Liu
>
> -----Original Message-----
> From: Kartheek.R [mailto:kartheek.m...@gmail.com]
> Sent: Thursday, September 04, 2014 1:24 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: RDDs
>
> Thank you Raymond and Tobias.
> Yeah, I am very clear about what I was asking. I was talking about a
> "replicated" RDD only. Now that my understanding of jobs and applications
> has been validated, I wanted to know whether we can replicate an RDD and run
> two jobs (that need the same RDD) of an application in parallel.
>
> -Karthk
>
>
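
A minimal sketch of the point made in the reply above: one RDD is persisted
with a replicated storage level, and two jobs are submitted against it from
separate threads of the same application. The object name TwoJobsOneRdd, the
method runTwoJobs, the sample data, and the local[4] master are all
illustrative assumptions, not anything from the thread. In local mode there is
only one block store, so the "_2" replication has no practical effect here; it
only matters on a real multi-node cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example object; names are illustrative only.
object TwoJobsOneRdd {

  // Submits two independent jobs (actions) on the same replicated RDD
  // from two threads and returns (sum of all values, count of even values).
  def runTwoJobs(sc: SparkContext): (Long, Long) = {
    // One RDD, stored in memory with each partition replicated on two nodes.
    // It is still a single RDD; replication only adds copies of its blocks.
    val numbers = sc.parallelize(1L to 1000L, numSlices = 8)
      .persist(StorageLevel.MEMORY_ONLY_2)

    // Each Future runs on its own thread, so the two actions are submitted
    // to the scheduler as separate jobs that may run concurrently.
    val sumJob = Future { numbers.reduce(_ + _) }
    val evenCountJob = Future { numbers.filter(_ % 2 == 0).count() }

    val result = Await.result(sumJob.zip(evenCountJob), 5.minutes)
    numbers.unpersist()
    result
  }

  def main(args: Array[String]): Unit = {
    // local[4] is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("two-jobs-one-rdd").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val (sum, evens) = runTwoJobs(sc)
      println(s"sum=$sum evens=$evens")
    } finally {
      sc.stop()
    }
  }
}
```

Note that the scheduler, not the application, decides which executors run each
task, so the two jobs are not pinned to "two sets of nodes"; the replicas only
give the scheduler more places where a task can read its partition locally.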
