---------- Forwarded message ----------
From: rapelly kartheek <kartheek.m...@gmail.com>
Date: Thu, Sep 4, 2014 at 11:49 AM
Subject: Re: RDDs
To: "Liu, Raymond" <raymond....@intel.com>

Thank you, Raymond. It is clearer to me now. So, if an RDD is replicated over multiple nodes (say, two sets of nodes, since it is a collection of chunks), can we run two jobs concurrently and separately on these two sets of nodes?

On Thu, Sep 4, 2014 at 11:38 AM, Liu, Raymond <raymond....@intel.com> wrote:

> Actually, a replicated RDD and parallel jobs on the same RDD are two
> unrelated concepts.
> A replicated RDD just stores its data on multiple nodes; this helps with HA
> and provides a better chance of data locality. It is still one RDD, not two
> separate RDDs.
> As for running two jobs on the same RDD, it doesn't matter whether the
> RDD is replicated or not. You can always do it if you wish to.
>
> Best Regards,
> Raymond Liu
>
> -----Original Message-----
> From: Kartheek.R [mailto:kartheek.m...@gmail.com]
> Sent: Thursday, September 04, 2014 1:24 PM
> To: u...@spark.incubator.apache.org
> Subject: RE: RDDs
>
> Thank you Raymond and Tobias.
> Yeah, I am very clear about what I was asking. I was talking about
> "replicated" RDDs only. Now that I've got my understanding of jobs and
> applications validated, I wanted to know if we can replicate an RDD and run
> two jobs (that need the same RDD) of an application in parallel?
>
> -Karthk
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDDs-tp13343p13416.html