Say I have a main method with the following pseudo-code (to be run on a Spark standalone cluster):

    main(args) {
      RDD rdd
      rdd1 = rdd.map(...)
      // some other statements not using the RDD
      rdd2 = rdd.filter(...)
    }
When executed, will each of the two RDD transformations (map and filter) be individually partitioned and distributed across the available cluster nodes? And will the statements not involving RDDs (or DataFrames) typically be executed on the driver? Is that how Spark takes advantage of the cluster?