Spark is partitioner-aware, so it can exploit the case where two datasets are partitioned the same way (for example, by doing a map-side join on them, with no need to reshuffle either one). MapReduce does not expose this.
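
To make that concrete, here is a rough sketch in plain Python (not Spark code) of why co-partitioning helps. The partition count, the toy hash partitioner, and every function and dataset name below are made up for illustration. The point is only that once both datasets are split by the same function, partition i on one side can be joined against partition i on the other, and no record has to move.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for the example


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Toy hash partitioner: the same key always maps to the same index."""
    return hash(key) % num_partitions


def partition_by(records, num_partitions=NUM_PARTITIONS):
    """The one-time 'shuffle': place each (key, value) pair in its partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def join_copartitioned(left_parts, right_parts):
    """Join partition i of the left side with partition i of the right side.

    Because both sides used the same partitioner, matching keys are
    guaranteed to sit at the same index, so nothing crosses partitions.
    """
    assert len(left_parts) == len(right_parts)
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a local lookup table for this partition only.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, lvalue in left:
            for rvalue in index.get(key, []):
                joined.append((key, (lvalue, rvalue)))
    return joined


# Integer keys keep the toy hash deterministic across runs.
users = partition_by([(1, "ann"), (2, "bob"), (3, "cy")])
orders = partition_by([(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")])
result = sorted(join_copartitioned(users, orders))
print(result)
```

In Spark itself, as far as I know, the equivalent is to call `partitionBy` with the same partitioner on both pair RDDs (and usually cache them) before calling `join`; the join can then reuse the existing partitioning instead of shuffling again.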

On Sun, Jun 28, 2015 at 12:13 PM, YaoPau <jonrgr...@gmail.com> wrote:

> I've heard "Spark is not just MapReduce" mentioned during Spark talks, but it
> seems like every method that Spark has is really doing something like (Map
> -> Reduce) or (Map -> Map -> Map -> Reduce) etc behind the scenes, with the
> performance benefit of keeping RDDs in memory between stages.
>
> Am I wrong about that? Is Spark doing anything more efficiently than a
> series of Maps followed by a Reduce in memory? What methods does Spark have
> that can't easily be mapped (with somewhat similar efficiency) to Map and
> Reduce in memory?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/What-does-Spark-is-not-just-MapReduce-mean-Isn-t-every-Spark-job-a-form-of-MapReduce-tp23518.html