Adding to already interesting answers:

- "Is there any case where MR is better than Spark? I don't know what cases I should be used Spark by MR. When is MR faster than Spark?"
   - Many. MR would be better (am not saying faster ;o)) for
      - Very large datasets,
      - Multistage map-reduce flows,
      - Complex map-reduce semantics
   - Spark is definitely better for the classic iterative, interactive workloads.
   - Spark is very effective for implementing the concepts of in-memory datasets & real-time analytics
      - Take a look at the Lambda architecture
   - Also check out how Ooyala is using Spark in multiple layers & configurations. They also have MR in many places
   - In our case, we found Spark very effective for ELT - we would have used MR earlier
- "I know Java, is it worth it to learn Scala for programming to Spark or it's okay just with Java?"
   - Java will work fine. Especially when Java 8 becomes the norm, we will get back some of the elegance
   - I, personally, like Scala & Python a lot better than Java. Scala is a lot more elegant, but compilation, IDE integration et al. are still clunky
   - One word of caution - stick with one language as much as possible; shuffling between Java & Scala is not fun
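
To make the Java 8 point a bit more concrete, here is a minimal sketch of a word count written two ways. It deliberately uses only the JDK's java.util.stream classes as a stand-in, not the Spark API, so the class and method names (WordCountSketch, splitVerbose, countWords) are made up for illustration. The idea is just to show how much ceremony lambdas remove from map/reduce-style code:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration class; plain JDK only, no Spark dependency.
class WordCountSketch {

    // Anonymous-inner-class style: the kind of boilerplate Java code
    // needed for every small function before lambdas were available.
    static final Function<String, List<String>> splitVerbose =
        new Function<String, List<String>>() {
            @Override
            public List<String> apply(String line) {
                return Arrays.asList(line.split("\\s+"));
            }
        };

    // Java 8 lambda style: the same split step, plus grouping and counting,
    // reads much closer to what the Scala or Python version would look like.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))  // "map" step
            .filter(word -> !word.isEmpty())                      // drop empty tokens
            .collect(Collectors.groupingBy(                       // "reduce" step
                word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark map reduce", "spark");
        System.out.println(splitVerbose.apply(lines.get(0)));
        System.out.println(countWords(lines));
    }
}
```

The real Spark Java API has its own function interfaces, so treat this only as a picture of the style difference, not as Spark code.
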

Cheers & HTH
<k/>

On Sat, Nov 22, 2014 at 8:26 AM, Sean Owen <so...@cloudera.com> wrote:

> MapReduce is simpler and narrower, which also means it is generally
> lighter weight, with less to know and configure, and runs more predictably.
> If you have a job that is truly just a few maps, with maybe one reduce, MR
> will likely be more efficient. Until recently its shuffle has been more
> developed and offers some semantics the Spark shuffle does not.
>
> I suppose it integrates with tools like Oozie, that Spark does not.
>
> I suggest learning enough Scala to use Spark in Scala. The amount you need
> to know is not large.
>
> (Mahout MR based implementations do not run on Spark and will not. They
> have been removed instead.)
>
> On Nov 22, 2014 3:36 PM, "Guillermo Ortiz" <konstt2...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm a newbie with Spark but I've been working with Hadoop for a while.
>> I have two questions.
>>
>> Is there any case where MR is better than Spark? I don't know what
>> cases I should be used Spark by MR. When is MR faster than Spark?
>>
>> The other question is, I know Java, is it worth it to learn Scala for
>> programming to Spark or it's okay just with Java? I have done a little
>> piece of code with Java because I feel more confident with it, but I
>> seems that I'm missed something