Re: Spark or MR, Scala or Java?

Krishna Sankar Sat, 22 Nov 2014 16:54:06 -0800

Adding to the already interesting answers:

   - "Is there any case where MR is better than Spark? I don't know in which
   cases I should use Spark instead of MR. When is MR faster than Spark?"
      - Many. MR would be better (I am not saying faster ;o)) for
         - Very large datasets,
         - Multistage map-reduce flows,
         - Complex map-reduce semantics
      - Spark is definitely better for the classic iterative, interactive
      workloads.
      - Spark is very effective for implementing the concepts of in-memory
      datasets & real-time analytics
         - Take a look at the Lambda architecture
      - Also check out how Ooyala is using Spark in multiple layers &
      configurations. They also have MR in many places
      - In our case, we found Spark very effective for ELT, where we would
      have used MR earlier
   - "I know Java, is it worth learning Scala for programming with Spark,
   or is it okay to use just Java?"
      - Java will work fine. Especially when Java 8 becomes the norm, we will
      get back some of the elegance
      - I, personally, like Scala & Python a lot better than Java. Scala is a
      lot more elegant, but compilation, IDE integration et al. are still
      clunky
      - One word of caution: stick with one language as much as possible.
      Shuffling between Java & Scala is not fun
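
To make the Java 8 point concrete, here is a minimal sketch (not from the thread) of the same word count written twice: once with anonymous inner classes, which is roughly what passing functions looked like before lambdas, and once with Java 8 lambdas. It deliberately uses only java.util.stream as a stand-in for Spark's Java API so that it runs without any dependencies; the class and method names (WordCountStyles, countWithAnonymousClasses, countWithLambdas) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: plain Java streams, not Spark. All names are hypothetical.
class WordCountStyles {

    // Verbose style: each function is spelled out as an anonymous inner class.
    static Map<String, Long> countWithAnonymousClasses(List<String> lines) {
        return lines.stream()
                .flatMap(new Function<String, Stream<String>>() {
                    @Override
                    public Stream<String> apply(String line) {
                        // Split each line into words on single spaces.
                        return Arrays.stream(line.split(" "));
                    }
                })
                .collect(Collectors.groupingBy(
                        new Function<String, String>() {
                            @Override
                            public String apply(String word) {
                                // Group by the word itself.
                                return word;
                            }
                        },
                        Collectors.counting()));
    }

    // Java 8 style: the same two functions written as lambdas.
    static Map<String, Long> countWithLambdas(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark or mr", "scala or java");
        System.out.println(countWithAnonymousClasses(lines));
        System.out.println(countWithLambdas(lines));
    }
}
```

Both methods compute the same counts; the lambda version is simply much closer in length to what the equivalent Scala would look like, which is the "elegance" being referred to.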


Cheers & HTH
<k/>

On Sat, Nov 22, 2014 at 8:26 AM, Sean Owen <so...@cloudera.com> wrote:

> MapReduce is simpler and narrower, which also means it is generally
> lighter weight, with less to know and configure, and runs more predictably.
> If you have a job that is truly just a few maps, with maybe one reduce, MR
> will likely be more efficient. Until recently its shuffle has been more
> developed and offers some semantics the Spark shuffle does not.
>
> I suppose it also integrates with tools like Oozie that Spark does not.
>
> I suggest learning enough Scala to use Spark in Scala. The amount you need
> to know is not large.
>
> (Mahout's MR-based implementations do not run on Spark and will not. They
> have been removed instead.)
> On Nov 22, 2014 3:36 PM, "Guillermo Ortiz" <konstt2...@gmail.com> wrote:
>
>> Hello,
>>
>> I'm a newbie with Spark but I've been working with Hadoop for a while.
>> I have two questions.
>>
>> Is there any case where MR is better than Spark? I don't know in which
>> cases I should use Spark instead of MR. When is MR faster than Spark?
>>
>> The other question is: I know Java, is it worth learning Scala for
>> programming with Spark, or is it okay to use just Java? I have written a
>> little piece of code in Java because I feel more confident with it, but
>> it seems that I'm missing something.
>>
