Daniel,
Do you mind sharing the size of your cluster and the production data volumes?

Thanks,
Soumya

> On Jul 7, 2014, at 3:39 PM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:
>
> From a development perspective, I vastly prefer Spark to MapReduce. The
> MapReduce API is very constrained; Spark's API feels much more natural to me.
> Testing and local development are also very easy - creating a local Spark
> context is trivial and it reads local files. For your unit tests you can just
> have them create a local context and execute your flow with some test data.
> Even better, you can do ad-hoc work in the Spark shell and if you want that
> in your production code it will look exactly the same.
>
> Unfortunately, the picture isn't so rosy when it gets to production. In my
> experience, Spark simply doesn't scale to the volumes that MapReduce will
> handle. Not with a Standalone cluster anyway - maybe Mesos or YARN would be
> better, but I haven't had the opportunity to try them. I find jobs tend to
> just hang forever for no apparent reason on large data sets (but smaller than
> what I push through MapReduce).
>
> I am hopeful the situation will improve - Spark is developing quickly - but
> if you have large amounts of data you should proceed with caution.
>
> Keep in mind there are some frameworks for Hadoop which can hide the ugly
> MapReduce with something very similar in form to Spark's API; e.g. Apache
> Crunch. So you might consider those as well.
>
> (Note: the above is with Spark 1.0.0.)
>
>> On Mon, Jul 7, 2014 at 11:07 AM, <santosh.viswanat...@accenture.com> wrote:
>> Hello Experts,
>>
>> I am doing a comparative study on the following:
>>
>> Spark vs Impala
>>
>> Spark vs MapReduce. Is it worth migrating from an existing MR implementation
>> to Spark?
>>
>> Please share your thoughts and expertise.
>>
>> Thanks,
>> Santosh
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
> E: daniel.siegm...@velos.io W: www.velos.io