For what it is worth, our team here at
MediaCrossing <http://mediacrossing.com> has
been using the Spark/Mesos combination since last summer with much success
(low operations overhead, high developer performance).

IMO, Hadoop is overcomplicated from both a development and operations
perspective so I am looking to lower our dependencies on it, not increase
them.  Our stack currently includes:


   - Spark 0.9.1
   - Mesos 0.17
   - Chronos
   - HDFS (CDH 5.0-mr1)
   - Flume 1.4.0
   - ZooKeeper
   - Cassandra 2.0 (key-value store alternative to HBase)
   - Storm 0.9 (which we currently prefer to Spark Streaming)

We've used Shark in the past as well, but since most of us prefer the Spark
Shell we have not been maintaining it.

Using Mesos to run Spark allows us to optimize our available resources
(currently CPU + RAM) between Spark, Chronos and a number of other
services.  I see YARN as being heavily focused on MR2, but the reality is
we are using Spark in large part because writing MapReduce jobs is verbose,
hard to maintain and not performant (compared with Spark).  We have the
advantage of not having any real legacy MapReduce jobs to maintain, so that
consideration does not come into play.

Finally, I am a believer that for the long term direction of our company,
the Berkeley stack <https://amplab.cs.berkeley.edu/software/> will serve us
best.  Leveraging Mesos and Spark from the outset paves the way for this.


On Sun, May 11, 2014 at 1:28 PM, Paco Nathan <cet...@gmail.com> wrote:

> That's FUD. Tracking the Mesos and Spark use cases, there are very large
> production deployments of these together. Some are rather private but
> others are being surfaced. IMHO, one of the most amazing case studies is
> from Christina Delimitrou http://youtu.be/YpmElyi94AA
>
> For a tutorial, use the following, but upgrade it to the latest production
> release of Spark. There was a related O'Reilly webcast and Strata tutorial
> as well:
> http://mesosphere.io/learn/run-spark-on-mesos/
>
> FWIW, I teach "Intro to Spark" with sections on CM4, YARN, Mesos, etc.
> Based on lots of student experiences, Mesos is clearly the shortest path to
> deploying a Spark cluster if you want to leverage the robustness,
> multi-tenancy for mixed workloads, less ops overhead, etc., that show up
> repeatedly in the use case analyses.
>
> My opinion only and not that of any of my clients: "Don't believe the FUD
> from YHOO unless you really want to be stuck in 2009."
>
>
> On Wed, May 7, 2014 at 8:30 AM, deric <barton.to...@gmail.com> wrote:
>
>> I'm also using SPARK_EXECUTOR_URI right now, though I would prefer
>> distributing Spark as a binary package.
>>
>> Running examples with `./bin/run-example ...` works fine; however,
>> tasks from spark-shell are getting lost.
>>
>> Error: Could not find or load main class
>> org.apache.spark.executor.MesosExecutorBackend
>>
>> which looks more like a problem with sbin/spark-executor and missing paths
>> to the jar. Has anyone encountered this error before?
>>
>> I guess Yahoo invested quite a lot of effort into YARN and Spark
>> integration (moreover, with Mahout migrating to Spark, there's much more
>> interest in Hadoop and Spark integration). If there were some "Mesos
>> company" working on Spark-Mesos integration, it could be at least on the
>> same level.
>>
>> I don't see any other reason why YARN would be better than Mesos;
>> personally I like the latter. However, I haven't checked YARN for a while,
>> so maybe they've made significant progress. I think Mesos is more
>> universal and flexible than YARN.
>>
>>
>>
>
>
