P.S. Last but not least, we use sbt-assembly to build fat JARs, and we build
dist-style TAR.GZ packages with launch scripts, JARs and everything needed
to run a job.  These are built automatically from source by our Jenkins and
stored in HDFS.  Our Chronos/Marathon jobs fetch the latest release TAR.GZ
directly from HDFS, unpack it and launch the appropriate script.
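
For reference, the assembly side is only a few lines of sbt configuration.
A minimal sketch (exact key names depend on the sbt-assembly version, and
the project name, versions and merge rules here are made up):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

    // build.sbt
    name := "analytics-job"

    scalaVersion := "2.12.18"

    // Spark is supplied by the runtime/cluster, so keep it out of the fat JAR
    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.5.1" % "provided"

    assembly / assemblyJarName := "analytics-job-assembly.jar"

    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", _*) => MergeStrategy.discard
      case _                        => MergeStrategy.first
    }

Running "sbt assembly" then produces the single fat JAR that goes into the
TAR.GZ alongside the launch scripts.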

It makes for a much cleaner development/testing/deployment cycle to package
everything required in one go instead of relying on cluster-specific
classpath additions or any add-jars functionality.
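
Concretely, the command a Chronos or Marathon task runs boils down to
something like this (paths and names are made up):

    hadoop fs -get hdfs:///releases/analytics-job-latest.tar.gz .
    tar xzf analytics-job-latest.tar.gz
    ./analytics-job/bin/run-job.sh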


On 19 June 2014 22:53, Michael Cutler <mich...@tumra.com> wrote:

> When you start seriously using Spark in production there are basically two
> things everyone eventually needs:
>
>    1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>    2. Always-On Jobs - jobs that require monitoring, restarting, etc.
>
> There are lots of ways to implement these requirements, everything from
> crontab through to workflow managers like Oozie.
>
> We opted for the following stack:
>
>    - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>
>
>    - Marathon <https://github.com/mesosphere/marathon> - init/control
>    system for starting, stopping, and maintaining always-on applications.
>
>
>    - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>    scheduler for Mesos, supports job dependency graphs.
>
>
>    - Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>    primarily for its ability to reuse shared contexts across multiple jobs
>    (see the sketch after this list)
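>
>    For reference, a job for the Job Server is just an object implementing
>    its SparkJob trait, roughly like the sketch below (class and config key
>    names are made up):
>
>        import com.typesafe.config.Config
>        import org.apache.spark.SparkContext
>        import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}
>
>        // Runs inside a long-lived SparkContext managed by the job server,
>        // so repeated jobs avoid paying context start-up costs.
>        object WordCountJob extends SparkJob {
>          override def validate(sc: SparkContext, config: Config): SparkJobValidation =
>            SparkJobValid
>
>          override def runJob(sc: SparkContext, config: Config): Any =
>            sc.parallelize(config.getString("input.string").split("\\s+"))
>              .map(word => (word, 1))
>              .reduceByKey(_ + _)
>              .collectAsMap()
>        }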
>
> The majority of our jobs are periodic (batch) jobs run through
> spark-submit, and we have several always-on Spark Streaming jobs (also run
> through spark-submit).
>
> We always use "client mode" with spark-submit because the Mesos cluster
> has direct connectivity to the Spark cluster, and it means all the Spark
> stdout/stderr is externalised into the Mesos logs, which helps when
> diagnosing problems.
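>
> In practice the invocation looks something like this (the master URL,
> class name and paths are made up):
>
>     spark-submit --master spark://spark-master:7077 \
>       --deploy-mode client \
>       --class com.example.ClickstreamJob \
>       /opt/jobs/clickstream-assembly.jar [job arguments]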
>
> I thoroughly recommend you explore using Mesos/Marathon/Chronos to run
> Spark and manage your jobs.  The Mesosphere tutorials are awesome and you
> can be up and running in literally minutes, and the web UIs for both
> Marathon and Chronos make it easy to get started without talking to REST
> APIs etc.
>
> Best,
>
> Michael
>
>
>
>
> On 19 June 2014 19:44, Evan R. Sparks <evan.spa...@gmail.com> wrote:
>
>> I use SBT, create an assembly, and then add the assembly JARs when I
>> create my Spark context. I run the main driver with something like "java
>> -cp ... MyDriver".
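>>
>> A minimal sketch of that pattern (the master URL, app name and JAR path
>> are made up):
>>
>>     import org.apache.spark.{SparkConf, SparkContext}
>>
>>     object MyDriver {
>>       def main(args: Array[String]): Unit = {
>>         val conf = new SparkConf()
>>           .setMaster("spark://spark-master:7077")
>>           .setAppName("my-app")
>>           // ship the assembly JAR to the executors
>>           .setJars(Seq("/opt/jobs/my-app-assembly.jar"))
>>         val sc = new SparkContext(conf)
>>         try {
>>           println(sc.parallelize(1 to 100).sum())
>>         } finally {
>>           sc.stop()
>>         }
>>       }
>>     }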
>>
>> That said, as of Spark 1.0 the preferred way to run Spark applications
>> is via spark-submit:
>> http://spark.apache.org/docs/latest/submitting-applications.html
>>
>>
>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo <ldm...@gmail.com> wrote:
>>
>>> I want to ask this, not because I can't read endless documentation and
>>> several tutorials, but because there seem to be many ways of doing things
>>> and I keep having issues. How do you run *your* Spark app?
>>>
>>> I had it working when I was only using YARN + Hadoop 1 (Cloudera); then I
>>> had to get Spark and Shark working, ended up upgrading everything, and
>>> dropped CDH support. Anyway, this is what I used, with master=yarn-client
>>> and APP_JAR being Scala code compiled with Maven.
>>>
>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
>>>
>>> Do you use this, or something else? I could never figure out this method:
>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>
>>> For example:
>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>
>>> Do you use SBT or Maven to compile? or something else?
>>>
>>>
>>> ** It seems that I can't get subscribed to the mailing list; I tried both
>>> my work email and my personal one.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>
>>
>
