Hi Shivani,
I use sbt-assembly to create a fat jar.
https://github.com/sbt/sbt-assembly
Here is an example of the sbt build file:
import AssemblyKeys._ // put this at the top of the file

assemblySettings

mainClass in assembly := Some("FifaSparkStreaming")

name := "FifaSparkStreaming"

version := "1.0"

scalaVersion := "2.10.4"

// Spark itself is "provided" by the cluster and left out of the fat jar;
// the Twitter and Redis clients are bundled, minus some transitive jars that clash.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided",
  ("org.apache.spark" %% "spark-streaming-twitter" % "1.0.0")
    .exclude("org.eclipse.jetty.orbit", "javax.transaction")
    .exclude("org.eclipse.jetty.orbit", "javax.servlet")
    .exclude("org.eclipse.jetty.orbit", "javax.mail.glassfish")
    .exclude("org.eclipse.jetty.orbit", "javax.activation")
    .exclude("com.esotericsoftware.minlog", "minlog"),
  ("net.debasishg" % "redisclient_2.10" % "2.12")
    .exclude("com.typesafe.akka", "akka-actor_2.10")
)

// Decide what to do when several jars contain the same file path.
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
    case PathList("org", "apache", xs @ _*) => MergeStrategy.first
    case "application.conf" => MergeStrategy.concat
    case "unwanted.txt" => MergeStrategy.discard
    case x => old(x)
  }
}

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
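For reference, the long tokens in the run commands below are presumably the Twitter OAuth credentials (consumer key, consumer secret, access token, access token secret), followed by the keywords to track. The sketch below assumes the driver reads them positionally, the way Spark's bundled TwitterPopularTags streaming example does, and publishes them as twitter4j.oauth.* system properties. TwitterArgs and configure are hypothetical names; this is not the actual FifaSparkStreaming code.

```scala
// Hypothetical sketch modelled on Spark's TwitterPopularTags example;
// NOT the actual FifaSparkStreaming source.
object TwitterArgs {

  /** Treats args(0..3) as consumer key, consumer secret, access token and
    * access token secret, publishes them as the twitter4j.oauth.* system
    * properties that twitter4j reads by default, and returns the remaining
    * args as track filters. */
  def configure(args: Array[String]): List[String] = {
    require(args.length >= 4,
      "Usage: FifaSparkStreaming <consumerKey> <consumerSecret> " +
        "<accessToken> <accessTokenSecret> [<filter> ...]")
    val keys = List(
      "twitter4j.oauth.consumerKey",
      "twitter4j.oauth.consumerSecret",
      "twitter4j.oauth.accessToken",
      "twitter4j.oauth.accessTokenSecret")
    // Pair each property name with the matching positional argument.
    keys.zip(args.toList.take(4)).foreach { case (k, v) => System.setProperty(k, v) }
    args.toList.drop(4) // e.g. List("fifa", "fifa2014")
  }
}
```

In a real driver the returned filters would then be passed to TwitterUtils.createStream(ssc, None, filters) from spark-streaming-twitter; with None, the authorization is built from those system properties.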
I run it as follows.
LOCALLY:
1) sbt 'run AP1z4IYraYm5fqWhITWArY53x Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 fifa fifa2014'
If you want to submit it to the cluster:
CLUSTER:
2) spark-submit --class FifaSparkStreaming \
     --master "spark://server-8-144:7077" \
     --driver-memory 2g \
     --deploy-mode cluster \
     FifaSparkStreaming-assembly-1.0.jar \
     AP1z4IYraYm5fqWhITWArY53x \
     Cyyz3Zr67tVK46G8dus5tSbc83KQOdtMDgYoQ5WLQwH0mTWzB6 \
     115254720-OfJ4yFsUU6C6vBkEOMDlBlkIgslPleFjPwNcxHjN \
     Qd76y2izncM7fGGYqU1VXYTxg1eseNuzcdZKm2QJyK8d1 \
     fifa fifa2014
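A note on --driver-memory: spark-submit documents memory values with a unit suffix (e.g. 1000M, 2G). As far as I can tell from Spark 1.x's Utils.memoryStringToMb, a value with no suffix is read as bytes, so a bare 2048 would come out as 0 MB rather than 2 GB. The stdlib-only sketch below mirrors that rule for illustration; MemoryString and toMb are hypothetical names, not Spark's code.

```scala
// Illustrative re-implementation of the unit-suffix rule Spark 1.x applies
// to memory strings such as "2g" or "512m"; not Spark's own code.
object MemoryString {

  /** Converts a memory string to whole megabytes. No suffix means bytes. */
  def toMb(str: String): Long = {
    val lower = str.trim.toLowerCase
    // Everything except the trailing unit letter, parsed as a number.
    def number: Long = lower.dropRight(1).toLong
    if (lower.endsWith("k")) number / 1024
    else if (lower.endsWith("m")) number
    else if (lower.endsWith("g")) number * 1024
    else if (lower.endsWith("t")) number * 1024 * 1024
    else lower.toLong / 1024 / 1024 // bare number: bytes, rounded down to MB
  }
}
```

So --driver-memory 2g (or 2048m) requests 2 GB for the driver.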
Hope this helps.
Thanks,
Shrikar
On Fri, Jun 20, 2014 at 9:16 AM, Shivani Rao wrote:
> Hello Michael,
>
> I have a quick question for you. Can you clarify the statement "build
> fat JARs and build dist-style TAR.GZ packages with launch scripts, JARs
> and everything needed to run a Job"? Can you give an example?
>
> I am also using sbt-assembly to create a fat jar, and I supply the
> Spark and Hadoop locations on the classpath. Inside the main() function,
> where the SparkContext is created, I use SparkContext.jarOfClass(this).toList
> to add the fat jar to my SparkContext. However, I seem to be running into
> issues with this approach. I was wondering if you had any input, Michael.
>
> Thanks,
> Shivani
>
>
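On the SparkContext.jarOfClass question above: one thing worth checking is where the driver class is actually loaded from. As I understand it, jarOfClass asks the class loader for the class's .class resource and only yields a path when that URL looks like jar:file:/path/app.jar!/...; when the driver runs from sbt run or an IDE, the class comes from a target/classes directory and nothing is returned. Note also that jarOfClass takes a Class (e.g. this.getClass), while SparkContext.jarOfObject(this) takes an instance. The stdlib-only sketch below mimics that lookup so you can print what your driver would see; JarLocator, jarOf and jarPathFromUrl are hypothetical names, not Spark APIs.

```scala
// Hypothetical helper (not a Spark API) mirroring the lookup that
// SparkContext.jarOfClass performs: find the JAR a class was loaded from.
object JarLocator {

  /** "jar:file:/path/app.jar!/pkg/Cls.class" -> Some("/path/app.jar");
    * any other URL (e.g. a file: URL into target/classes) -> None. */
  def jarPathFromUrl(url: String): Option[String] =
    if (url.startsWith("jar:file:") && url.contains("!"))
      Some(url.substring("jar:file:".length, url.indexOf('!')))
    else None

  /** Asks the class's own loader where its .class file came from. */
  def jarOf(cls: Class[_]): Option[String] = {
    val resource = "/" + cls.getName.replace('.', '/') + ".class"
    Option(cls.getResource(resource)).flatMap(u => jarPathFromUrl(u.toString))
  }
}
```

Logging JarLocator.jarOf(getClass) at startup would show whether the classes really come from the assembly jar. The Spark-side equivalent would be along the lines of new SparkConf().setJars(SparkContext.jarOfClass(this.getClass).toList); if the lookup returns nothing, there is no jar to ship.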
> On Thu, Jun 19, 2014 at 10:57 PM, Sonal Goyal wrote:
>
>> We use Maven to build our code and then invoke spark-submit through
>> the exec plugin, passing in our parameters. It works well for us.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>
>>
>>
>>
>>
>> On Fri, Jun 20, 2014 at 3:26 AM, Michael Cutler wrote:
>>
>>> P.S. Last but not least, we use sbt-assembly to build fat JARs and build
>>> dist-style TAR.GZ packages with launch scripts, JARs and everything needed
>>> to run a Job. These are built automatically from source by our Jenkins and
>>> stored in HDFS. Our Chronos/Marathon jobs fetch the latest release TAR.GZ
>>> directly from HDFS, unpack it and launch the appropriate script.
>>>
>>> Packaging everything required in one go makes development, testing and
>>> deployment much cleaner than relying on cluster-specific classpath
>>> additions or any add-jars functionality.
>>>
>>>
>>> On 19 June 2014 22:53, Michael Cutler wrote:
>>>
>>>> When you start seriously using Spark in production there are basically
>>>> two things everyone eventually needs:
>>>>
>>>> 1. Scheduled Jobs - recurring hourly/daily/weekly jobs.
>>>> 2. Always-On Jobs - that require monitoring, restarting etc.
>>>>
>>>> There are lots of ways to implement these requirements, everything from
>>>> crontab through to workflow managers like Oozie.
>>>>
>>>> We opted for the following stack:
>>>>
>>>> - Apache Mesos <http://mesosphere.io/> (mesosphere.io distribution)
>>>>
>>>>
>>>> - Marathon <https://github.com/mesosphere/marathon> - init/control
>>>> system for starting, stopping, and maintaining always-on applications.
>>>>
>>>>
>>>> - Chronos <http://airbnb.github.io/chronos/> - general-purpose
>>>> scheduler for Mesos, supports job dependency graphs.
>>>>
>>>>
>>>> - ** Spark Job Server <https://github.com/ooyala/spark-jobserver> -
>>>> primarily for its ability to reuse shared contexts with multiple jobs.
>>>>
>>>> The majority of our jobs are periodic (batch) jobs run through
>>>> spark-submit, and we have several always-on Spark Streaming jobs (also run
>>>> through spark-submit).
>>>>
>>>> We always use "client mode" with spark-submit because the Mesos cluster
>>>> has direct connectivity to the Spark cluster, and it means all the Spark
>>>> stdout/stderr is externalised into the Mesos logs, which helps with
>>>> diagnosing problems.
>>>>
>>>> I thoroughly recommend exploring Mesos/Marathon/Chronos to run Spark
>>>> and manage your Jobs; the Mesosphere tutorials are awesome and you can
>>>> be up and running in literally minutes. The Web UIs for both make it
>>>> easy to get started without talking to REST APIs etc.
>>>>
>>>> Best,
>>>>
>>>> Michael
>>>>
>>>>
>>>>
>>>>
>>>> On 19 June 2014 19:44, Evan R. Sparks wrote:
>>>>
>>>>> I use SBT, create an assembly, and then add the assembly jars when I
>>>>> create my SparkContext. I run the main driver program with something
>>>>> like "java -cp ... MyDriver".
>>>>>
>>>>> That said, as of Spark 1.0 the preferred way to run Spark
>>>>> applications is via spark-submit:
>>>>> http://spark.apache.org/docs/latest/submitting-applications.html
>>>>>
>>>>>
>>>>> On Thu, Jun 19, 2014 at 11:36 AM, ldmtwo wrote:
>>>>>
>>>>>> I want to ask this, not because I can't read endless documentation and
>>>>>> several tutorials, but because there seem to be many ways of doing
>>>>>> things and I keep having issues. How do you run /your/ Spark app?
>>>>>>
>>>>>> I had it working when I was only using yarn+hadoop1 (Cloudera), but then
>>>>>> I had to get Spark and Shark working, ended up upgrading everything and
>>>>>> dropped CDH support. Anyway, this is what I used, with master=yarn-client
>>>>>> and APP_JAR being Scala code compiled with Maven.
>>>>>>
>>>>>> java -cp $CLASSPATH -Dspark.jars=$APP_JAR -Dspark.master=$MASTER $CLASSNAME $ARGS
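The java -cp ... -Dspark.jars=... -Dspark.master=... command above works because, as I understand Spark 1.0's SparkConf, a SparkConf created with the default constructor copies every JVM system property whose key starts with "spark." into its settings, so those -D flags act like setting the same keys in code. A stdlib-only sketch of that loading step follows; SparkProps and fromSystemProperties are hypothetical names, not Spark's code.

```scala
// Hypothetical sketch (not Spark's code) of the step where a default-constructed
// SparkConf picks up -Dspark.* JVM system properties.
object SparkProps {

  /** Returns every system property whose key starts with "spark.". */
  def fromSystemProperties(): Map[String, String] = {
    val props = System.getProperties
    props.stringPropertyNames().toArray(new Array[String](0))
      .filter(_.startsWith("spark."))
      .map(key => key -> props.getProperty(key))
      .toMap
  }
}
```

With -Dspark.master=$MASTER on the command line, the resulting map would contain spark.master mapped to the value of $MASTER, while unrelated properties are ignored.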
>>>>>>
>>>>>> Do you use this, or something else? I could never figure out this
>>>>>> method:
>>>>>> SPARK_HOME/bin/spark jar APP_JAR ARGS
>>>>>>
>>>>>> For example:
>>>>>> bin/spark-class jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 10
>>>>>>
>>>>>> Do you use SBT or Maven to compile, or something else?
>>>>>>
>>>>>>
>>>>>> ** It seems that I can't get subscribed to the mailing list; I tried
>>>>>> both my work email and my personal one.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-run-your-spark-app-tp7935.html
>>>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>