Indeed you don't need it; just make sure it is on your classpath. But anyway the jar is not that big: compared to what your job will do next, sending a few MB over the network seems OK to me.
2014/1/5 Aureliano Buendia <buendia...@gmail.com>

> Eugen, I noticed that you are including hadoop in your fat jar:
>
> <include>org.apache.hadoop:*</include>
>
> This would take a big chunk of the fat jar. Isn't this jar already
> included in spark?
>
>
> On Thu, Jan 2, 2014 at 11:38 AM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:
>
>> It depends how you deploy; I don't find it so complicated...
>>
>> 1) To build the fat jar I am using maven (as I am not familiar with sbt).
>>
>> Inside I have something like this, saying which libs should be included in
>> the fat jar (the others won't be present in the final artifact):
>>
>> <plugin>
>>   <groupId>org.apache.maven.plugins</groupId>
>>   <artifactId>maven-shade-plugin</artifactId>
>>   <version>2.1</version>
>>   <executions>
>>     <execution>
>>       <phase>package</phase>
>>       <goals>
>>         <goal>shade</goal>
>>       </goals>
>>       <configuration>
>>         <minimizeJar>true</minimizeJar>
>>         <createDependencyReducedPom>false</createDependencyReducedPom>
>>         <artifactSet>
>>           <includes>
>>             <include>org.apache.hbase:*</include>
>>             <include>org.apache.hadoop:*</include>
>>             <include>com.typesafe:config</include>
>>             <include>org.apache.avro:*</include>
>>             <include>joda-time:*</include>
>>             <include>org.joda:*</include>
>>           </includes>
>>         </artifactSet>
>>         <filters>
>>           <filter>
>>             <artifact>*:*</artifact>
>>             <excludes>
>>               <exclude>META-INF/*.SF</exclude>
>>               <exclude>META-INF/*.DSA</exclude>
>>               <exclude>META-INF/*.RSA</exclude>
>>             </excludes>
>>           </filter>
>>         </filters>
>>       </configuration>
>>     </execution>
>>   </executions>
>> </plugin>
>>
>> 2) The app is the jar you have built, so you ship it to the driver node
>> (it depends a lot on how you are planning to use it: Debian packaging, a
>> plain old scp, etc.). To run it you can do something like:
>>
>> SPARK_CLASSPATH=PathToYour.jar $SPARK_HOME/spark-class com.myproject.MyJob
>>
>> where MyJob is the entry point to your job: it defines a main method.
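[Editor's note: as a rough sketch of what such an entry point might look like with the 0.8-era API discussed in this thread — the package, object name, and job logic are hypothetical, not from the thread:]

```scala
package com.myproject

import org.apache.spark.SparkContext

// Hypothetical minimal entry point, launched via spark-class as shown above.
object MyJob {
  def main(args: Array[String]) {
    // args(0) is the master URL, e.g. "local" or a mesos:// URL.
    val sc = new SparkContext(args(0), "MyJob")
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println("even numbers: " + evens)
    sc.stop()
  }
}
```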
>>
>> 3) I don't know what the "common way" is, but I am doing things this way:
>> build the fat jar, provide some launch scripts, make Debian packaging, ship
>> it to a node that plays the role of the driver, run it over Mesos using the
>> launch scripts + some conf.
>>
>>
>> 2014/1/2 Aureliano Buendia <buendia...@gmail.com>
>>
>>> I wasn't aware of jarOfClass. I wish there was only one good way of
>>> deploying in spark, instead of many ambiguous methods. (It seems spark
>>> has followed scala in that there is more than one way of accomplishing a
>>> job, making scala an overcomplicated language.)
>>>
>>> 1. Should sbt assembly be used to make the fat jar? If so, which sbt
>>> should be used? My local sbt, or $SPARK_HOME/sbt/sbt? Why is it that spark
>>> is shipped with a separate sbt?
>>>
>>> 2. Let's say we have the dependencies fat jar which is supposed to be
>>> shipped to the workers. Now how do we deploy the main app which is supposed
>>> to be executed on the driver? Make another jar out of it? Does sbt
>>> assembly also create that jar?
>>>
>>> 3. Is calling sc.jarOfClass() the most common way of doing this? I
>>> cannot find any example by googling. What's the most common way that people
>>> use?
>>>
>>>
>>> On Thu, Jan 2, 2014 at 10:58 AM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is the list of the jars you use in your job; the driver will send
>>>> all those jars to each worker (otherwise the workers won't have the classes
>>>> you need in your job). The easy way to go is to build a fat jar with your
>>>> code and all the libs you depend on, and then use this utility to get the
>>>> path: SparkContext.jarOfClass(YourJob.getClass)
>>>>
>>>>
>>>> 2014/1/2 Aureliano Buendia <buendia...@gmail.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> I do not understand why spark context has an option for loading jars
>>>>> at runtime.
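[Editor's note: for question 1 above, a rough sbt-assembly equivalent of the Maven shade setup quoted earlier can be sketched as follows. The plugin version, Spark version, and merge rules shown are illustrative assumptions, not from the thread; the 0.x sbt-assembly API shown here differs from later releases:]

```scala
// project/plugins.sbt (assumed sbt-assembly 0.x line)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
import AssemblyKeys._

assemblySettings

name := "my-spark-job"

// Spark is already on the workers, so mark it "provided" to keep it
// out of the fat jar (the counterpart of the Maven <includes> list,
// which whitelists only the job's own dependencies).
libraryDependencies +=
  "org.apache.spark" %% "spark-core" % "0.8.1-incubating" % "provided"

// Drop jar signature files, like the META-INF excludes in the shade config.
mergeStrategy in assembly := {
  case p if p.endsWith(".SF") || p.endsWith(".DSA") || p.endsWith(".RSA") =>
    MergeStrategy.discard
  case _ => MergeStrategy.first
}
```

Running `sbt assembly` then produces the fat jar under `target/`; your local sbt works for this, since $SPARK_HOME/sbt/sbt exists only to build Spark itself.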
>>>>>
>>>>> As an example, consider
>>>>> this<https://github.com/apache/incubator-spark/blob/50fd8d98c00f7db6aa34183705c9269098c62486/examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala#L36>:
>>>>>
>>>>> object BroadcastTest {
>>>>>   def main(args: Array[String]) {
>>>>>     val sc = new SparkContext(args(0), "Broadcast Test",
>>>>>       System.getenv("SPARK_HOME"),
>>>>>       Seq(System.getenv("SPARK_EXAMPLES_JAR")))
>>>>>   }
>>>>> }
>>>>>
>>>>> This is *the* example, or *the* application, that we want to run, so what
>>>>> is SPARK_EXAMPLES_JAR supposed to be?
>>>>> In this particular case, the BroadcastTest example is self-contained; why
>>>>> would it want to load other unrelated example jars?
>>>>>
>>>>> Finally, how does this help a real world spark application?
>>>>>
>>>>
>>>
>>
>
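[Editor's note: tying the thread's answers together — in a real-world job the SPARK_EXAMPLES_JAR indirection can be replaced by handing SparkContext the path of the fat jar that contains the job itself, via the jarOfClass utility Eugen mentions above. A sketch under the 0.8-era API; the object name and job logic are hypothetical, and note that in this Spark line jarOfClass already returns a Seq of paths:]

```scala
import org.apache.spark.SparkContext

object MyRealJob {
  def main(args: Array[String]) {
    // Ship the jar containing this very class to the workers,
    // instead of relying on an env var like SPARK_EXAMPLES_JAR.
    val sc = new SparkContext(args(0), "My Real Job",
      System.getenv("SPARK_HOME"),
      SparkContext.jarOfClass(this.getClass))
    // ... job logic ...
    sc.stop()
  }
}
```

The jars argument only matters when workers need classes from your jar; a truly self-contained snippet run in local mode can pass Nil, which is why BroadcastTest's use of SPARK_EXAMPLES_JAR looks redundant at first glance.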