If you are using sbt, I personally use sbt-pack to pack all of the dependencies
under a single folder, and then I set those jars in the Spark config:

// Just for the demo: I load this from a config file, overridden by environment variables.
val sparkJars = Seq(
  "/ROOT_OF_YOUR_PROJECT/target/pack/lib/YOUR_JAR_DEPENDENCY.jar",
  "/ROOT_OF_YOUR_PROJECT/target/pack/lib/YOUR_OTHER_JAR_DEPENDENCY.jar")

val conf = new SparkConf()
  .setMaster(sparkMaster)
  .setAppName(sparkApp)
  // ... other settings ...
  .setJars(sparkJars)
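
Just to illustrate the comment above, here is a minimal sketch of loading the
jar list from a config file. I am using Typesafe Config purely as an example;
the library choice and the "myapp.spark.jars" key are my own placeholders, not
something you have to use:

import com.typesafe.config.ConfigFactory
import scala.collection.JavaConverters._

// application.conf (hypothetical) could contain:
//   myapp.spark.jars = ["/ROOT_OF_YOUR_PROJECT/target/pack/lib/YOUR_JAR_DEPENDENCY.jar"]
// Entries can then be overridden with system properties or ${?ENV_VAR}
// substitutions in the file, which is one way to get the environment-variable
// overrides mentioned above.
val config = ConfigFactory.load()
val sparkJars: Seq[String] = config.getStringList("myapp.spark.jars").asScala.toList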

Then run it with "sbt pack run".
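
In case it is useful, this is roughly what enabling sbt-pack looks like. The
plugin version below is an assumption, so check the sbt-pack releases for the
current one (older releases are enabled through pack settings rather than the
AutoPlugin):

// project/plugins.sbt -- version is an assumption, check the sbt-pack releases
addSbtPlugin("org.xerial.sbt" % "sbt-pack" % "0.9.9")

// build.sbt -- recent sbt-pack versions are enabled as an AutoPlugin
enablePlugins(PackPlugin)

After "sbt pack", the dependency jars land under target/pack/lib, which is
where the sparkJars paths above point.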


On Mon, Mar 14, 2016 at 11:58 PM, Tristan Nixon <st...@memeticlabs.org>
wrote:

> I see - so you want the dependencies pre-installed on the cluster nodes so
> they do not need to be submitted along with the job jar?
>
> Where are you planning on deploying/running spark? Do you have your own
> cluster or are you using AWS/other IaaS/PaaS provider?
>
> Somehow you’ll need to get the dependencies onto each node and add them to
> Spark’s classpaths. You could modify an existing VM image or use Chef to
> distribute the jars and update the classpaths.
>
> On Mar 14, 2016, at 5:26 PM, prateek arora <prateek.arora...@gmail.com>
> wrote:
>
> Hi
>
> I do not want to create a single jar that contains all the other
> dependencies, because it would increase the size of my Spark job jar.
> So I want to copy all of the libraries to the cluster using some automation
> process (I am currently using Chef for this), but I am not sure whether that
> is the right approach.
>
>
> Regards
> Prateek
>
>
> On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <ja...@odersky.com> wrote:
>
>> Have you tried setting the configuration
>> `spark.executor.extraLibraryPath` to point to a location where your
>> .so's are available? (Not sure if non-local files, such as HDFS, are
>> supported)
>>
>> On Mon, Mar 14, 2016 at 2:12 PM, Tristan Nixon <st...@memeticlabs.org>
>> wrote:
>> > What build system are you using to compile your code?
>> > If you use a dependency management system like maven or sbt, then you
>> should be able to instruct it to build a single jar that contains all the
>> other dependencies, including third-party jars and .so’s. I am a maven user
>> myself, and I use the shade plugin for this:
>> > https://maven.apache.org/plugins/maven-shade-plugin/
>> >
>> > However, if you are using SBT or another dependency manager, someone
>> else on this list may be able to give you help on that.
>> >
>> > If you’re not using a dependency manager - well, you should be. Trying
>> to manage this manually is a pain that you do not want to get in the way of
>> your project. There are perfectly good tools to do this for you; use them.
>> >
>> >> On Mar 14, 2016, at 3:56 PM, prateek arora <prateek.arora...@gmail.com>
>> wrote:
>> >>
>> >> Hi
>> >>
>> >> Thanks for the information.
>> >>
>> >> My problem is this: if I want to write a Spark application that depends
>> >> on third-party libraries like OpenCV, what is the best approach to
>> >> distribute all of the OpenCV .so and jar files across the cluster?
>> >>
>> >> Regards
>> >> Prateek
>> >>
>> >>
>> >>
