Alternative solution: https://github.com/xitrum-framework/xitrum-package

It collects all dependency .jar files of your Scala program into a directory. It doesn't merge the .jar files together; the .jar files are left "as is".
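For anyone who wants the same effect without the plugin, a rough sbt sketch of the idea - copy every resolved dependency jar, unmodified, into one directory next to your own jar - could look like the following. The task name and output path here are invented for illustration; this is not xitrum-package's actual API:

    // build.sbt (sbt 0.13-style sketch)
    lazy val copyDependencies = taskKey[Unit]("Copy all dependency jars into target/dist/lib")

    copyDependencies := {
      val dest = target.value / "dist" / "lib"
      IO.createDirectory(dest)
      // dependencyClasspath lists the resolved jars (and project class directories);
      // keep only the .jar files and copy them as they are.
      (dependencyClasspath in Compile).value.map(_.data)
        .filter(_.getName.endsWith(".jar"))
        .foreach(jar => IO.copyFile(jar, dest / jar.getName))
    }

Running "sbt copyDependencies" would then leave the jars under target/dist/lib, ready to be put on a classpath without any merging.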
On Sat, May 31, 2014 at 3:42 AM, Andrei <faithlessfri...@gmail.com> wrote:

> Thanks, Stephen. I have eventually decided to go with assembly, but to leave
> the Spark and Hadoop jars out and instead use `spark-submit` to automatically
> provide these dependencies. This way no resource conflicts arise and
> mergeStrategy needs no modification. To record this stable setup and share it
> with the community, I've crafted a project [1] with a minimal working config.
> It is an SBT project with the assembly plugin, Spark 1.0 and Cloudera's Hadoop
> client. I hope it will help somebody get a Spark setup going quicker.
>
> Though I'm fine with this setup for final builds, I'm still looking for a more
> interactive dev setup - something that doesn't require a full rebuild.
>
> [1]: https://github.com/faithlessfriend/sample-spark-project
>
> Thanks and have a good weekend,
> Andrei
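The gist of that setup is to mark Spark and Hadoop as "provided" so that sbt-assembly leaves them out of the uberjar and spark-submit supplies them at run time. A minimal sketch - versions and coordinates are illustrative, not copied from the sample project:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt
    name := "sample-spark-app"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // "provided" keeps these out of the assembled jar; spark-submit and the
      // cluster's Hadoop installation put them on the classpath at run time.
      "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.3.0" % "provided"
    )

With that in place, "sbt assembly" produces a jar containing only the application and its non-Spark dependencies, which spark-submit can then run without the resource conflicts mentioned above. (Depending on the sbt-assembly version you may also need to add the plugin's default settings to the build; check its README.)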
> On Thu, May 29, 2014 at 8:27 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> The MergeStrategy combined with sbt assembly did work for me. This is not
>> painless: some trial and error, and the assembly may take multiple minutes.
>>
>> You will likely want to filter out some additional classes from the
>> generated jar file. Here is an SOF answer that explains it, with IMHO the
>> best answer's snippet included here (in this case the OP understandably did
>> not want to include javax.servlet.Servlet):
>>
>> http://stackoverflow.com/questions/7819066/sbt-exclude-class-from-jar
>>
>> mappings in (Compile, packageBin) ~= { (ms: Seq[(File, String)]) =>
>>   ms filter { case (file, toPath) => toPath != "javax/servlet/Servlet.class" }
>> }
>>
>> There is a setting to not include the project files in the assembly, but I
>> do not recall it at this moment.
>>
>> 2014-05-29 10:13 GMT-07:00 Andrei <faithlessfri...@gmail.com>:
>>
>>> Thanks, Jordi, your gist looks pretty much like what I have in my project
>>> currently (with a few exceptions that I'm going to borrow).
>>>
>>> I like the idea of using "sbt package", since it doesn't require third-party
>>> plugins and, most importantly, doesn't create a mess of classes and
>>> resources. But in this case I'll have to handle the jar list manually via the
>>> Spark context. Is there a way to automate this process? E.g. when I was a
>>> Clojure guy, I could run "lein deps" (lein is a build tool similar to sbt) to
>>> download all dependencies and then just enumerate them from my app. Maybe
>>> you have heard of something like that for Spark/SBT?
>>>
>>> Thanks,
>>> Andrei
>>>
>>> On Thu, May 29, 2014 at 3:48 PM, jaranda <jordi.ara...@bsc.es> wrote:
>>>
>>>> Hi Andrei,
>>>>
>>>> I think the preferred way to deploy Spark jobs is by using the sbt package
>>>> task instead of the sbt assembly plugin. In any case, as you comment, the
>>>> mergeStrategy in combination with some dependency exclusions should fix
>>>> your problems. Have a look at this gist
>>>> <https://gist.github.com/JordiAranda/bdbad58d128c14277a05> for further
>>>> details (I just followed some recommendations from the sbt assembly
>>>> plugin documentation).
>>>>
>>>> Up to now I haven't found a proper way to combine my development and
>>>> deployment phases, although I must say my experience with Spark is pretty
>>>> limited (it really depends on your deployment requirements as well). In
>>>> this case, I think someone else could give you some further insights.
>>>>
>>>> Best,
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-uberjar-a-recommended-way-of-running-Spark-Scala-applications-tp6518p6520.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
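Coming back to the question above about automating the jar list when using plain "sbt package": one option is sbt's retrieveManaged := true setting, which copies every resolved dependency under lib_managed/, so the application can build the jar list by scanning that directory and hand it to Spark. A rough sketch - the app name, master URL and paths are placeholders:

    import java.io.File
    import org.apache.spark.{SparkConf, SparkContext}

    object SubmitWithPlainPackage {
      // Recursively collect .jar files under a directory (lib_managed/ is
      // populated by sbt when build.sbt sets `retrieveManaged := true`).
      def listJars(dir: File): Seq[File] =
        Option(dir.listFiles).toSeq.flatten.flatMap {
          case d if d.isDirectory              => listJars(d)
          case f if f.getName.endsWith(".jar") => Seq(f)
          case _                               => Seq.empty[File]
        }

      def main(args: Array[String]): Unit = {
        // Output of `sbt package` (path and name are hypothetical).
        val appJar  = "target/scala-2.10/my-spark-app_2.10-0.1.jar"
        val depJars = listJars(new File("lib_managed")).map(_.getAbsolutePath)

        val conf = new SparkConf()
          .setAppName("my-spark-app")
          .setMaster("spark://master:7077")   // placeholder cluster URL
          .setJars(appJar +: depJars)         // ship app + dependency jars to executors

        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }

SparkContext.addJar can also be called per jar after the context is created, which amounts to the same thing.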