Alternative solution: https://github.com/xitrum-framework/xitrum-package

It collects all dependency .jar files of your Scala program into a directory. It doesn't merge the .jar files together; the .jar files are left "as is".
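For anyone who wants the same effect without the plugin, a rough sbt sketch of the idea - copy every resolved dependency jar, unmodified, into one directory next to your own jar - could look like the following. The task name and output path here are invented for illustration; this is not xitrum-package's actual API:

    // build.sbt (sbt 0.13-style sketch)
    lazy val copyDependencies = taskKey[Unit]("Copy all dependency jars into target/dist/lib")

    copyDependencies := {
      val dest = target.value / "dist" / "lib"
      IO.createDirectory(dest)
      // dependencyClasspath lists the resolved jars (and project class directories);
      // keep only the .jar files and copy them as they are.
      (dependencyClasspath in Compile).value.map(_.data)
        .filter(_.getName.endsWith(".jar"))
        .foreach(jar => IO.copyFile(jar, dest / jar.getName))
    }

Running "sbt copyDependencies" would then leave the jars under target/dist/lib, ready to be put on a classpath without any merging.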
On Sat, May 31, 2014 at 3:42 AM, Andrei <faithlessfri...@gmail.com> wrote:

> Thanks, Stephen. I have eventually decided to go with assembly, but to leave
> the Spark and Hadoop jars out and instead use `spark-submit` to automatically
> provide these dependencies. This way no resource conflicts arise and
> mergeStrategy needs no modification. To record this stable setup and share it
> with the community, I've crafted a project [1] with a minimal working config.
> It is an SBT project with the assembly plugin, Spark 1.0 and Cloudera's Hadoop
> client. I hope it will help somebody get a Spark setup going quicker.
>
> Though I'm fine with this setup for final builds, I'm still looking for a more
> interactive dev setup - something that doesn't require a full rebuild.
>
> [1]: https://github.com/faithlessfriend/sample-spark-project
>
> Thanks and have a good weekend,
> Andrei
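The gist of that setup is to mark Spark and Hadoop as "provided" so that sbt-assembly leaves them out of the uberjar and spark-submit supplies them at run time. A minimal sketch - versions and coordinates are illustrative, not copied from the sample project:

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

    // build.sbt
    name := "sample-spark-app"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      // "provided" keeps these out of the assembled jar; spark-submit and the
      // cluster's Hadoop installation put them on the classpath at run time.
      "org.apache.spark"  %% "spark-core"    % "1.0.0" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.3.0" % "provided"
    )

With that in place, "sbt assembly" produces a jar containing only the application and its non-Spark dependencies, which spark-submit can then run without the resource conflicts mentioned above. (Depending on the sbt-assembly version you may also need to add the plugin's default settings to the build; check its README.)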
> On Thu, May 29, 2014 at 8:27 PM, Stephen Boesch <java...@gmail.com> wrote:
>
>> The MergeStrategy combined with sbt assembly did work for me. This is not
>> painless: some trial and error, and the assembly may take multiple minutes.
>>
>> You will likely want to filter out some additional classes from the
>> generated jar file. Here is an SOF answer that explains it, with IMHO the
>> best answer's snippet included here (in this case the OP understandably did
>> not want to include javax.servlet.Servlet):
>>
>> http://stackoverflow.com/questions/7819066/sbt-exclude-class-from-jar
>>
>> mappings in (Compile, packageBin) ~= { (ms: Seq[(File, String)]) =>
>>   ms filter { case (file, toPath) => toPath != "javax/servlet/Servlet.class" }
>> }
>>
>> There is a setting to not include the project files in the assembly, but I
>> do not recall it at this moment.
>>
>> 2014-05-29 10:13 GMT-07:00 Andrei <faithlessfri...@gmail.com>:
>>
>>> Thanks, Jordi, your gist looks pretty much like what I have in my project
>>> currently (with a few exceptions that I'm going to borrow).
>>>
>>> I like the idea of using "sbt package", since it doesn't require third-party
>>> plugins and, most importantly, doesn't create a mess of classes and
>>> resources. But in this case I'll have to handle the jar list manually via the
>>> Spark context. Is there a way to automate this process? E.g. when I was a
>>> Clojure guy, I could run "lein deps" (lein is a build tool similar to sbt) to
>>> download all dependencies and then just enumerate them from my app. Maybe
>>> you have heard of something like that for Spark/SBT?
>>>
>>> Thanks,
>>> Andrei
>>>
>>> On Thu, May 29, 2014 at 3:48 PM, jaranda <jordi.ara...@bsc.es> wrote:
>>>
>>>> Hi Andrei,
>>>>
>>>> I think the preferred way to deploy Spark jobs is by using the sbt package
>>>> task instead of the sbt assembly plugin. In any case, as you comment, the
>>>> mergeStrategy in combination with some dependency exclusions should fix
>>>> your problems. Have a look at this gist
>>>> <https://gist.github.com/JordiAranda/bdbad58d128c14277a05> for further
>>>> details (I just followed some recommendations from the sbt assembly
>>>> plugin documentation).
>>>>
>>>> Up to now I haven't found a proper way to combine my development and
>>>> deployment phases, although I must say my experience with Spark is pretty
>>>> limited (it really depends on your deployment requirements as well). In
>>>> this case, I think someone else could give you some further insights.
>>>>
>>>> Best,
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Is-uberjar-a-recommended-way-of-running-Spark-Scala-applications-tp6518p6520.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
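Coming back to the question above about automating the jar list when using plain "sbt package": one option is sbt's retrieveManaged := true setting, which copies every resolved dependency under lib_managed/, so the application can build the jar list by scanning that directory and hand it to Spark. A rough sketch - the app name, master URL and paths are placeholders:

    import java.io.File
    import org.apache.spark.{SparkConf, SparkContext}

    object SubmitWithPlainPackage {
      // Recursively collect .jar files under a directory (lib_managed/ is
      // populated by sbt when build.sbt sets `retrieveManaged := true`).
      def listJars(dir: File): Seq[File] =
        Option(dir.listFiles).toSeq.flatten.flatMap {
          case d if d.isDirectory              => listJars(d)
          case f if f.getName.endsWith(".jar") => Seq(f)
          case _                               => Seq.empty[File]
        }

      def main(args: Array[String]): Unit = {
        // Output of `sbt package` (path and name are hypothetical).
        val appJar  = "target/scala-2.10/my-spark-app_2.10-0.1.jar"
        val depJars = listJars(new File("lib_managed")).map(_.getAbsolutePath)

        val conf = new SparkConf()
          .setAppName("my-spark-app")
          .setMaster("spark://master:7077")   // placeholder cluster URL
          .setJars(appJar +: depJars)         // ship app + dependency jars to executors

        val sc = new SparkContext(conf)
        // ... job code ...
        sc.stop()
      }
    }

SparkContext.addJar can also be called per jar after the context is created, which amounts to the same thing.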