i have never been a big fan of shading, since it leads to the same library being packaged many times. for example, it's not unusual to find ASM 10 times in a single jar because of the shading policy its maintainers promote, and all that because they broke one signature without a really good reason.
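to make the ASM point concrete, here is a quick probe, nothing authoritative: it just checks how many relocated copies of one ASM class a classpath exposes. the shaded package names below are made up for illustration, real relocations vary per project.

import java.util.Arrays;
import java.util.List;

public class ShadedAsmProbe {
    public static void main(String[] args) {
        // one original class plus hypothetical shaded relocations of it;
        // a fat jar built from several shading libraries can carry all of these
        List<String> candidates = Arrays.asList(
            "org.objectweb.asm.ClassReader",       // unshaded ASM
            "org.spark_project.asm.ClassReader",   // hypothetical relocation
            "org.apache.xbean.asm5.ClassReader"    // another relocation (xbean-style)
        );
        for (String name : candidates) {
            try {
                Class.forName(name);
                System.out.println("found:  " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("absent: " + name);
            }
        }
    }
}

every "found" line beyond the first is another full copy of ASM sitting in your artifact.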
i am a big fan of maintaining backwards compatibility, and, when you do break it, of changing the maven artifact and java package, like jackson did when they went from 1 to 2. now that makes sense :) you can even have jackson 1 and 2 together within the same project without any issues (see the sketch at the bottom of this message).

On Wed, Feb 4, 2015 at 3:44 PM, Corey Nolet <cjno...@gmail.com> wrote:

> > leading to a real classloader hell if you tried to add a newer
> > version of a jar that spark already used.
>
> Spark at least makes this easier on the Guava side [1] by shading the
> package names internally so that there's no possibility of a
> conflict. Elasticsearch & Storm do this too for many of their
> dependencies, and I think it's a great practice for libraries that
> are only used internally, specifically when those internal libraries
> are not exposed at all to the outside. If you are only using said
> libraries internally, that strategy may work for you as well, Koert.
> I'm going to ask about this on the Hadoop list as well to see if
> maybe there was a decision against it for reasons I haven't thought
> of.
>
> > Another suggestion is to build Spark by yourself.
>
> I'm having trouble seeing what you mean here, Marcelo. Guava is
> already shaded to a different package for the 1.2.0 release. It
> shouldn't be causing conflicts.
>
> [1] https://issues.apache.org/jira/browse/SPARK-2848
>
> On Wed, Feb 4, 2015 at 2:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> the whole spark.files.userClassPathFirst setting never really worked
>> for me in standalone mode, since jars were added dynamically, which
>> means they had different classloaders, leading to a real classloader
>> hell if you tried to add a newer version of a jar that spark already
>> used. see: https://issues.apache.org/jira/browse/SPARK-1863
>>
>> do i understand it correctly that on yarn the custom jars are truly
>> placed before the yarn and spark jars on the classpath? meaning at
>> container construction time, on the same classloader? that would be
>> great news for me. it would open up the possibility of using newer
>> versions of many libraries.
>>
>> On Wed, Feb 4, 2015 at 1:12 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>>> Hi Corey,
>>>
>>> When you run on Yarn, Yarn's libraries are placed in the classpath,
>>> and they have precedence over your app's. So, with Spark 1.2,
>>> you'll get Guava 11 in your classpath (with Spark 1.1 and earlier
>>> you'd get Guava 14 from Spark, so still a problem for you).
>>>
>>> Right now, the option Markus mentioned
>>> (spark.yarn.user.classpath.first) can be a workaround for you,
>>> since it will place your app's jars before Yarn's on the classpath.
>>>
>>> On Tue, Feb 3, 2015 at 8:20 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>> > I'm having a really bad dependency conflict right now with Guava
>>> > versions between my Spark application in Yarn and (I believe)
>>> > Hadoop's version.
>>> >
>>> > The problem is, my driver has the version of Guava which my
>>> > application is expecting (15.0) while it appears the Spark
>>> > executors that are working on my RDDs have a much older version
>>> > (assuming it's the old version on the Hadoop classpath).
>>> >
>>> > Is there a property like "mapreduce.job.user.classpath.first"
>>> > that I can set to make sure my own classpath is established first
>>> > on the executors?
>>>
>>> --
>>> Marcelo
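ps. a minimal sketch of the jackson 1 + 2 coexistence i mentioned at the top. it assumes both jackson-mapper-asl 1.x (groupId org.codehaus.jackson) and jackson-databind 2.x (groupId com.fasterxml.jackson.core) are on the classpath:

import java.io.IOException;

public class JacksonSideBySide {
    public static void main(String[] args) throws IOException {
        // jackson 1.x lives in the org.codehaus.jackson package
        org.codehaus.jackson.map.ObjectMapper mapper1 =
                new org.codehaus.jackson.map.ObjectMapper();
        // jackson 2.x lives in the com.fasterxml.jackson package
        com.fasterxml.jackson.databind.ObjectMapper mapper2 =
                new com.fasterxml.jackson.databind.ObjectMapper();

        // same data serialized by both versions in one JVM, no conflict
        int[] data = {1, 2, 3};
        System.out.println(mapper1.writeValueAsString(data));
        System.out.println(mapper2.writeValueAsString(data));
    }
}

this works precisely because the two versions share neither a java package nor a maven coordinate, so neither can shadow the other.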