i have never been a big fan of shading, since it leads to the same library being packaged many times. for example, it's not unusual to find ASM 10 times in a single jar because of the shading policy its maintainers promote, and all that because they broke one signature without a really good reason.
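to make the ASM point concrete, here is a quick probe, nothing authoritative: it just checks how many relocated copies of one ASM class a classpath exposes. the shaded package names below are made up for illustration, real relocations vary per project.

import java.util.Arrays;
import java.util.List;

public class ShadedAsmProbe {
    public static void main(String[] args) {
        // one original class plus hypothetical shaded relocations of it;
        // a fat jar built from several shading libraries can carry all of these
        List<String> candidates = Arrays.asList(
            "org.objectweb.asm.ClassReader",       // unshaded ASM
            "org.spark_project.asm.ClassReader",   // hypothetical relocation
            "org.apache.xbean.asm5.ClassReader"    // another relocation (xbean-style)
        );
        for (String name : candidates) {
            try {
                Class.forName(name);
                System.out.println("found:  " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("absent: " + name);
            }
        }
    }
}

every "found" line beyond the first is another full copy of ASM sitting in your artifact.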
i am a big fan of maintaining backwards compatibility, and, when you do break it, of changing the maven artifact and java package, like jackson did when they went from 1 to 2. now that makes sense :) you can even have jackson 1 and 2 together within the same project without any issues (see the sketch at the bottom of this message).

On Wed, Feb 4, 2015 at 3:44 PM, Corey Nolet <cjno...@gmail.com> wrote:

> > leading to a real classloader hell if you tried to add a newer
> > version of a jar that spark already used.
>
> Spark at least makes this easier on the Guava side [1] by shading the
> package names internally so that there's no possibility of a
> conflict. Elasticsearch & Storm do this too for many of their
> dependencies, and I think it's a great practice for libraries that
> are only used internally, specifically when those internal libraries
> are not exposed at all to the outside. If you are only using said
> libraries internally, that strategy may work for you as well, Koert.
> I'm going to ask about this on the Hadoop list as well to see if
> maybe there was a decision against it for reasons I haven't thought
> of.
>
> > Another suggestion is to build Spark by yourself.
>
> I'm having trouble seeing what you mean here, Marcelo. Guava is
> already shaded to a different package for the 1.2.0 release. It
> shouldn't be causing conflicts.
>
> [1] https://issues.apache.org/jira/browse/SPARK-2848
>
> On Wed, Feb 4, 2015 at 2:35 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> the whole spark.files.userClassPathFirst setting never really worked
>> for me in standalone mode, since jars were added dynamically, which
>> means they had different classloaders, leading to a real classloader
>> hell if you tried to add a newer version of a jar that spark already
>> used. see: https://issues.apache.org/jira/browse/SPARK-1863
>>
>> do i understand it correctly that on yarn the custom jars are truly
>> placed before the yarn and spark jars on the classpath? meaning at
>> container construction time, on the same classloader? that would be
>> great news for me. it would open up the possibility of using newer
>> versions of many libraries.
>>
>> On Wed, Feb 4, 2015 at 1:12 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>>> Hi Corey,
>>>
>>> When you run on Yarn, Yarn's libraries are placed in the classpath,
>>> and they have precedence over your app's. So, with Spark 1.2,
>>> you'll get Guava 11 in your classpath (with Spark 1.1 and earlier
>>> you'd get Guava 14 from Spark, so still a problem for you).
>>>
>>> Right now, the option Markus mentioned
>>> (spark.yarn.user.classpath.first) can be a workaround for you,
>>> since it will place your app's jars before Yarn's on the classpath.
>>>
>>> On Tue, Feb 3, 2015 at 8:20 PM, Corey Nolet <cjno...@gmail.com> wrote:
>>> > I'm having a really bad dependency conflict right now with Guava
>>> > versions between my Spark application in Yarn and (I believe)
>>> > Hadoop's version.
>>> >
>>> > The problem is, my driver has the version of Guava which my
>>> > application is expecting (15.0) while it appears the Spark
>>> > executors that are working on my RDDs have a much older version
>>> > (assuming it's the old version on the Hadoop classpath).
>>> >
>>> > Is there a property like "mapreduce.job.user.classpath.first"
>>> > that I can set to make sure my own classpath is established first
>>> > on the executors?
>>>
>>> --
>>> Marcelo
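ps. a minimal sketch of the jackson 1 + 2 coexistence i mentioned at the top. it assumes both jackson-mapper-asl 1.x (groupId org.codehaus.jackson) and jackson-databind 2.x (groupId com.fasterxml.jackson.core) are on the classpath:

import java.io.IOException;

public class JacksonSideBySide {
    public static void main(String[] args) throws IOException {
        // jackson 1.x lives in the org.codehaus.jackson package
        org.codehaus.jackson.map.ObjectMapper mapper1 =
                new org.codehaus.jackson.map.ObjectMapper();
        // jackson 2.x lives in the com.fasterxml.jackson package
        com.fasterxml.jackson.databind.ObjectMapper mapper2 =
                new com.fasterxml.jackson.databind.ObjectMapper();

        // same data serialized by both versions in one JVM, no conflict
        int[] data = {1, 2, 3};
        System.out.println(mapper1.writeValueAsString(data));
        System.out.println(mapper2.writeValueAsString(data));
    }
}

this works precisely because the two versions share neither a java package nor a maven coordinate, so neither can shadow the other.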