anyhow i am ranting... sorry

On Wed, Feb 4, 2015 at 5:54 PM, Koert Kuipers <ko...@tresata.com> wrote:

> yeah i think we have been lucky so far. but i don't really see how i have a
> choice. it would be fine if, say, hadoop exposed a very small set of
> libraries as part of the classpath. but if i look at the jars on the hadoop
> classpath it's a ton! and why? why does parquet need to be included with
> hadoop, for example? or avro? it just makes my life harder, and i don't
> really see who benefits.
>
> the yarn classpath is insane too.
>
> On Wed, Feb 4, 2015 at 4:26 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> > about putting stuff on classpath before spark or yarn... yeah you can shoot
>> > yourself in the foot with it, but since the container is isolated it should
>> > be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with great
>> > success.
>>
>> The container still has to use Hadoop libraries, e.g., to talk to
>> HDFS. If you override a library it needs with an incompatible one, you
>> may break something. So maybe you've just been lucky. :-)
>>
>> In reality it should be pretty hard to cause breakages if you're
>> careful - e.g., when you override a jar of some library that generally
>> has multiple jars, such as Jackson, you need to include all of them,
>> not just the one(s) you need in your app.
>>
>> MR also has an option similar to Spark's userClassPath (see
>> https://issues.apache.org/jira/browse/MAPREDUCE-1700), which doesn't
>> involve messing with the system's class path.
>>
>> --
>> Marcelo
>>
>
>
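for what it's worth, the "include all of them" point quoted above (overriding one jackson jar but not its siblings) can be checked mechanically before submitting a job. a rough sketch in python, not anything spark or hadoop ships: the function names, the jar-name regex, and the example jar lists are all made up for illustration, and it assumes jars follow the usual name-version.jar pattern.

```python
import re

# Assumed naming convention: "<artifact>-<version>.jar", e.g. "jackson-core-2.4.4.jar".
JAR_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.]*)\.jar$")


def parse_jar(filename):
    """Split a jar file name into (artifact, version), or None if it does not match."""
    m = JAR_RE.match(filename)
    return (m.group("name"), m.group("version")) if m else None


def partial_overrides(user_jars, system_jars, family_prefix):
    """List system jars of a library family that the user classpath leaves un-overridden.

    If the user classpath overrides at least one artifact of the family
    (say "jackson-"), every other artifact of that family still coming from
    the system classpath is a potential version mismatch.
    """
    overridden = set()
    for jar in user_jars:
        parsed = parse_jar(jar)
        if parsed and parsed[0].startswith(family_prefix):
            overridden.add(parsed[0])
    if not overridden:
        return []  # the family is not overridden at all, nothing to flag

    missing = []
    for jar in system_jars:
        parsed = parse_jar(jar)
        if parsed and parsed[0].startswith(family_prefix) and parsed[0] not in overridden:
            missing.append(jar)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical jar lists, for illustration only.
    user = ["jackson-core-2.4.4.jar", "myapp-1.0.jar"]
    system = [
        "jackson-core-2.2.3.jar",
        "jackson-databind-2.2.3.jar",
        "jackson-annotations-2.2.3.jar",
        "hadoop-common-2.6.0.jar",
    ]
    for jar in partial_overrides(user, system, "jackson-"):
        print("not overridden:", jar)
```

this only compares file names; it says nothing about whether the versions are actually binary compatible, so treat a clean result as a sanity check rather than a guarantee.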
