Yeah, I think we have been lucky so far. But I don't really see how I have
a choice. It would be fine if, say, Hadoop exposed a very small set of
libraries as part of the classpath. But if I look at the jars on the
Hadoop classpath, it's a ton! And why? Why does Parquet need to be
included with Hadoop, for example? Or Avro? It just makes my life harder,
and I don't really see who benefits.

The YARN classpath is insane too.

On Wed, Feb 4, 2015 at 4:26 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > about putting stuff on classpath before spark or yarn... yeah you can
> > shoot yourself in the foot with it, but since the container is isolated
> > it should be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST
> > forever with great success.
>
> The container still has to use Hadoop libraries, e.g., to talk to
> HDFS. If you override a library it needs with an incompatible one, you
> may break something. So maybe you've just been lucky. :-)
>
> In reality it should be pretty hard to cause breakages if you're
> careful - e.g., when you override a jar of some library that generally
> has multiple jars, such as Jackson, you need to include all of them,
> not just the one(s) you need in your app.
>
> MR also has an option similar to Spark's userClassPath (see
> https://issues.apache.org/jira/browse/MAPREDUCE-1700), which doesn't
> involve messing with the system's class path.
>
> --
> Marcelo
>
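For reference, the idea behind options like Spark's userClassPath and the MAPREDUCE-1700 job class loader is to load the user's jars through a separate "child-first" class loader, rather than prepending them to the system class path. Below is a minimal sketch of that technique in plain Java, assuming a simple `URLClassLoader` subclass. `ChildFirstClassLoader` is a hypothetical name; this is not the actual Spark or Hadoop class, and real implementations also handle `getResources`, more system packages (e.g. `javax.*`), and other details.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first ("parent-last") class loader: classes are looked
 * up in the user's jars before delegating to the parent (for example the
 * Hadoop/YARN system class loader). A sketch only, not Spark's or Hadoop's
 * actual implementation.
 */
class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            // 2. Look in the user's jars first, except for core JDK classes,
            //    which must always come from the parent.
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name);
                } catch (ClassNotFoundException ignored) {
                    // Not in the user's jars; fall back to the parent below.
                }
            }
            // 3. Normal parent-first delegation for everything else.
            if (c == null) {
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    public URL getResource(String name) {
        // Resources follow the same child-first order as classes.
        URL url = findResource(name);
        return url != null ? url : super.getResource(name);
    }
}
```

The point of this design is that classes loaded by the parent (e.g. Hadoop's HDFS client) resolve their own dependencies through the parent, so they keep seeing the system's version of a library such as Jackson, while user code loaded by this loader sees the user's version. The usual caveat is that objects of a type loaded by both loaders cannot be passed across the boundary without a `ClassCastException` or `LinkageError`.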
