yeah i think we have been lucky so far. but i don't really see how i have a choice. it would be fine if, say, hadoop exposed only a very small set of libraries on the classpath. but if i look at the jars on the hadoop classpath, it's a ton! and why? why does parquet need to be included with hadoop, for example? or avro? it just makes my life harder, and i don't really see who benefits.
the yarn classpath is insane too.

On Wed, Feb 4, 2015 at 4:26 PM, Marcelo Vanzin <van...@cloudera.com> wrote:

> On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > about putting stuff on classpath before spark or yarn... yeah you can shoot
> > yourself in the foot with it, but since the container is isolated it should
> > be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with great
> > success.
>
> The container still has to use Hadoop libraries, e.g., to talk to
> HDFS. If you override a library it needs with an incompatible one, you
> may break something. So maybe you've just been lucky. :-)
>
> In reality it should be pretty hard to cause breakages if you're
> careful - e.g., when you override a jar of some library that generally
> has multiple jars, such as Jackson, you need to include all of them,
> not just the one(s) you need in your app.
>
> MR also has an option similar to Spark's userClassPath (see
> https://issues.apache.org/jira/browse/MAPREDUCE-1700), which doesn't
> involve messing with the system's class path.
>
> --
> Marcelo
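For reference, here is a minimal sketch of the child-first ("user classpath first") delegation that options like Spark's userClassPath are built around, assuming a plain URLClassLoader over the app's own jars. It is not Spark's or Hadoop's actual code; the class name ChildFirstClassLoader is hypothetical and only for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first loader: classes are looked up in the given user jars
 * before asking the parent (e.g. the loader that holds the Hadoop/YARN classpath).
 */
class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // The JVM forbids user loaders from defining java.* classes,
                    // so use normal parent-first delegation for them.
                    return super.loadClass(name, resolve);
                }
                try {
                    // Child first: search only the user jars passed to this loader.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user jars: fall back to parent-first delegation.
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

The point of doing it this way instead of prepending jars to the system classpath: only classes loaded through this loader see the overridden jars, while Hadoop classes that the parent loads keep resolving their dependencies (Jackson, etc.) from the parent's copies.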