anyhow, I am ranting... sorry

On Wed, Feb 4, 2015 at 5:54 PM, Koert Kuipers <ko...@tresata.com> wrote:
> yeah I think we have been lucky so far. but I don't really see how I have a
> choice. it would be fine if, say, hadoop exposed a very small set of
> libraries as part of the classpath. but if I look at the jars on the hadoop
> classpath, it's a ton! and why? why does parquet need to be included with
> hadoop, for example? or avro? it just makes my life harder. and I don't
> really see who benefits.
>
> the yarn classpath is insane too.
>
> On Wed, Feb 4, 2015 at 4:26 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers <ko...@tresata.com> wrote:
>> > about putting stuff on the classpath before spark or yarn... yeah you can shoot
>> > yourself in the foot with it, but since the container is isolated it should
>> > be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with great
>> > success.
>>
>> The container still has to use Hadoop libraries, e.g., to talk to
>> HDFS. If you override a library it needs with an incompatible one, you
>> may break something. So maybe you've just been lucky. :-)
>>
>> In reality it should be pretty hard to cause breakages if you're
>> careful - e.g., when you override a jar of some library that generally
>> has multiple jars, such as Jackson, you need to include all of them,
>> not just the one(s) you need in your app.
>>
>> MR also has an option similar to Spark's userClassPath (see
>> https://issues.apache.org/jira/browse/MAPREDUCE-1700), which doesn't
>> involve messing with the system's class path.
>>
>> --
>> Marcelo
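The userClassPath-style options Marcelo mentions avoid editing the system class path by loading the user's jars through a separate class loader that searches them before its parent. Here is a minimal sketch of that general "child-first" technique, assuming a plain `URLClassLoader` subclass; the name `ChildFirstClassLoader` and the rule that `java.*`/`javax.*` classes stay parent-first are illustrative assumptions, not Spark's or Hadoop's actual code.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical child-first loader (illustration only, not Spark/Hadoop code):
// looks in the user's jars before the parent class path, except for JDK classes.
class ChildFirstClassLoader extends URLClassLoader {

    ChildFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.") || name.startsWith("javax.")) {
                    // Assumed rule: JDK classes use normal parent-first
                    // delegation so user jars cannot shadow them.
                    return super.loadClass(name, resolve);
                }
                try {
                    // Child first: search the user's jars.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the user's jars: fall back to the parent
                    // (e.g. the Hadoop/YARN class path).
                    return super.loadClass(name, resolve);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

Usage would look like `new ChildFirstClassLoader(new URL[] { new File("deps/my-jackson-databind.jar").toURI().toURL() }, getClass().getClassLoader())`, where the jar path is hypothetical. The sketch also shows why the Jackson point matters: classes defined from the user's jars resolve their own dependencies through this loader, so if you ship a newer `jackson-databind` but not the matching `jackson-core`, the missing classes silently come from the parent's (possibly incompatible) copy.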