Marcelo,

Specifying the driver-class-path yields behavior like https://issues.apache.org/jira/browse/SPARK-2420 and https://issues.apache.org/jira/browse/SPARK-2848. It feels like opening a can of worms here if I also need to replace the Guava dependencies.
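For reference, the kind of invocation I'd need looks roughly like this (the job class and jar are placeholders; the Hadoop paths are from our nodes):

    spark-submit --master yarn-cluster \
      --driver-class-path "/etc/hadoop/conf:/usr/lib/hadoop/lib/*" \
      --class com.example.MyJob \
      my-job.jar

...and putting the cluster's Hadoop jars on the driver classpath like that is exactly what surfaces the Guava conflicts described in those tickets.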
Wouldn't calling "./make-distribution.sh --skip-java-test --hadoop 2.4.1 --with-yarn --tgz" include the appropriate versions of the Hadoop libs in the Spark assembly jar?

I'm trying to rebuild using the hadoop-provided profile, but I'm getting several build errors. Is this sufficient:

    mvn -Phadoop-provided clean package -Phadoop-2.4 -Pyarn -Dyarn.version=2.4.1 -Dhadoop.version=2.4.1 -DskipTests

Or am I missing something completely?

What is the time frame for resolving the JIRA issues above?

mn

On Aug 20, 2014, at 11:25 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Ah, sorry, forgot to talk about the second issue.
>
> On Wed, Aug 20, 2014 at 8:54 AM, Matt Narrell <matt.narr...@gmail.com> wrote:
>> However, now the Spark jobs running in the ApplicationMaster on a given node
>> fail to find the active resourcemanager. Below is a log excerpt from one
>> of the assigned nodes. As all the jobs fail, eventually YARN will move this
>> to execute on the node that co-locates the active resourcemanager and a
>> nodemanager, where the job will proceed a bit further. Then, the Spark job
>> itself will fail attempting to access HDFS files via the virtualized HA HDFS
>> URI.
>
>> 14/08/20 11:34:27 INFO Client: Retrying connect to server:
>> 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>> MILLISECONDS)
>
> Here it seems you're running into the same issue but on the AM side.
> If you're comfortable with using a custom build of Spark, I'd
> recommend building it with the "-Phadoop-provided" profile enabled
> (note: I think that only works with Maven currently). That way the
> Spark assembly does not include the Hadoop classes.
>
> A quick look at the code seems to indicate the Spark assembly is
> always added before the local Hadoop jars in the classpath.
> Unfortunately there's no workaround for that at the moment (aside from
> the above build-time fix), although we might be able to do something
> when SPARK-2848 is fixed.
>
> --
> Marcelo
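P.S. While I wait, I'll double-check what my current assembly actually bundles with something like this (the assembly file name is from my build and will vary):

    jar tf spark-assembly-1.0.2-hadoop2.4.1.jar | grep -E 'com/google/common|org/apache/hadoop' | head

If Guava and Hadoop classes show up there, that would confirm the assembly is shadowing the cluster's Hadoop jars until I get a hadoop-provided build working.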