Marcelo,

Specifying the driver-class-path yields behavior like
https://issues.apache.org/jira/browse/SPARK-2420 and
https://issues.apache.org/jira/browse/SPARK-2848. It feels like I'm opening a
can of worms here if I also need to replace the Guava dependencies.
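
For reference, by "specifying the driver-class-path" I mean roughly this shape
of invocation (the class name, application jar, and classpath entries are only
illustrative, not my exact setup):

  ./bin/spark-submit \
    --master yarn-cluster \
    --driver-class-path /etc/hadoop/conf:/usr/lib/hadoop/lib/* \
    --class com.example.MySparkJob \
    my-spark-job.jar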

Wouldn’t calling “./make-distribution.sh --skip-java-test --hadoop 2.4.1
--with-yarn --tgz” include the appropriate versions of the Hadoop libs in the
Spark assembly jar?
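
I figure I can confirm what actually gets bundled by listing the assembly
contents, along these lines (the exact assembly path may differ depending on
how the distribution is built):

  jar tf dist/lib/spark-assembly-*.jar | grep 'org/apache/hadoop/yarn/' | head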

I’m trying to rebuild using the hadoop-provided profile, but I’m getting
several build errors.  Is this sufficient:  mvn -Phadoop-provided clean package
-Phadoop-2.4 -Pyarn -Dyarn.version=2.4.1 -Dhadoop.version=2.4.1 -DskipTests
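
If the hadoop-provided build does succeed, I assume the sanity check is just
the reverse of the listing above, i.e. something like

  jar tf assembly/target/scala-*/spark-assembly-*.jar | grep -c 'org/apache/hadoop/'

coming back at (or near) zero (path per a plain maven build; adjust as needed).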

Or am I missing something completely?  What is the time frame to have the above 
JIRA issues resolved?

mn

On Aug 20, 2014, at 11:25 AM, Marcelo Vanzin <van...@cloudera.com> wrote:

> Ah, sorry, forgot to talk about the second issue.
> 
> On Wed, Aug 20, 2014 at 8:54 AM, Matt Narrell <matt.narr...@gmail.com> wrote:
>> However, now the Spark jobs running in the ApplicationMaster on a given node
>> fail to find the active resourcemanager.  Below is a log excerpt from one
>> of the assigned nodes.  As all the jobs fail, eventually YARN will move this
>> to execute on the node that co-locates the active resourcemanager and a
>> nodemanager, where the job will proceed a bit further.  Then, the Spark job
>> itself will fail attempting to access HDFS files via the virtualized HA HDFS
>> URI.
> 
>> 14/08/20 11:34:27 INFO Client: Retrying connect to server:
>> 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
>> MILLISECONDS)
> 
> Here it seems you're running into the same issue but on the AM side.
> If you're comfortable with using a custom build of Spark, I'd
> recommend building it with the "-Phadoop-provided" profile enabled
> (note: I think that only works with maven currently). That way the
> Spark assembly does not include the Hadoop classes.
> 
> A quick look at the code seems to indicate the Spark assembly is
> always added before the local Hadoop jars in the classpath.
> Unfortunately there's no workaround for that at the moment (aside from
> the above build-time fix), although we might be able to do something
> when SPARK-2848 is fixed.
> 
> -- 
> Marcelo
