I'm assuming you actually installed the jars on all the nodes of the yarn cluster then?

In general this isn't a good idea on yarn, as most users don't have permission 
to install things on the nodes themselves.  The idea is that yarn provides a 
certain set of jars, which really should be just the yarn/hadoop framework, and 
adds those to your classpath; the user provides everything else that is 
application specific when they submit their application, and those jars get 
distributed with the app and added to the classpath.   If you are worried about 
a jar being downloaded every time, you can use the public distributed cache on 
yarn as a way to distribute it and share it.  It will only be removed from a 
node's distributed cache if other applications need that space.

That said, what yarn adds to the classpath is configurable via the hadoop 
configuration file yarn-site.xml, config name yarn.application.classpath.  So 
you can change that config to add your jars, but they will be added for all 
types of applications. 
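For example, something along these lines in yarn-site.xml would do it (the 
framework entries below just mirror the usual Hadoop 2.x defaults, and 
/opt/extra-jars is a placeholder for wherever you put your jars):

  <property>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/share/hadoop/common/*,
      $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
      $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
      $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
      $HADOOP_YARN_HOME/share/hadoop/yarn/*,
      $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,
      /opt/extra-jars/*
    </value>
  </property>

The value is a comma-separated list, and you want to keep the framework entries 
in there, otherwise containers lose the hadoop classes.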

You can use the --files and --archives options in yarn-standalone mode to use 
the distributed cache.  To make it public, make sure permissions on the file 
are set appropriately.
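Roughly, a yarn-standalone submit that ships a file through the distributed 
cache looks like this (the app jar, class name, and HDFS paths are made-up 
examples, and the exact location of spark-class depends on the Spark version 
you're running):

  # upload once, then make it world-readable so the cache entry can be public
  hadoop fs -put my-extra.jar /shared/my-extra.jar
  hadoop fs -chmod 755 /shared
  hadoop fs -chmod 644 /shared/my-extra.jar

  SPARK_JAR=/path/to/spark-assembly.jar \
  ./spark-class org.apache.spark.deploy.yarn.Client \
    --jar my-app.jar \
    --class com.example.MyApp \
    --args yarn-standalone \
    --files hdfs:///shared/my-extra.jar

Every ancestor directory of the file also needs to be world-executable, 
otherwise yarn treats it as a private (per-user) cache entry instead of a 
public one.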

Tom



On Monday, January 13, 2014 3:49 PM, Eric Kimbrel <lekimb...@gmail.com> wrote:
 
Is there any extra trick required to use jars on the SPARK_CLASSPATH when 
running spark on yarn?

I have several jars added to the SPARK_CLASSPATH in spark-env.sh.   When my job 
runs I print the SPARK_CLASSPATH, so I can see that the jars were added to the 
environment the app master is running in; however, even though the jars are on 
the classpath, I continue to get class not found errors.

I have also tried setting SPARK_CLASSPATH via SPARK_YARN_USER_ENV
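For reference, the relevant lines in spark-env.sh look roughly like this (the 
jar paths are just examples):

  export SPARK_CLASSPATH=/opt/myjars/foo.jar:/opt/myjars/bar.jar
  export SPARK_YARN_USER_ENV="SPARK_CLASSPATH=/opt/myjars/foo.jar:/opt/myjars/bar.jar"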
