Hi Christophe,

Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required. The former makes them available to the spark-shell driver process, and the latter tells Spark to make them available to the executor processes running on the cluster.
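As a minimal sketch, setting both variables before launching the shell would look something like this. The jar paths are placeholders for your own dependencies; note that SPARK_CLASSPATH is colon-separated like a regular Java classpath, while ADD_JARS takes a comma-separated list:

```shell
# Hypothetical jar paths -- substitute your actual dependency jars.
DEP1=/opt/libs/dep1.jar
DEP2=/opt/libs/dep2.jar

# Colon-separated, like a Java classpath: makes the jars visible to
# the spark-shell driver JVM.
export SPARK_CLASSPATH="$DEP1:$DEP2"

# Comma-separated list of jars that Spark ships to the executors
# running on the YARN cluster.
export ADD_JARS="$DEP1,$DEP2"

# Then launch the shell as usual, e.g.:
# MASTER=yarn-client ./bin/spark-shell
echo "$SPARK_CLASSPATH"
echo "$ADD_JARS"
```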
-Sandy

On Wed, Apr 16, 2014 at 9:27 AM, Christophe Préaud <christophe.pre...@kelkoo.com> wrote:
> Hi,
>
> I am running Spark 0.9.1 on a YARN cluster, and I am wondering what the
> correct way is to add external jars when running a spark-shell on a YARN
> cluster.
>
> Packaging all these dependencies in an assembly whose path is then set in
> SPARK_YARN_APP_JAR (as written in the doc:
> http://spark.apache.org/docs/latest/running-on-yarn.html) does not work in
> my case: it pushes the jar to HDFS in .sparkStaging/application_XXX, but
> the spark-shell is still unable to find it (unless ADD_JARS and/or
> SPARK_CLASSPATH is defined).
>
> Defining all the dependencies (either in an assembly, or separately) in
> ADD_JARS or SPARK_CLASSPATH works (even if SPARK_YARN_APP_JAR is set to
> /dev/null), but defining some dependencies in ADD_JARS and the rest in
> SPARK_CLASSPATH does not!
>
> Hence I am still wondering what the differences between ADD_JARS and
> SPARK_CLASSPATH are, and what the purpose of SPARK_YARN_APP_JAR is.
>
> Thanks for any insights!
> Christophe.
>
>
>
> Kelkoo SAS
> Société par Actions Simplifiée
> Share capital: EUR 4,168,964.30
> Registered office: 8, rue du Sentier, 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended exclusively
> for their addressees. If you are not the intended recipient of this
> message, please delete it and notify the sender.