Hi,

I am running Spark 0.9.1 on a YARN cluster, and I am wondering what the correct
way is to add external jars when running a spark-shell against YARN.

Packaging all these dependencies into an assembly whose path is then set in
SPARK_YARN_APP_JAR (as described in the docs:
http://spark.apache.org/docs/latest/running-on-yarn.html) does not work in my
case: the jar is pushed to HDFS under .sparkStaging/application_XXX, but the
spark-shell is still unable to find the classes it contains (unless ADD_JARS
and/or SPARK_CLASSPATH is also defined).
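
For reference, this is roughly how I launch the shell (the jar paths below are
placeholders for my actual assemblies):

    # Spark assembly built for YARN, plus my dependency assembly
    SPARK_JAR=/path/to/spark-assembly-0.9.1-hadoop2.2.0.jar \
    SPARK_YARN_APP_JAR=/path/to/my-deps-assembly.jar \
    MASTER=yarn-client ./bin/spark-shell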

Defining all the dependencies (either as a single assembly or as separate jars)
in ADD_JARS or in SPARK_CLASSPATH works (even if SPARK_YARN_APP_JAR is set to
/dev/null), but putting some dependencies in ADD_JARS and the rest in
SPARK_CLASSPATH does not!
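
The variant that does work for me looks roughly like this (again with a
placeholder path), with the same jar listed in both variables:

    # works: same dependency assembly in both ADD_JARS and SPARK_CLASSPATH
    SPARK_YARN_APP_JAR=/dev/null \
    ADD_JARS=/path/to/my-deps-assembly.jar \
    SPARK_CLASSPATH=/path/to/my-deps-assembly.jar \
    MASTER=yarn-client ./bin/spark-shell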

Hence I'm still wondering what the differences are between ADD_JARS and
SPARK_CLASSPATH, and what the purpose of SPARK_YARN_APP_JAR is.

Thanks for any insights!
Christophe.


