Hi Christophe,

Adding the jars to both SPARK_CLASSPATH and ADD_JARS is required. The
former makes them available to the spark-shell driver process, while the
latter tells Spark to ship them to the executor processes running on the
cluster.
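
For concreteness, a launch along these lines should work (a rough sketch;
the jar paths are placeholders, and I'm assuming the usual yarn-client
spark-shell setup from the docs):

  # placeholder paths -- substitute your actual dependency jars
  export SPARK_CLASSPATH=/path/to/dep1.jar:/path/to/dep2.jar   # driver side
  export ADD_JARS=/path/to/dep1.jar,/path/to/dep2.jar          # executor side
  SPARK_YARN_MODE=true MASTER=yarn-client ./bin/spark-shell

Note the separators: SPARK_CLASSPATH uses normal classpath syntax
(colon-separated), while ADD_JARS takes a comma-separated list.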

-Sandy


On Wed, Apr 16, 2014 at 9:27 AM, Christophe Préaud <
christophe.pre...@kelkoo.com> wrote:

> Hi,
>
> I am running Spark 0.9.1 on a YARN cluster, and I am wondering what the
> correct way is to add external jars when running a spark-shell on a YARN
> cluster.
>
> Packaging all these dependencies in an assembly whose path is then set in
> SPARK_YARN_APP_JAR (as described in the doc:
> http://spark.apache.org/docs/latest/running-on-yarn.html) does not work in
> my case: it pushes the jar to HDFS in .sparkStaging/application_XXX, but
> the spark-shell is still unable to find it (unless ADD_JARS and/or
> SPARK_CLASSPATH is defined).
>
> Defining all the dependencies (either in an assembly, or separately) in
> ADD_JARS or SPARK_CLASSPATH works (even if SPARK_YARN_APP_JAR is set to
> /dev/null), but defining some dependencies in ADD_JARS and the rest in
> SPARK_CLASSPATH does not!
>
> Hence I'm still wondering what the differences are between ADD_JARS and
> SPARK_CLASSPATH, and what the purpose of SPARK_YARN_APP_JAR is.
>
> Thanks for any insights!
> Christophe.
>
