Hi,
I ran into a problem when using "spark.driver.userClassPathFirst" together with
SPARK_CLASSPATH (set in spark-env.sh) in a standalone environment, and I want to
confirm whether it in fact has no good solution.
We are running Spark 1.5.2 in standalone mode on a cluster. Since the cluster
doesn't have direct internet access, we always add additional common jars to
SPARK_CLASSPATH in spark-env.sh, so they are available to the Spark client and
executors by default: spark-avro, spark-cassandra-connector, commons-csv,
spark-csv, etc.
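For concreteness, the kind of entry we put in spark-env.sh on every node looks
roughly like this (the jar paths and file names here are examples, not our real
layout):

```shell
# spark-env.sh -- example jar paths only, not our actual file names
export SPARK_CLASSPATH="/opt/extra-jars/spark-avro.jar:\
/opt/extra-jars/spark-cassandra-connector.jar:\
/opt/extra-jars/commons-csv.jar:\
/opt/extra-jars/spark-csv.jar"
```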
We pick these jars carefully, so they all work fine with Spark 1.5.2, without
any issue.
But the problem comes when a user wants to use the Spark shell with their own
jars, and those jars have compatibility issues with the jars above; then we hit
a problem on the driver side.
As I suggested, the user starts the Spark shell in the following way:
/opt/spark/bin/spark-shell ..................... --conf
spark.driver.userClassPathFirst=true --conf
spark.executor.userClassPathFirst=true
But this still produces a "java.lang.NoSuchMethodError", which I understand is
caused by the version mismatch.
What I don't understand is on the driver side: if I check the driver JVM command
line with
ps -ef | grep spark
I can clearly see the spark-shell process looks like this:
/opt/java8/bin/java -cp
all_jars_specified_in_SPARK_CLASSPATH:all_jars_under_opt_spark_lib --conf
spark.executor.userClassPathFirst=true --conf
spark.driver.userClassPathFirst=true --jars end_user_supply_jars spark-shell
If the spark-shell (i.e., the driver) JVM is launched this way, what is the
point of the "spark.driver.userClassPathFirst" configuration? There is no way
the driver can control the classpath so that jars the end user supplies through
"--jars" override any Spark jars.
Am I misunderstanding the meaning of "spark.driver.userClassPathFirst", or is it
simply not possible to override classes in /opt/spark/lib and SPARK_CLASSPATH?
If so, what is "spark.driver.userClassPathFirst" actually for?
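For reference, here is my understanding of what "child-first" delegation is
supposed to mean, as a minimal self-contained sketch. This is my own
illustration, not Spark's actual implementation; the class and directory names
are made up. It demonstrates the delegation order with resources rather than
classes, since the lookup order is the same idea:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Illustration only: the difference between the JVM's default parent-first
// delegation and a "child-first" loader that consults its own URLs before
// asking the parent.
public class ChildFirstDemo {

    // A loader that checks its own URLs before delegating to the parent.
    static class ChildFirstLoader extends URLClassLoader {
        ChildFirstLoader(URL[] urls, ClassLoader parent) { super(urls, parent); }

        @Override
        public URL getResource(String name) {
            URL local = findResource(name);          // look at our own URLs first
            return local != null ? local : super.getResource(name);
        }
    }

    static String read(URL url) throws IOException {
        try (Scanner s = new Scanner(url.openStream())) { return s.nextLine(); }
    }

    // Returns { parent-first result, child-first result }.
    public static String[] demo() throws Exception {
        // Two directories with a marker.txt of the same name but different
        // contents, standing in for "Spark's jars" vs. the user's --jars.
        Path sparkDir = Files.createTempDirectory("sparkcp");
        Path userDir  = Files.createTempDirectory("usercp");
        Files.write(sparkDir.resolve("marker.txt"), "spark-version".getBytes());
        Files.write(userDir.resolve("marker.txt"), "user-version".getBytes());

        ClassLoader sparkLoader =
            new URLClassLoader(new URL[]{ sparkDir.toUri().toURL() }, null);

        // Default JVM behavior: delegate to the parent first, so the
        // parent's ("Spark's") copy always wins.
        URLClassLoader parentFirst =
            new URLClassLoader(new URL[]{ userDir.toUri().toURL() }, sparkLoader);

        // Child-first: the user's copy wins over the parent's.
        ChildFirstLoader childFirst =
            new ChildFirstLoader(new URL[]{ userDir.toUri().toURL() }, sparkLoader);

        return new String[]{
            read(parentFirst.getResource("marker.txt")),
            read(childFirst.getResource("marker.txt"))
        };
    }

    public static void main(String[] args) throws Exception {
        String[] r = demo();
        System.out.println(r[0]); // spark-version
        System.out.println(r[1]); // user-version
    }
}
```

If my sketch is right, child-first behavior can only apply to classes loaded
through a loader Spark itself creates at runtime, which is why I don't see how
it can help when the conflicting jars are already on the JVM's -cp.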
Yong