Hi,
I ran into a problem when using "spark.driver.userClassPathFirst" together with
SPARK_CLASSPATH (set in spark-env.sh) in a standalone environment, and I want to
confirm whether it in fact has no good solution.
We are running Spark 1.5.2 in standalone mode on a cluster. Since the cluster
doesn't have direct internet access, we always add additional common jars to
SPARK_CLASSPATH in spark-env.sh, so they are available to the Spark client and
executors by default: spark-avro, spark-cassandra-connector, commons-csv,
spark-csv, etc.
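For concreteness, the kind of entry we put in spark-env.sh on every node looks
roughly like this (the jar paths and file names here are examples, not our real
layout):

```shell
# spark-env.sh -- example jar paths only, not our actual file names
export SPARK_CLASSPATH="/opt/extra-jars/spark-avro.jar:\
/opt/extra-jars/spark-cassandra-connector.jar:\
/opt/extra-jars/commons-csv.jar:\
/opt/extra-jars/spark-csv.jar"
```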
We pick these jars carefully, so they all work fine with Spark 1.5.2, without
any issue.
But the problem comes when a user wants to use the Spark shell with their own
jars, and those jars have compatibility issues with the jars above; then we hit
a problem on the driver side.
As I suggested, the user starts the Spark shell in the following way:
/opt/spark/bin/spark-shell ..................... --conf
spark.driver.userClassPathFirst=true --conf
spark.executor.userClassPathFirst=true
But this still produces a "java.lang.NoSuchMethodError", which I understand is
caused by the version mismatch.
What I don't understand is on the driver side: if I check the driver JVM command
line with
ps -ef | grep spark
I can clearly see the spark-shell process looks like this:
/opt/java8/bin/java -cp
all_jars_specified_in_SPARK_CLASSPATH:all_jars_under_opt_spark_lib --conf
spark.executor.userClassPathFirst=true --conf
spark.driver.userClassPathFirst=true --jars end_user_supply_jars spark-shell
If the spark-shell (i.e., the driver) JVM is launched this way, what is the
point of the "spark.driver.userClassPathFirst" configuration? There is no way
the driver can control the classpath so that jars the end user supplies through
"--jars" override any Spark jars.
Am I misunderstanding the meaning of "spark.driver.userClassPathFirst", or is it
simply not possible to override classes in /opt/spark/lib and SPARK_CLASSPATH?
If so, what is "spark.driver.userClassPathFirst" actually for?
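For reference, here is my understanding of what "child-first" delegation is
supposed to mean, as a minimal self-contained sketch. This is my own
illustration, not Spark's actual implementation; the class and directory names
are made up. It demonstrates the delegation order with resources rather than
classes, since the lookup order is the same idea:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Illustration only: the difference between the JVM's default parent-first
// delegation and a "child-first" loader that consults its own URLs before
// asking the parent.
public class ChildFirstDemo {

    // A loader that checks its own URLs before delegating to the parent.
    static class ChildFirstLoader extends URLClassLoader {
        ChildFirstLoader(URL[] urls, ClassLoader parent) { super(urls, parent); }

        @Override
        public URL getResource(String name) {
            URL local = findResource(name);          // look at our own URLs first
            return local != null ? local : super.getResource(name);
        }
    }

    static String read(URL url) throws IOException {
        try (Scanner s = new Scanner(url.openStream())) { return s.nextLine(); }
    }

    // Returns { parent-first result, child-first result }.
    public static String[] demo() throws Exception {
        // Two directories with a marker.txt of the same name but different
        // contents, standing in for "Spark's jars" vs. the user's --jars.
        Path sparkDir = Files.createTempDirectory("sparkcp");
        Path userDir  = Files.createTempDirectory("usercp");
        Files.write(sparkDir.resolve("marker.txt"), "spark-version".getBytes());
        Files.write(userDir.resolve("marker.txt"), "user-version".getBytes());

        ClassLoader sparkLoader =
            new URLClassLoader(new URL[]{ sparkDir.toUri().toURL() }, null);

        // Default JVM behavior: delegate to the parent first, so the
        // parent's ("Spark's") copy always wins.
        URLClassLoader parentFirst =
            new URLClassLoader(new URL[]{ userDir.toUri().toURL() }, sparkLoader);

        // Child-first: the user's copy wins over the parent's.
        ChildFirstLoader childFirst =
            new ChildFirstLoader(new URL[]{ userDir.toUri().toURL() }, sparkLoader);

        return new String[]{
            read(parentFirst.getResource("marker.txt")),
            read(childFirst.getResource("marker.txt"))
        };
    }

    public static void main(String[] args) throws Exception {
        String[] r = demo();
        System.out.println(r[0]); // spark-version
        System.out.println(r[1]); // user-version
    }
}
```

If my sketch is right, child-first behavior can only apply to classes loaded
through a loader Spark itself creates at runtime, which is why I don't see how
it can help when the conflicting jars are already on the JVM's -cp.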
Yong