Hello

I am trying to load the Hadoop-Azure driver in Apache Spark, but so far 
without success.
The plan is to include the required JARs in the Docker image, since we plan on 
using a client-mode SparkSession.

My current Dockerfile looks like this:
________________________________
FROM spark:latest

# Bake the Azure jars into Spark's jar directory
# (multi-file COPY requires the destination to end with a slash)
COPY *.jar $SPARK_HOME/jars/

# Put the Azure jars on the Spark classpath
ENV SPARK_EXTRA_CLASSPATH="$SPARK_HOME/jars/hadoop-azure-3.2.0.jar:$SPARK_HOME/jars/azure-keyvault-core-1.2.4.jar:$SPARK_HOME/jars/azure-storage-8.6.6.jar:$SPARK_HOME/jars/jetty-util-ajax-9.3.24.v20180605.jar:$SPARK_HOME/jars/wildfly-openssl-2.1.3.Final.jar"
ENV HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-azure-datalake"
________________________________
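
For context, this is roughly how we intend to build the client-mode session 
against that image. A minimal sketch only: the master URL is a placeholder, 
and the extraClassPath settings are my assumption about how the baked-in jars 
get picked up, not something I have confirmed works:
________________________________
// Minimal sketch of the planned client-mode session.
// The master URL is a placeholder; the extraClassPath entries
// mirror the jars baked into the image above (assumption).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("azure-read-test")
  .master("<cluster-master-url>") // placeholder
  .config("spark.driver.extraClassPath", "/opt/spark/jars/*")
  .config("spark.executor.extraClassPath", "/opt/spark/jars/*")
  .getOrCreate()
________________________________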

In the Docker build context I have the following dependencies:
hadoop-azure-3.2.0.jar
azure-storage-8.6.6.jar
azure-keyvault-core-1.2.4.jar
jetty-util-ajax-9.3.24.v20180605.jar
wildfly-openssl-2.1.3.Final.jar

(I have verified that these files are part of the image and located where I 
expect them: /opt/spark/jars.)
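
For what it's worth, a quick probe from the driver (e.g. in spark-shell 
inside the container) should show whether the class is visible at all; this 
is just a diagnostic sketch:
________________________________
// Prints the driver classpath, then throws ClassNotFoundException
// if hadoop-azure is not actually on it.
println(System.getProperty("java.class.path"))
Class.forName("org.apache.hadoop.fs.azure.NativeAzureFileSystem")
________________________________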

In the Spark UI, under Environment, I see no indication that hadoop-azure has 
been loaded.
In addition, when I try to read a file using the wasb:// scheme, I get the 
following error:
java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
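
For reference, the read that triggers this looks roughly like the following 
(<account>, <container>, and <key> are placeholders, not our real values):
________________________________
// <account>, <container>, <key> are placeholders for the real
// storage account, blob container, and access key.
spark.sparkContext.hadoopConfiguration.set(
  "fs.azure.account.key.<account>.blob.core.windows.net", "<key>")

val df = spark.read.text(
  "wasb://<container>@<account>.blob.core.windows.net/path/to/file.txt")
df.show()
________________________________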
