Hi Nick,

You should check which Spark version "latest" actually points to, find out which Hadoop version that "spark:latest" image was built on top of, and then verify that Hadoop version's compatibility with the Azure libraries.
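A quick way to confirm the bundled Hadoop version is to ask Hadoop itself from spark-shell (a sketch, assuming spark-shell is available inside the image; the printed version is only an example):

// Start spark-shell inside the image, e.g.
//   docker run -it --rm --entrypoint /opt/spark/bin/spark-shell <image>
// VersionInfo reports the Hadoop version Spark was compiled against, which
// tells you which hadoop-azure and azure-storage releases to pair with it.
scala> org.apache.hadoop.util.VersionInfo.getVersion()
res0: String = 2.7.7    // example output; yours will depend on the image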
In the past, I used the following Dockerfile to experiment:

FROM gcr.io/spark-operator/spark:v3.0.0
USER root
ADD https://repo1.maven.org/maven2/com/microsoft/azure/azure-storage/2.0.0/azure-storage-2.0.0.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-azure/2.7.7/hadoop-azure-2.7.7.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/com/azure/azure-storage-blob/12.8.0/azure-storage-blob-12.8.0.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/com/azure/azure-storage-common/12.8.0/azure-storage-common-12.8.0.jar /opt/spark/jars

And the following properties:

spark.hadoop.fs.wasb.impl org.apache.hadoop.fs.azure.NativeAzureFileSystem
spark.hadoop.fs.AbstractFileSystem.wasb.impl org.apache.hadoop.fs.azure.Wasb

Good luck,

Pol Santamaria

On Fri, Apr 16, 2021 at 3:40 PM Nick Stenroos-Dam <n...@project.bi> wrote:

> Hello
>
> I am trying to load the Hadoop-Azure driver in Apache Spark, but so far I
> have failed.
> The plan is to include the required files in the Docker image, as we plan
> on using a client-mode SparkSession.
>
> My current Dockerfile looks like this:
> ------------------------------
> FROM spark:latest
>
> COPY *.jar $SPARK_HOME/jars
>
> ENV SPARK_EXTRA_CLASSPATH="$SPARK_HOME/jars/hadoop-azure-3.2.0.jar:$SPARK_HOME/jars/azure-keyvault-core-1.2.4.jar:$SPARK_HOME/jars/azure-storage-8.6.6.jar:$SPARK_HOME/jars/jetty-util-ajax-9.3.24.v20180605.jar:$SPARK_HOME/jars/wildfly-openssl-2.1.3.Final.jar"
> ENV HADOOP_OPTIONAL_TOOLS="hadoop-azure,hadoop-azure-datalake"
> ------------------------------
>
> In the directory I have the following dependencies:
> hadoop-azure-3.2.0.jar
> azure-storage-8.6.6.jar
> azure-keyvault-core-1.2.4.jar
> jetty-util-ajax-9.3.24.v20180605.jar
> wildfly-openssl-2.1.3.Final.jar
>
> (I have validated that these files are part of the image and located where
> I expect them: /opt/spark/jars.)
>
> When looking at the Environment tab in the Spark UI, I cannot see that
> hadoop-azure is loaded.
> In addition, when I try to read a file using the wasb:// scheme, I get the
> following error:
>
> java.lang.ClassNotFoundException: Class
> org.apache.hadoop.fs.azure.NativeAzureFileSystem not found
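That ClassNotFoundException usually means the hadoop-azure jar, which contains NativeAzureFileSystem, is not visible to the driver's classloader even though the wasb scheme is being resolved. Once matching jars are in /opt/spark/jars, the client-mode session setup could look roughly like this (a sketch only; <account>, <container>, and <account-key> are placeholders, not real settings):

import org.apache.spark.sql.SparkSession

// Sketch: map the wasb:// scheme onto the classes shipped in hadoop-azure,
// and pass the storage credentials through to the Hadoop configuration.
val spark = SparkSession.builder()
  .appName("wasb-smoke-test")
  .config("spark.hadoop.fs.wasb.impl",
          "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
  .config("spark.hadoop.fs.AbstractFileSystem.wasb.impl",
          "org.apache.hadoop.fs.azure.Wasb")
  // Placeholder account name and key; in client mode these must be
  // visible to the driver process.
  .config("spark.hadoop.fs.azure.account.key.<account>.blob.core.windows.net",
          "<account-key>")
  .getOrCreate()

// If the jars and the scheme mapping are in place, this read should no
// longer throw ClassNotFoundException.
spark.read
  .text("wasb://<container>@<account>.blob.core.windows.net/path/to/file")
  .show(5)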