
I have a PySpark job that runs on a Kubernetes cluster (GKE) on Google Cloud.

The Spark image was built on PySpark 3.1.2, Scala 3.7 and Java 8, as below, with
some Python packages:


When run, it throws this error:

        21/08/28 07:43:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"

However, it runs fine with the following Docker image built on PySpark 3.1.1:


This is the spark-submit command used for both:

        spark-submit --verbose \
           --properties-file ${property_file} \
           --master k8s://https://$KUBERNETES_MASTER_IP:443 \
           --deploy-mode cluster \
           --name pytest \
           --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
           --py-files $CODE_DIRECTORY/DSBQ.zip \
           --conf spark.kubernetes.namespace=$NAMESPACE \
           --conf spark.executor.memory=5000m \
           --conf spark.network.timeout=300 \
           --conf spark.executor.instances=3 \
           --conf spark.kubernetes.driver.limit.cores=1 \
           --conf spark.driver.cores=1 \
           --conf spark.executor.cores=1 \
           --conf spark.executor.memory=2000m \
           --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.container.image=${IMAGEGCP} \
           --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
           --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.sql.execution.arrow.pyspark.enabled="true" \

Have there been any changes in 3.1.2 that could cause this issue?

It runs fine with PySpark 3.1.1.
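The error says Hadoop found no FileSystem implementation for the gs:// scheme. Outside Dataproc, that implementation normally comes from the Cloud Storage connector jar (gcs-connector), so one way to narrow down the difference between the 3.1.1 and 3.1.2 images is to check whether each image actually contains that jar. The sketch below is only a diagnostic aid under stated assumptions: it assumes the jars live under /opt/spark/jars (the layout of images built with Spark's docker-image-tool.sh) and that the image provides ls. The helper names has_gcs_connector and check_image are hypothetical, not part of Spark or Docker.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical helpers): does a Spark image ship the GCS connector?
# Assumption: Spark jars are under /opt/spark/jars, as in images built with
# docker-image-tool.sh; adjust SPARK_JARS_DIR for a custom image layout.
SPARK_JARS_DIR="${SPARK_JARS_DIR:-/opt/spark/jars}"

# Reads a jar listing (one file name per line) on stdin.
# Exit status 0 if a gcs-connector*.jar is listed, non-zero otherwise.
has_gcs_connector() {
  grep -qiE '^gcs-connector.*\.jar$'
}

# Lists the jars inside the given image and reports whether the connector is present.
check_image() {
  local image="$1"
  if docker run --rm --entrypoint ls "$image" "$SPARK_JARS_DIR" | has_gcs_connector; then
    echo "$image: gcs-connector jar found"
  else
    echo "$image: gcs-connector jar NOT found (gs:// paths would fail to resolve)"
  fi
}
```

For example, running check_image against both the 3.1.1 and the 3.1.2 image should show whether the connector is present in one and missing from the other. If it is missing from the 3.1.2 image, copying a gcs-connector jar into the jars directory at image build time is one likely fix, though that is an assumption to verify rather than a confirmed cause.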

