Hi,

I have a PySpark job that runs on a Kubernetes cluster (GKE) on Google Cloud.

The Spark image was built on PySpark 3.1.2, Scala 2.12, Java 8, with some
added Python packages, and is tagged as below:

*3.1.2*-scala_2.12-8-jre-slim-buster-addedpackages

When run, it throws this error:

21/08/28 07:43:18 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)


However, the same job runs fine with the following Docker image built on
PySpark 3.1.1:

*3.1.1*-scala_2.12-8-jre-slim-buster-addedpackages
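Since the error says no filesystem is registered for the "gs" scheme, one thing that can be compared between the two images is whether a GCS connector jar is present at all. Below is a minimal sketch of such a check, not a confirmed diagnosis. It assumes the connector ships as a jar whose file name contains "gcs-connector" and that the jars live under $SPARK_HOME/jars (falling back to /opt/spark); the helper name find_gcs_connector_jars is hypothetical.

```python
import os
from pathlib import Path


def find_gcs_connector_jars(jars_dir):
    """Return sorted names of jars in jars_dir whose name contains 'gcs-connector'."""
    directory = Path(jars_dir)
    if not directory.is_dir():
        # A missing directory simply means nothing was found.
        return []
    return sorted(
        p.name for p in directory.glob("*.jar") if "gcs-connector" in p.name
    )


if __name__ == "__main__":
    # Assumption: jars are under $SPARK_HOME/jars, with /opt/spark as a fallback.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    jars = find_gcs_connector_jars(Path(spark_home) / "jars")
    print(jars if jars else "no gcs-connector jar found")
```

Running this inside a container started from each image would show whether the two differ in this respect. An empty result in the 3.1.2 image would be consistent with the error, though it would not prove it is the only cause.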

This is the spark-submit job for both:


        spark-submit --verbose \
           --properties-file ${property_file} \
           --master k8s://https://$KUBERNETES_MASTER_IP:443 \
           --deploy-mode cluster \
           --name pytest \
           --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
           --py-files $CODE_DIRECTORY/DSBQ.zip \
           --conf spark.kubernetes.namespace=$NAMESPACE \
           --conf spark.executor.memory=5000m \
           --conf spark.network.timeout=300 \
           --conf spark.executor.instances=3 \
           --conf spark.kubernetes.driver.limit.cores=1 \
           --conf spark.driver.cores=1 \
           --conf spark.executor.cores=1 \
           --conf spark.executor.memory=2000m \
           --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
           --conf spark.kubernetes.container.image=${IMAGEGCP} \
           --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
           --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
           --conf spark.sql.execution.arrow.pyspark.enabled="true" \
           $CODE_DIRECTORY/${APPLICATION}
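
As a side note on the command itself: it passes spark.executor.memory twice (5000m, then 2000m). This is probably unrelated to the "gs" error, but a small check like the sketch below can flag repeated --conf keys. The helper name duplicate_conf_keys is hypothetical, and the argument list is a shortened copy of the command above.

```python
from collections import Counter


def duplicate_conf_keys(args):
    """Return sorted keys that appear in more than one '--conf key=value' pair."""
    keys = []
    # Walk adjacent pairs so each '--conf' flag is matched with its value.
    for flag, value in zip(args, args[1:]):
        if flag == "--conf" and "=" in value:
            keys.append(value.split("=", 1)[0])
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)


# Shortened copy of the arguments used in the spark-submit command.
args = [
    "--conf", "spark.executor.memory=5000m",
    "--conf", "spark.network.timeout=300",
    "--conf", "spark.executor.memory=2000m",
]
print(duplicate_conf_keys(args))  # prints ['spark.executor.memory']
```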


Have there been any changes in 3.1.2 that could cause this issue? The same
job runs fine with PySpark 3.1.1.


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



