You can also put the GCS connector jar alongside your Spark jars; that is what the ClassNotFoundException is pointing you towards.
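A minimal sketch of that approach (the connector download URL is the one documented by Google; the application file name and the `--conf` pairing are assumptions for illustration, not from this thread):

```shell
# Fetch the GCS connector for Hadoop 3 (the "latest" alias is published by Google;
# pin a specific version for production use)
wget https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar

# Option 1: drop it into Spark's jars directory so every job picks it up
cp gcs-connector-hadoop3-latest.jar "$SPARK_HOME/jars/"

# Option 2: pass it explicitly on spark-submit, together with the gs:// filesystem
# implementation classes (your_app.py is a placeholder)
spark-submit \
  --jars gcs-connector-hadoop3-latest.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  your_app.py
```

Setting `fs.gs.impl` alone (as tried below) registers the class name but the class itself still has to be on the classpath, which is why the jar is the missing piece.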
On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> BTW I also answered you on Stack Overflow:
>
> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>
> HTH
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Sat, 12 Feb 2022 at 08:24, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> You are trying to access a Google Cloud Storage bucket (gs://) from your
>> local host.
>>
>> spark-submit does not see it because it assumes the path is on the local
>> file system of the host, which it is not.
>>
>> You need to mount the gs:// bucket as a local file system.
>>
>> You can use the tool called gcsfuse:
>> https://cloud.google.com/storage/docs/gcs-fuse . Cloud Storage FUSE is
>> an open source FUSE <http://fuse.sourceforge.net/> adapter that allows
>> you to mount Cloud Storage buckets as file systems on Linux or macOS
>> systems. You can download gcsfuse from
>> <https://github.com/GoogleCloudPlatform/gcsfuse>
>>
>> Pretty simple.
>>
>> It will be installed as /usr/bin/gcsfuse. Create a local mount point such
>> as /mnt/gs as root and give other users permission to use it.
>>
>> As the normal user that needs to access the gs:// bucket (not as root),
>> use gcsfuse to mount it.
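The mount steps above can be sketched as follows (the bucket name is a placeholder; gcsfuse installation itself follows the docs linked above):

```shell
# Create the mount point as root and let non-root users use it
sudo mkdir -p /mnt/gs
sudo chmod a+rw /mnt/gs

# As the normal user, mount the bucket by name only (no gs:// prefix)
gcsfuse my-bucket /mnt/gs   # "my-bucket" is a placeholder bucket name

# ... files in the bucket now appear under /mnt/gs ...
ls /mnt/gs

# Unmount when done
fusermount -u /mnt/gs
```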
>> For example, I am mounting a GCS bucket called spark-jars-karan here.
>>
>> Just use the bucket name itself:
>>
>> gcsfuse spark-jars-karan /mnt/gs
>>
>> Then you can refer to it as /mnt/gs in spark-submit from the on-premise
>> host:
>>
>> spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 \
>>   --jars /mnt/gs/spark-bigquery-with-dependencies_2.12-0.23.2.jar
>>
>> HTH
>>
>> On Sat, 12 Feb 2022 at 04:31, karan alang <karan.al...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I'm trying to access GCP buckets while running spark-submit from local,
>>> and running into issues.
>>>
>>> I'm getting the error:
>>>
>>> ```
>>> 22/02/11 20:06:59 WARN NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> Exception in thread "main"
>>> org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for
>>> scheme "gs"
>>> ```
>>>
>>> I tried adding
>>> --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
>>> to the spark-submit command, but am getting a ClassNotFoundException.
>>>
>>> Details are in Stack Overflow:
>>> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>>>
>>> Any ideas on how to fix this?
>>> tia!

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau