Hi Gourav, all,

I'm doing a spark-submit from my local system to a GCP Dataproc cluster. This is mainly for dev/testing. I can also run a 'gcloud dataproc jobs submit' command, which is what will be done in Production.
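For reference, a production-style submission via gcloud might look like the sketch below. This is an illustrative fragment only: the cluster name, region, bucket, and file paths are placeholders, not details from this thread. Note that jobs submitted this way can read gs:// paths directly, since the GCS connector is preinstalled on Dataproc nodes.

```shell
# Hypothetical example: submit a PySpark job to a Dataproc cluster.
# CLUSTER, REGION, and all gs:// paths below are placeholders.
gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/my_job.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --jars=gs://my-bucket/jars/spark-bigquery-with-dependencies_2.12-0.23.2.jar \
    --properties=spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0
```
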
Hope that clarifies.

regds,
Karan Alang

On Sat, Feb 12, 2022 at 10:31 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> agree with Holden, have faced quite a few issues with FUSE.
>
> Also trying to understand "spark-submit from local". Are you submitting
> your Spark jobs from a local laptop, or in local mode from a GCP
> Dataproc system?
>
> If you are submitting the job from your local laptop, there will be
> performance bottlenecks, I guess, based on the internet bandwidth and
> volume of data.
>
> Regards,
> Gourav
>
> On Sat, Feb 12, 2022 at 7:12 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> You can also put the GCS access jar with your Spark jars -- that's what
>> the class-not-found exception is pointing you towards.
>>
>> On Fri, Feb 11, 2022 at 11:58 PM Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>>
>>> BTW I also answered you on Stack Overflow:
>>>
>>> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>>>
>>> HTH
>>>
>>> view my LinkedIn profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary
>>> damages arising from such loss, damage or destruction.
>>>
>>> On Sat, 12 Feb 2022 at 08:24, Mich Talebzadeh
>>> <mich.talebza...@gmail.com> wrote:
>>>
>>>> You are trying to access a Google Cloud Storage bucket (gs://) from
>>>> your local host.
>>>>
>>>> spark-submit does not see it because it assumes the path is on the
>>>> host's local file system, which it is not.
>>>>
>>>> You need to mount the gs:// bucket as a local file system.
>>>>
>>>> You can use the tool called gcsfuse
>>>> https://cloud.google.com/storage/docs/gcs-fuse . Cloud Storage FUSE is
>>>> an open-source FUSE <http://fuse.sourceforge.net/> adapter that allows
>>>> you to mount Cloud Storage buckets as file systems on Linux or macOS
>>>> systems. You can download gcsfuse from
>>>> <https://github.com/GoogleCloudPlatform/gcsfuse>
>>>>
>>>> Pretty simple.
>>>>
>>>> It will be installed as /usr/bin/gcsfuse. Create a local mount point
>>>> such as /mnt/gs as root and give other users permission to use it.
>>>>
>>>> Then, as the normal user that needs to access the gs:// bucket (not
>>>> as root), use gcsfuse to mount it. For example, I am mounting a GCS
>>>> bucket called spark-jars-karan here.
>>>>
>>>> Just use the bucket name itself:
>>>>
>>>> gcsfuse spark-jars-karan /mnt/gs
>>>>
>>>> Then you can refer to it as /mnt/gs in spark-submit from the
>>>> on-premise host:
>>>>
>>>> spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0
>>>> --jars /mnt/gs/spark-bigquery-with-dependencies_2.12-0.23.2.jar
>>>>
>>>> HTH
>>>>
>>>> On Sat, 12 Feb 2022 at 04:31, karan alang <karan.al...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> I'm trying to access GCP buckets while running spark-submit from
>>>>> local, and running into issues.
>>>>>
>>>>> I'm getting this error:
>>>>> ```
>>>>> 22/02/11 20:06:59 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform...
>>>>> using builtin-java classes where applicable
>>>>> Exception in thread "main"
>>>>> org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem
>>>>> for scheme "gs"
>>>>> ```
>>>>>
>>>>> I tried adding
>>>>> --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
>>>>> to the spark-submit command, but I'm getting a ClassNotFoundException.
>>>>>
>>>>> Details are on Stack Overflow:
>>>>> https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
>>>>>
>>>>> Any ideas on how to fix this?
>>>>> tia!
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
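[Editor's note] Pulling the thread's suggestions together: the usual fix for `No FileSystem for scheme "gs"` when running spark-submit from a local machine is the one Holden points at, namely shipping the GCS connector jar alongside the job (setting fs.gs.impl alone causes the ClassNotFoundException, because the class is not on the classpath). A sketch is below; the connector version choice, job file name, and service-account key path are assumptions, not details from the thread. The download requires cloud resources to actually run, so treat this as a CLI fragment.

```shell
# Fetch the shaded GCS connector (the "hadoop3-latest" alias is from
# Google's public hadoop-lib bucket; pick the build matching your Hadoop).
wget -q https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar

# Register both the FileSystem and AbstractFileSystem implementations
# and point at a service-account key for auth (path is a placeholder).
spark-submit \
  --jars gcs-connector-hadoop3-latest.jar \
  --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  --conf spark.hadoop.google.cloud.auth.service.account.enable=true \
  --conf spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/key.json \
  my_job.py
```

With the connector on the classpath, gs:// paths resolve directly and no FUSE mount is needed for this case.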