We've all been there! No reason to be ashamed :)

On Wed, 16 Dec 2020 at 18:14, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
> Oh thank you, you're right! I feel ashamed 😄
>
> ------------------------------
> *From:* German Schiavon <gschiavonsp...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 18:01
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Cc:* user@spark.apache.org <user@spark.apache.org>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> Hi,
>
> Seems that you have a typo, no?
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
>
> data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
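For readers skimming the archive: the whole fix is the scheme spelling, "hdfs" rather than "hfds". A minimal corrected version of the offending line, assuming the same `data` DataFrame defined in the original message quoted further down:

    // Same write as in the original code, with the scheme fixed:
    // "hdfs://", not "hfds://"
    data.write.mode("overwrite").format("text").save("hdfs://hdfs-namenode/user/loic/result.txt")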
> On Wed, 16 Dec 2020 at 17:02, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> So I've tried several other things, including building a fat jar with the
> hdfs dependency inside my application jar, and adding this to the Spark
> configuration in the code:
>
> val spark = SparkSession
>   .builder()
>   .appName("Hello Spark 7")
>   .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
>   .getOrCreate()
>
> But still the same error...
>
> ------------------------------
> *From:* Sean Owen <sro...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 14:27
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> I think it'll have to be part of the Spark distro, but I'm not 100% sure.
> I also think these get registered via manifest files in the JARs; if some
> process is stripping those when creating a bundled-up JAR, that could be it.
> Could be that it's failing to initialize too, for some reason.
>
> On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> I've tried with this spark-submit option:
>
> --packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But it didn't solve the issue.
> Should I add more jars?
>
> Thanks
> Loïc
>
> ------------------------------
> *From:* Sean Owen <sro...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 14:20
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> Seems like your Spark cluster somehow doesn't have the Hadoop JARs?
>
> On Wed, Dec 16, 2020 at 6:45 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> Hello,
>
> I am using Spark on Kubernetes, and I get the following error when I try
> to write data to HDFS: "no filesystem for scheme hdfs".
>
> More details:
>
> I am submitting my application with spark-submit like this:
>
> spark-submit --master k8s://https://myK8SMaster:6443 \
>   --deploy-mode cluster \
>   --name hello-spark \
>   --class Hello \
>   --conf spark.executor.instances=2 \
>   --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.kubernetes.container.image=gradiant/spark:2.4.4 \
>   hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar
>
> The driver and the 2 executors are then created in K8S, but it fails: when
> I look at the logs of the driver, I see this:
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
>   at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>   at Hello$.main(hello.scala:24)
>   at Hello.main(hello.scala)
>
> As you can see, my application jar helloSpark.jar is correctly fetched from
> HDFS by spark-submit, but writing to HDFS fails.
>
> I have also tried to add the hadoop-client and hadoop-hdfs dependencies to
> the spark-submit command:
>
> --packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But the error is still there.
>
> Here is the Scala code of my application:
>
> import java.util.Calendar
>
> import org.apache.spark.sql.SparkSession
>
> case class Data(singleField: String)
>
> object Hello {
>   def main(args: Array[String]) {
>
>     val spark = SparkSession
>       .builder()
>       .appName("Hello Spark")
>       .getOrCreate()
>
>     import spark.implicits._
>
>     val now = Calendar.getInstance().getTime().toString
>     val data = List(Data(now)).toDF()
>
>     data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
>   }
> }
>
> Thanks for your help,
> Loïc
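A closing aside for anyone who lands here with "No FileSystem for scheme: hdfs" and the scheme spelled correctly: the fs.hdfs.impl attempt above most likely had no effect because Spark only copies properties carrying the spark.hadoop. prefix into the Hadoop Configuration; a bare fs.hdfs.impl key stays in the Spark conf and Hadoop never sees it. A minimal sketch of that workaround, under the Spark 2.4-era setup used in this thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Hello Spark")
      // Spark strips the "spark.hadoop." prefix and copies the remainder
      // ("fs.hdfs.impl") into the Hadoop Configuration it hands to HDFS;
      // without the prefix the setting never reaches Hadoop.
      .config("spark.hadoop.fs.hdfs.impl",
        classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      .getOrCreate()

Sean's remark about manifest files presumably refers to the ServiceLoader registration under META-INF/services/org.apache.hadoop.fs.FileSystem: when building a fat jar, those service files must be merged across dependencies (for example with maven-shade's ServicesResourceTransformer) rather than letting one jar's copy overwrite the others, or the hdfs scheme can genuinely go missing.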