Everything is working fine now 🙂

Thanks again,
Loïc

________________________________
From: German Schiavon <gschiavonsp...@gmail.com>
Sent: Wednesday, December 16, 2020 19:23
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Cc: user@spark.apache.org
Subject: Re: Spark on Kubernetes : unable to write files to HDFS
We've all been there! No reason to be ashamed :)

On Wed, 16 Dec 2020 at 18:14, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:

Oh thank you, you're right!! I feel ashamed 😄

________________________________
From: German Schiavon <gschiavonsp...@gmail.com>
Sent: Wednesday, December 16, 2020 18:01
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Cc: user@spark.apache.org
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

Hi,

It seems that you have a typo, no?

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds

data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")

On Wed, 16 Dec 2020 at 17:02, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:

So I've tried several other things, including building a fat jar with the hdfs dependency inside my app jar, and adding this to the Spark configuration in the code:

val spark = SparkSession
  .builder()
  .appName("Hello Spark 7")
  .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  .getOrCreate()

But still the same error...

________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, December 16, 2020 14:27
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

I think it'll have to be part of the Spark distro, but I'm not 100% sure. I also think these get registered via manifest files in the JARs; if some process is stripping those when creating a bundled-up JAR, that could be it. It could also be failing to initialize for some reason.
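To expand on the manifest point: Hadoop file systems are discovered through java.util.ServiceLoader, from META-INF/services/org.apache.hadoop.fs.FileSystem entries inside the Hadoop JARs, so a fat-jar build that drops or overwrites those files while merging dependencies loses the hdfs scheme. As a minimal sketch, assuming an sbt-assembly build (the thread does not say which build tool is used), a merge strategy that preserves the service registrations could look like this:

// build.sbt -- a sketch only, assuming the sbt-assembly plugin; adapt to your actual build tool
assemblyMergeStrategy in assembly := {
  // Merge ServiceLoader registration files line by line instead of dropping them,
  // so entries such as org.apache.hadoop.fs.FileSystem survive the merge
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines
  // Other dependency manifests and signature files can be discarded
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}

That said, the replies above show that the actual culprit in this thread was simpler.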
On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:

I've tried with this spark-submit option:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But it didn't solve the issue. Should I add more jars?

Thanks,
Loïc

________________________________
From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, December 16, 2020 14:20
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

Seems like your Spark cluster somehow doesn't have the Hadoop JARs?

On Wed, Dec 16, 2020 at 6:45 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:

Hello,

I am using Spark on Kubernetes, and I get the following error when I try to write data to HDFS: "no filesystem for scheme hdfs".

More details: I am submitting my application with spark-submit like this:

spark-submit --master k8s://https://myK8SMaster:6443 \
  --deploy-mode cluster \
  --name hello-spark \
  --class Hello \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=gradiant/spark:2.4.4 \
  hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar

Then the driver and the 2 executors are created in K8S.

But it fails, and when I look at the logs of the driver, I see this:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
        at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
        at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
        at Hello$.main(hello.scala:24)
        at Hello.main(hello.scala)

As you can see, my application jar helloSpark.jar is correctly fetched from HDFS by spark-submit, but writing to HDFS fails.

I have also tried to add the hadoop-client and hadoop-hdfs dependencies in the spark-submit command:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But the error is still here.

Here is the Scala code of my application:

import java.util.Calendar
import org.apache.spark.sql.SparkSession

case class Data(singleField: String)

object Hello {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder()
      .appName("Hello Spark")
      .getOrCreate()

    import spark.implicits._

    val now = Calendar.getInstance().getTime().toString
    val data = List(Data(now)).toDF()
    data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
  }
}

Thanks for your help,
Loïc
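For completeness, the fix that resolves the thread: the URI scheme in the save path is misspelled ("hfds" instead of "hdfs"), which is exactly where the hello.scala:24 frame in the stack trace points. The corrected write, with the rest of the program above unchanged:

// "hdfs", not "hfds" -- the scheme must match a registered FileSystem
data.write.mode("overwrite").format("text").save("hdfs://hdfs-namenode/user/loic/result.txt")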