We've all been there! No reason to be ashamed :)

On Wed, 16 Dec 2020 at 18:14, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
> Oh thank you, you're right! I feel ashamed 😄
>
> ------------------------------
> *From:* German Schiavon <gschiavonsp...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 18:01
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Cc:* user@spark.apache.org <user@spark.apache.org>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> Hi,
>
> Seems that you have a typo, no?
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
>
> data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
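For readers skimming the archive: the whole fix is the scheme spelling, "hdfs" rather than "hfds". A minimal corrected version of the offending line, assuming the same `data` DataFrame defined in the original message quoted further down:

    // Same write as in the original code, with the scheme fixed:
    // "hdfs://", not "hfds://"
    data.write.mode("overwrite").format("text").save("hdfs://hdfs-namenode/user/loic/result.txt")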
> On Wed, 16 Dec 2020 at 17:02, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> So I've tried several other things, including building a fat jar with the
> hdfs dependency inside my application jar, and adding this to the Spark
> configuration in the code:
>
> val spark = SparkSession
>   .builder()
>   .appName("Hello Spark 7")
>   .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
>   .getOrCreate()
>
> But still the same error...
>
> ------------------------------
> *From:* Sean Owen <sro...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 14:27
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> I think it'll have to be part of the Spark distro, but I'm not 100% sure.
> I also think these get registered via manifest files in the JARs; if some
> process is stripping those when creating a bundled-up JAR, that could be it.
> Could be that it's failing to initialize too, for some reason.
>
> On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> I've tried with this spark-submit option:
>
> --packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But it didn't solve the issue.
> Should I add more jars?
>
> Thanks
> Loïc
>
> ------------------------------
> *From:* Sean Owen <sro...@gmail.com>
> *Sent:* Wednesday, December 16, 2020 14:20
> *To:* Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
> *Subject:* Re: Spark on Kubernetes : unable to write files to HDFS
>
> Seems like your Spark cluster somehow doesn't have the Hadoop JARs?
>
> On Wed, Dec 16, 2020 at 6:45 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
>
> Hello,
>
> I am using Spark on Kubernetes, and I get the following error when I try
> to write data to HDFS: "no filesystem for scheme hdfs".
>
> More details:
>
> I am submitting my application with spark-submit like this:
>
> spark-submit --master k8s://https://myK8SMaster:6443 \
>   --deploy-mode cluster \
>   --name hello-spark \
>   --class Hello \
>   --conf spark.executor.instances=2 \
>   --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
>   --conf spark.kubernetes.container.image=gradiant/spark:2.4.4 \
>   hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar
>
> The driver and the 2 executors are then created in K8S, but it fails: when
> I look at the logs of the driver, I see this:
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
>   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
>   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
>   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
>   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
>   at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
>   at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
>   at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
>   at Hello$.main(hello.scala:24)
>   at Hello.main(hello.scala)
>
> As you can see, my application jar helloSpark.jar is correctly fetched from
> HDFS by spark-submit, but writing to HDFS fails.
>
> I have also tried to add the hadoop-client and hadoop-hdfs dependencies to
> the spark-submit command:
>
> --packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But the error is still there.
>
> Here is the Scala code of my application:
>
> import java.util.Calendar
>
> import org.apache.spark.sql.SparkSession
>
> case class Data(singleField: String)
>
> object Hello {
>   def main(args: Array[String]) {
>
>     val spark = SparkSession
>       .builder()
>       .appName("Hello Spark")
>       .getOrCreate()
>
>     import spark.implicits._
>
>     val now = Calendar.getInstance().getTime().toString
>     val data = List(Data(now)).toDF()
>
>     data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
>   }
> }
>
> Thanks for your help,
> Loïc
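A closing aside for anyone who lands here with "No FileSystem for scheme: hdfs" and the scheme spelled correctly: the fs.hdfs.impl attempt above most likely had no effect because Spark only copies properties carrying the spark.hadoop. prefix into the Hadoop Configuration; a bare fs.hdfs.impl key stays in the Spark conf and Hadoop never sees it. A minimal sketch of that workaround, under the Spark 2.4-era setup used in this thread:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("Hello Spark")
      // Spark strips the "spark.hadoop." prefix and copies the remainder
      // ("fs.hdfs.impl") into the Hadoop Configuration it hands to HDFS;
      // without the prefix the setting never reaches Hadoop.
      .config("spark.hadoop.fs.hdfs.impl",
        classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
      .getOrCreate()

Sean's remark about manifest files presumably refers to the ServiceLoader registration under META-INF/services/org.apache.hadoop.fs.FileSystem: when building a fat jar, those service files must be merged across dependencies (for example with maven-shade's ServicesResourceTransformer) rather than letting one jar's copy overwrite the others, or the hdfs scheme can genuinely go missing.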