Hi Mich,

It's not specific to ORC; it looks like a bug in the Hadoop Common project. I have raised a JIRA and am happy to contribute a fix against Hadoop 3.3.0. Do you know anyone who could help me get the Assignee set?

https://issues.apache.org/jira/browse/HADOOP-18856
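To make the failure concrete, here is a minimal sketch of where it breaks (Scala, assuming only Hadoop's org.apache.hadoop.fs.Path on the classpath; no GCS connector is needed since Path does pure URI manipulation, and the bucket name is the one from the thread below):

import org.apache.hadoop.fs.Path

// Nested location: getParent() is non-null, so suffix() resolves fine.
val nested = new Path("gs://test_dd1/abc")
println(nested.suffix("/num=123"))  // gs://test_dd1/abc/num=123

// Bucket root: getParent() returns null, and suffix() then constructs a
// new Path from that null parent, which throws java.lang.NullPointerException.
val root = new Path("gs://test_dd1/")
root.suffix("/num=123")

This matches the stack trace below: InsertIntoHadoopFsRelationCommand.getCustomPartitionLocations calls suffix() on the table's root path, and a bare bucket has no parent.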
With Best Regards,
Dipayan Dev

On Sun, Aug 20, 2023 at 2:47 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Under the gs directory
>
> "gs://test_dd1/abc/"
>
> what do you see?
>
> gsutil ls gs://test_dd1/abc
>
> and the same for
>
> gs://test_dd1/
>
> gsutil ls gs://test_dd1
>
> I suspect you need a folder for multiple ORC slices!
>
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> London
> United Kingdom
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
>
> On Sat, 19 Aug 2023 at 21:36, Dipayan Dev <dev.dipaya...@gmail.com> wrote:
>
>> Hi Everyone,
>>
>> I'm stuck on a problem where I need to provide a custom GCS location
>> for a Hive table from Spark. The code fails on an 'insert into'
>> whenever the Hive table has a flat GCS location such as
>> gs://<bucket_name>, but works for nested locations such as
>> gs://bucket_name/blob_name.
>>
>> Does anyone know whether this is an issue on the Spark side, or is
>> there a config I need to pass for it?
>>
>> The issue happens on both 2.x and 3.x.
>>
>> Config used:
>>
>> spark.conf.set("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict")
>> spark.conf.set("spark.hadoop.hive.exec.dynamic.partition", true)
>> spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
>> spark.conf.set("hive.exec.dynamic.partition", true)
>>
>> Case 1: FAILS
>>
>> val DF = Seq(("test1", 123)).toDF("name", "num")
>> val partKey = List("num")
>>
>> DF.write.option("path", "gs://test_dd1/")
>>   .mode(SaveMode.Overwrite)
>>   .partitionBy(partKey: _*)
>>   .format("orc")
>>   .saveAsTable("us_wm_supply_chain_otif_stg.test_tb1")
>>
>> val DF1 = Seq(("test2", 125)).toDF("name", "num")
>> DF1.write.mode(SaveMode.Overwrite)
>>   .format("orc")
>>   .insertInto("us_wm_supply_chain_otif_stg.test_tb1")
>>
>> java.lang.NullPointerException
>>   at org.apache.hadoop.fs.Path.<init>(Path.java:141)
>>   at org.apache.hadoop.fs.Path.<init>(Path.java:120)
>>   at org.apache.hadoop.fs.Path.suffix(Path.java:441)
>>   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
>>
>> Case 2: Succeeds
>>
>> val DF = Seq(("test1", 123)).toDF("name", "num")
>> val partKey = List("num")
>>
>> DF.write.option("path", "gs://test_dd1/abc/")
>>   .mode(SaveMode.Overwrite)
>>   .partitionBy(partKey: _*)
>>   .format("orc")
>>   .saveAsTable("us_wm_supply_chain_otif_stg.test_tb2")
>>
>> val DF1 = Seq(("test2", 125)).toDF("name", "num")
>> DF1.write.mode(SaveMode.Overwrite)
>>   .format("orc")
>>   .insertInto("us_wm_supply_chain_otif_stg.test_tb2")
>>
>> With Best Regards,
>>
>> Dipayan Dev