Hi Folks,

I wanted to check why Spark doesn't create a staging dir while doing an insertInto on partitioned tables. I'm running the example code below:

```
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

val rdd = sc.parallelize(Seq((1, 5, 1), (2, 1, 2), (4, 4, 3)))
val df = spark.createDataFrame(rdd)
// testing_table is partitioned on "_1"
df.write.insertInto("testing_table")
```

In this scenario, FileOutputCommitter treats the table path as the output path: it creates temporary folders like `<some_location>/testing_table/_temporary/0` and then moves the files to the partition location when the job commit happens. But if multiple parallel apps are inserting into the same partition, this can cause race conditions when each job deletes the `_temporary` dir. Ideally, each app should get a unique staging dir where its job writes output (two rough sketches of what I mean are below).
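For reference, the one place I'm aware of where Spark does stage under a per-job directory is dynamic partition overwrite (Spark 2.3+), where the commit protocol writes to a `.spark-staging-<jobId>` dir under the table path instead of the shared `_temporary` dir. A minimal sketch, assuming overwrite-matching-partitions semantics are acceptable in place of append (the second dataset here is just illustrative):

```
// Assumes Spark 2.3+. With dynamic partition overwrite, the commit protocol
// stages files under a per-job ".spark-staging-<jobId>" directory rather than
// the shared "_temporary" dir, so concurrent jobs don't collide on cleanup.
// Note: this replaces the contents of the touched partitions (overwrite),
// which is not the same semantics as a plain append insertInto.
spark.sql("set spark.sql.sources.partitionOverwriteMode=dynamic")

val rdd2 = sc.parallelize(Seq((3, 7, 1), (5, 2, 4)))
val df2 = spark.createDataFrame(rdd2)
df2.write.mode("overwrite").insertInto("testing_table")
```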
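And as a possible workaround for the append case, I've been toying with giving each application its own private output location and then attaching it to the table with a metadata-only DDL, so no two jobs ever share a `_temporary` dir. Everything here (the /tmp path, the parquet format, the single hard-coded partition value) is just illustrative, and it only helps for partitions that don't exist yet:

```
import java.util.UUID
import spark.implicits._

// Illustrative only: a unique, per-application staging location.
val appDir = s"/tmp/app-staging/${UUID.randomUUID()}"

// Write one partition's rows into the private dir. The partition column is
// dropped because, as in the Hive layout, it is encoded in the directory name.
df.where($"_1" === 1).drop("_1").write.parquet(s"$appDir/_1=1")

// Attach the staged directory as a partition. This is a metastore-only
// operation, so there is no second job commit under the table path.
// (Assumes testing_table's file format matches what was written above.)
spark.sql(
  s"ALTER TABLE testing_table ADD IF NOT EXISTS PARTITION (_1=1) LOCATION '$appDir/_1=1'")
```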
Is there a specific reason for the current behaviour, or am I missing something here?

Thanks for your time and assistance regarding this!

Kind regards,
Sanskar