Hi all,

We have multiple Spark jobs running in parallel that write into the same
Hive table, with each job writing to a different partition. This worked
fine with Spark 2.3 and Hadoop 2.7.
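
For reference, each job does roughly the following (the table, column,
and path names below are illustrative, not our real ones):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-writer")
      .enableHiveSupport()
      .getOrCreate()

    // Each job receives a different dt value, so no two jobs ever
    // write into the same partition.
    val dt = args(0)
    val df = spark.read.parquet(s"/staging/events/dt=$dt")

    // Static overwrite of a single partition of the shared table.
    df.createOrReplaceTempView("src")
    spark.sql(
      s"INSERT OVERWRITE TABLE db.table PARTITION (dt='$dt') " +
      "SELECT * FROM src")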

After upgrading to Spark 3.2 and Hadoop 3.2.2, these parallel jobs are
failing with FileNotFoundException for files under the
/warehouse/db/table/temporary/0/ directory.

It seems the temporary directory used to be created under the partition
being written, but now it is created directly under the table directory.
That causes concurrency issues, because multiple jobs end up trying to
clean up the same temporary directory.

Is there still a way to achieve parallel writes to different partitions
of the same table? Any insight into what changed in how the temporary
directory location is chosen would also be helpful.
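
For example, would a direct per-partition write, roughly like the sketch
below, be the recommended pattern now? (The path and the dt partition
column are again illustrative.)

    // Hypothetical fallback: write each job's output straight to its
    // partition directory, so the temporary dir is created under the
    // partition itself, then register the partition in the metastore.
    val partitionPath = s"/warehouse/db/table/dt=$dt"

    df.write.mode("overwrite").parquet(partitionPath)

    spark.sql(
      s"ALTER TABLE db.table ADD IF NOT EXISTS PARTITION (dt='$dt') " +
      s"LOCATION '$partitionPath'")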

Thanks and regards,
Shrikant
