I think this question applies regardless of whether the writers are two completely separate Spark jobs on different machines, or two cores that are part of the same task on the same machine.
If two jobs/tasks/cores/stages both save to the same Parquet directory in parallel, like this:

```scala
df1.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)
df2.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)
```

will the result be equivalent to this?

```scala
df1.unionAll(df2).write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)
```

What if we ensure that `dir` does not exist first?

- Philip