I think this question applies regardless of whether I have two completely
separate Spark jobs or tasks running on different machines, or two cores
that are part of the same task on the same machine.

If two jobs/tasks/cores/stages both save to the same Parquet directory in
parallel, like this:

df1.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)

df2.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)


will the result be equivalent to this?

df1.unionAll(df2).write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)


What if we ensure that 'dir' does not exist before either write starts?
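For concreteness, here is a minimal sketch of the parallel-write case I have in mind, with both writes issued concurrently from one driver (assuming `df1`, `df2`, `dir`, and columns `a`/`b` already exist; the thread wrapper is just to force the overlap, and the lambda-as-Runnable form needs Scala 2.12+):

```scala
import org.apache.spark.sql.SaveMode

// Kick off both appends to the same directory at the same time.
val t1 = new Thread(() =>
  df1.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir))
val t2 = new Thread(() =>
  df2.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir))
t1.start(); t2.start()
t1.join(); t2.join()
```

The question is whether the directory afterwards holds the same data as the single unioned write would produce.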

- Philip
