I think this question applies regardless of whether I have two completely
separate Spark jobs or tasks running on different machines, or two cores
that are part of the same task on the same machine.

If two jobs/tasks/cores/stages both save to the same Parquet directory in
parallel, like this:

df1.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)

df2.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)


will the result be equivalent to this?

df1.unionAll(df2).write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir)


What if we ensure that 'dir' does not exist before either write starts?
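For concreteness, here is a minimal sketch of the parallel-write case I have in mind, with both writes issued concurrently from one driver (assuming `df1`, `df2`, `dir`, and columns `a`/`b` already exist; the thread wrapper is just to force the overlap, and the lambda-as-Runnable form needs Scala 2.12+):

```scala
import org.apache.spark.sql.SaveMode

// Kick off both appends to the same directory at the same time.
val t1 = new Thread(() =>
  df1.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir))
val t2 = new Thread(() =>
  df2.write.mode(SaveMode.Append).partitionBy("a", "b").parquet(dir))
t1.start(); t2.start()
t1.join(); t2.join()
```

The question is whether the directory afterwards holds the same data as the single unioned write would produce.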

- Philip
