My periodically running process writes data to a table over parquet files with the configuration `"spark.sql.sources.partitionOverwriteMode" = "dynamic"`, using the following code:
```scala
if (!tableExists) {
  df.write
    .mode("overwrite")
    .partitionBy("partitionCol")
    .format("parquet")
    .saveAsTable("tablename")
} else {
  df.write
    .format("parquet")
    .mode("overwrite")
    .insertInto("table")
}
```
If the table doesn't exist, the first branch creates it and the write succeeds; on the next run, when the table already exists, the else branch runs and also works as expected.
However, when I create the table over existing parquet files, either through a Hive session or using `spark.sql("CREATE TABLE...")`, and then run the process, the write fails with the error:
"org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict"
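For context, the externally created table is defined with something along these lines (table name, columns and location are placeholders, not the real ones):

```scala
// Placeholder sketch of the external table definition; the real table/column
// names and location differ. Partitions of the pre-existing data are
// registered separately (e.g. MSCK REPAIR TABLE).
spark.sql("""
  CREATE EXTERNAL TABLE tablename (someCol STRING)
  PARTITIONED BY (partitionCol STRING)
  STORED AS PARQUET
  LOCATION '/path/to/existing/parquet'
""")
```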
Adding the `hive.exec.dynamic.partition.mode=nonstrict` setting suggested by the error to the Spark conf solves the issue, but I don't understand why it is needed when the table is created through a DDL command yet isn't needed when the table is created with `saveAsTable`.
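For completeness, this is roughly how I apply the workaround when building the session (the app name is illustrative; the last `.config` line is the part that makes the write succeed):

```scala
import org.apache.spark.sql.SparkSession

// Session setup with the workaround applied; only the final .config line is
// the addition needed for insertInto over the externally created table.
val spark = SparkSession.builder()
  .appName("periodic-partition-overwrite") // illustrative name
  .enableHiveSupport()
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()
```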
Also, I don't understand how this configuration is relevant for Spark. [From what I've read](https://cwiki.apache.org/confluence/display/hive/tutorial#Tutorial-Dynamic-PartitionInsert), a static partition here means we directly specify the partition to write into instead of only specifying the column to partition by. Is it even possible to do such an insert in Spark (as opposed to HiveQL)?
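To make the question concrete, this is the kind of static-partition insert I mean, here just HiveQL submitted through `spark.sql` with hypothetical column and partition values; what I'm asking is whether the DataFrame writer API offers an equivalent:

```scala
// Hypothetical illustration of a *static* partition insert: the target
// partition value is spelled out explicitly instead of being derived from the
// data, so the partition column is not part of the SELECT list.
df.createOrReplaceTempView("staging")
spark.sql("""
  INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = 'someValue')
  SELECT someCol FROM staging
""")
```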
Spark 2.4, Hadoop 3.1
Thanks.