My periodically running process writes data to a table over parquet files with the configuration `"spark.sql.sources.partitionOverwriteMode" = "dynamic"`, using the following code:
```scala
if (!tableExists) {
  df.write
    .mode("overwrite")
    .partitionBy("partitionCol")
    .format("parquet")
    .saveAsTable("tablename")
} else {
  df.write
    .format("parquet")
    .mode("overwrite")
    .insertInto("table")
}
```
If the table doesn't exist, the first branch creates it and the write succeeds; on the next run, when the table already exists, the else branch runs and also works as expected.
However, when I create the table over existing parquet files, either through a Hive session or using `spark.sql("CREATE TABLE...")`, and then run the process, the write fails with the error:
"org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict"
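For context, the externally created table is defined with something along these lines (table name, columns and location are placeholders, not the real ones):

```scala
// Placeholder sketch of the external table definition; the real table/column
// names and location differ. Partitions of the pre-existing data are
// registered separately (e.g. MSCK REPAIR TABLE).
spark.sql("""
  CREATE EXTERNAL TABLE tablename (someCol STRING)
  PARTITIONED BY (partitionCol STRING)
  STORED AS PARQUET
  LOCATION '/path/to/existing/parquet'
""")
```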
Adding the `hive.exec.dynamic.partition.mode=nonstrict` setting suggested by the error to the Spark conf solves the issue, but I don't understand why it is needed when the table is created through a DDL command yet isn't needed when the table is created with `saveAsTable`.
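For completeness, this is roughly how I apply the workaround when building the session (the app name is illustrative; the last `.config` line is the part that makes the write succeed):

```scala
import org.apache.spark.sql.SparkSession

// Session setup with the workaround applied; only the final .config line is
// the addition needed for insertInto over the externally created table.
val spark = SparkSession.builder()
  .appName("periodic-partition-overwrite") // illustrative name
  .enableHiveSupport()
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()
```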
Also, I don't understand how this configuration is relevant for Spark. [From what I've read](https://cwiki.apache.org/confluence/display/hive/tutorial#Tutorial-Dynamic-PartitionInsert), a static partition here means we directly specify the partition to write into instead of only specifying the column to partition by. Is it even possible to do such an insert in Spark (as opposed to HiveQL)?
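To make the question concrete, this is the kind of static-partition insert I mean, here just HiveQL submitted through `spark.sql` with hypothetical column and partition values; what I'm asking is whether the DataFrame writer API offers an equivalent:

```scala
// Hypothetical illustration of a *static* partition insert: the target
// partition value is spelled out explicitly instead of being derived from the
// data, so the partition column is not part of the SELECT list.
df.createOrReplaceTempView("staging")
spark.sql("""
  INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = 'someValue')
  SELECT someCol FROM staging
""")
```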
Spark 2.4, Hadoop 3.1
Thanks.