Hi,

We recently upgraded Spark from 2.4.x to 3.3.1, and managed table creation while writing a DataFrame via `saveAsTable` now fails with the error below:

```
Can not create the managed table(`<table name>`). The associated location('hdfs:<table path>') already exists.
```

At a high level, our code does the following before writing the DataFrame as a table:

```scala
sparkSession.sql(s"DROP TABLE IF EXISTS $hiveTableName PURGE")
mydataframe.write.mode(SaveMode.Overwrite).saveAsTable(hiveTableName)
```

This works on Spark 2 because of `spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation`, which was removed in Spark 3. Since the table is dropped and purged before the write, I expected the DataFrame write not to complain that the path already exists.

After digging further, I noticed there is a `_temporary` folder left behind in the HDFS table path:

```
hdfs dfs -ls /apps/hive/warehouse/<table-path>/
Found 1 items
drwxr-xr-x - hadoop hdfsadmingroup 0 2023-06-23 04:45 /apps/hive/warehouse/<table-path>/_temporary
[root@ip-10-121-107-90 bin]# hdfs dfs -ls /apps/hive/warehouse/<table-path>/_temporary
Found 1 items
drwxr-xr-x - hadoop hdfsadmingroup 0 2023-06-23 04:45 /apps/hive/warehouse/<table-path>/_temporary/0
[root@ip-10-121-107-90 bin]# hdfs dfs -ls /apps/hive/warehouse/<table-path>/_temporary/0
Found 1 items
drwxr-xr-x - hadoop hdfsadmingroup 0 2023-06-23 04:45 /apps/hive/warehouse/<table-path>/_temporary/0/_temporary
```

Is this caused by task failures? Is there a way to work around this issue?

Thanks
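As a stopgap, I am considering deleting the leftover directory explicitly after the `DROP TABLE` and before the write (the equivalent of `hdfs dfs -rm -r <table-path>`). Below is a rough sketch of just the cleanup logic; note it uses `java.nio.file` against the local filesystem as a stand-in for illustration — on the cluster this would go through Hadoop's `FileSystem.delete(path, true)` instead, and the `StaleDirCleanup` name is mine, not anything from Spark:

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator

object StaleDirCleanup {
  // Recursively delete a directory tree if it exists, deleting the
  // deepest entries first (a directory must be empty before removal).
  // This mirrors `hdfs dfs -rm -r <table-path>`; on HDFS itself you
  // would call org.apache.hadoop.fs.FileSystem.delete(path, true).
  // Returns true if something was deleted, false if the path was absent.
  def deleteRecursivelyIfExists(dir: Path): Boolean = {
    if (!Files.exists(dir)) return false
    Files.walk(dir)
      .sorted(Comparator.reverseOrder[Path]())
      .forEach(p => Files.delete(p))
    true
  }
}
```

Run between the `DROP TABLE ... PURGE` and the `saveAsTable`, this would clear any stale `_temporary` tree so the managed-table location check passes.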