Hi
I am encountering issues while working with a REST-based catalog. My Spark
session is configured with a REST-based catalog implementation as its
default catalog.
The SparkSession.catalog API does not behave correctly with the
REST-based catalog. The issue was reproduced on Spark 3.4.
----------------------------------------------------------------------------------
${SPARK_HOME}/bin/spark-shell --master local[*] \
  --driver-memory 2g \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --conf spark.sql.catalog.iceberg.uri=https://xx.xxx.xxxx.domain.com \
  --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
  --conf spark.sql.defaultCatalog=iceberg \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
scala> spark.catalog.currentCatalog
res1: String = iceberg
scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id| data|
+---+----------+
| 1|some_value|
+---+----------+
scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true
scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
----------------------------------------------------------------------------------
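The behavior difference above matches what the linked source suggests: the
single-string overload parses the qualified name and resolves it against the
current (v2) catalog, while the two-arg overload goes straight to the session
(HMS) catalog. The sketch below is a mock of that suspected dispatch, not
Spark's actual implementation; the catalog contents and names are illustrative.

```scala
// Mock of the suspected dispatch difference (NOT Spark's real code).
object TableExistsSketch {
  // The session (HMS) catalog knows nothing about the REST catalog's tables.
  val sessionCatalogTables: Set[(String, String)] = Set.empty
  // The configured default "iceberg" (REST) catalog does have the table.
  val icebergCatalogTables: Set[(String, String)] = Set(("restDb", "restTable"))

  // tableExists(qualifiedName): parses the multi-part name and resolves it
  // against the *current* catalog (the REST catalog here), so it succeeds.
  def tableExists(qualifiedName: String): Boolean = {
    val parts = qualifiedName.split('.')
    icebergCatalogTables.contains((parts(0), parts(1)))
  }

  // tableExists(dbName, tableName): looks up only the session (HMS)
  // catalog, bypassing the configured default catalog entirely.
  def tableExists(dbName: String, tableName: String): Boolean =
    sessionCatalogTables.contains((dbName, tableName))

  def main(args: Array[String]): Unit = {
    println(tableExists("restDb.restTable"))    // resolves via REST catalog
    println(tableExists("restDb", "restTable")) // falls through to empty HMS
  }
}
```

Under this model the two results above (res3 = true, res4 = false) are exactly
what one would expect, even though both calls name the same table.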
The API spark.catalog.tableExists(String databaseName, String tableName)
is only meant to work with an HMS-based session catalog (
https://github.com/apache/spark/blob/5a91172c019c119e686f8221bbdb31f59d3d7776/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L224
), whereas spark.catalog.tableExists(String tableName)
works with both HMS-based and non-HMS-based catalogs, since the qualified
name is resolved against the current catalog.
Suggested resolutions
1. Have spark.catalog.tableExists(String databaseName, String tableName)
throw a runtime exception when the current session catalog is not HMS-based,
instead of silently returning false.
2. Deprecate the HMS-specific API in a newer Spark release, as Spark already
has an API that works with both HMS-based and non-HMS-based catalogs.
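To make resolution 1 concrete, here is a minimal sketch of the fail-fast
behavior. All names here (the helper, the catalog name constant, the message)
are hypothetical illustrations, not Spark's actual API:

```scala
// Hypothetical sketch of resolution 1: fail fast when the two-arg overload
// is used while a non-session catalog is current. Names are illustrative.
object ResolutionSketch {
  // Spark's built-in session catalog is conventionally named "spark_catalog".
  val SessionCatalogName = "spark_catalog"

  def tableExists(currentCatalog: String, dbName: String, tableName: String): Boolean = {
    if (currentCatalog != SessionCatalogName) {
      // Surface the misuse instead of silently returning false.
      throw new UnsupportedOperationException(
        s"tableExists(db, table) only supports the session (HMS) catalog; " +
        s"current catalog is '$currentCatalog'. " +
        s"""Use tableExists("$dbName.$tableName") instead.""")
    }
    // ... would delegate to the session catalog here ...
    false
  }

  def main(args: Array[String]): Unit = {
    try {
      tableExists("iceberg", "restDb", "restTable")
    } catch {
      case e: UnsupportedOperationException => println("rejected: " + e.getMessage)
    }
  }
}
```

This keeps the legacy overload honest during the deprecation window proposed
in resolution 2.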
Thanks
Sunny