Kent Yao created SPARK-34558:
--------------------------------

             Summary: warehouse path should be resolved ahead of populating and use
                 Key: SPARK-34558
                 URL: https://issues.apache.org/jira/browse/SPARK-34558
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.2, 3.1.2
            Reporter: Kent Yao


Currently, the warehouse path is fully qualified only on the caller side, when 
creating a database, table, partition, etc. The unqualified path is still 
populated into the Spark and Hadoop confs, which leads to inconsistent API 
behavior. We should qualify it before it is populated and used.
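
As a rough illustration of what "qualifying ahead" means, the sketch below turns a relative value into a fully qualified URI once, before anything else reads it. It uses only the JDK and the local file system. The class name and the `qualify` helper are hypothetical and are not part of Spark; an actual fix would presumably go through the Hadoop `FileSystem` API so that the configured default file system is honored.

```java
import java.net.URI;
import java.nio.file.Paths;

public class WarehousePathSketch {

    /**
     * Hypothetical helper (not a Spark API): returns a fully qualified URI
     * for a warehouse dir value. Assumes a value without a scheme refers to
     * the local file system and is relative to the current working directory.
     */
    static URI qualify(String warehouseDir) {
        URI uri = URI.create(warehouseDir);
        if (uri.getScheme() != null) {
            // Already qualified, e.g. hdfs://host:8020/warehouse
            return uri;
        }
        // Resolve against the working directory and attach the file: scheme.
        return Paths.get(warehouseDir).toAbsolutePath().normalize().toUri();
    }

    public static void main(String[] args) {
        // Qualify once, then populate the same value everywhere it is needed.
        System.out.println(qualify("datalake"));
        System.out.println(qualify("hdfs://host:8020/warehouse"));
    }
}
```

With a helper like this, every consumer of the conf sees the same absolute location instead of each caller qualifying the value on its own.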


For example, suppose the value is a relative path, such as 
`spark.sql.warehouse.dir=lakehouse`.

If the default database is absent at runtime, the app fails with:

{code:java}
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./datalake
        at org.apache.hadoop.fs.Path.initialize(Path.java:263)
        at org.apache.hadoop.fs.Path.<init>(Path.java:254)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:133)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:137)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:150)
        at org.apache.hadoop.hive.metastore.Warehouse.getDefaultDatabasePath(Warehouse.java:163)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:636)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
        ... 73 more
{code}
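
The `URISyntaxException` message above can be reproduced with the JDK alone, which may help when reasoning about the failure. The sketch below assumes that the Hadoop `Path` constructor ends up in the multi-argument `java.net.URI` constructor with a scheme and a relative path, as the `Path.initialize` frame suggests. The class and the `tryBuild` helper are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RelativePathInAbsoluteUri {

    /**
     * Hypothetical helper: builds a URI from a scheme and a path and returns
     * either the resulting URI string or the exception message.
     */
    static String tryBuild(String scheme, String path) {
        try {
            // The 5-argument constructor rejects a relative path when a scheme is given.
            return new URI(scheme, null, path, null, null).toString();
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Scheme plus relative path: rejected by java.net.URI.
        System.out.println(tryBuild("file", "./datalake"));
        // Scheme plus absolute path: accepted.
        System.out.println(tryBuild("file", "/tmp/datalake"));
    }
}
```

The first call returns `Relative path in absolute URI: file:./datalake`, matching the trace, while the second returns `file:/tmp/datalake`. This is why an already-qualified warehouse path avoids the failure.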

If the default database is present at runtime, the app works with it, and when 
we create a database, its location gets fully qualified. For example:


{code:sql}
spark-sql> create database test2 location 'datalake';
21/02/26 21:52:57 WARN ObjectStore: Failed to get database test2, returning NoSuchObjectException
Time taken: 0.052 seconds
spark-sql> desc database test;
Database Name   test
Comment
Location        file:/Users/kentyao/Downloads/spark/spark-3.2.0-SNAPSHOT-bin-20210226/datalake/test.db
Owner   kentyao
Time taken: 0.023 seconds, Fetched 4 row(s)
{code}

In addition, the log becomes unclear, because it reports the unqualified path. For example:

{code:java}
21/02/27 13:54:17 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('datalake').
21/02/27 13:54:17 INFO SharedState: Warehouse path is 'datalake'.
{code}






