Hi,

I would like to create a Hive table on top of an existing parquet file, as described here: https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html
Due to network restrictions, I need to store the table metadata under a different path than '/user/hive/warehouse', so I first created a database pointing at an HDFS directory I own:

    CREATE DATABASE foo_db LOCATION '/user/foo';
    USE foo_db;

Then I ran the following query:

    CREATE TABLE mytable_parquet
    USING parquet
    OPTIONS (path "/user/foo/data.parquet");

The problem is that Spark SQL does not use the database selected in the shell context, but the default database instead:

----------------------------
> CREATE TABLE mytable_parquet USING parquet OPTIONS (path "/user/foo/data.parquet");
15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: get_table : *db=foo_db* tbl=mytable_parquet
15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo ip=unknown-ip-addr cmd=get_table : db=foo_db tbl=mytable_parquet
15/05/08 20:42:21 INFO metastore.HiveMetaStore: 0: create_table: Table(tableName:mytable_parquet, *dbName:default,* owner:foo, createTime:1431117741, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=/user/foo/data.parquet}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
15/05/08 20:42:21 INFO HiveMetaStore.audit: ugi=foo ip=unknown-ip-addr cmd=create_table: Table(tableName:mytable_parquet, dbName:default, owner:foo, createTime:1431117741, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{serialization.format=1, path=/user/foo/data.parquet}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=parquet}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)
15/05/08 20:42:21 ERROR hive.log: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=foo, access=WRITE, inode="/user/hive/warehouse":hive:grp_gdoop_hdfs:drwxr-xr-x
----------------------------

Note that get_table is issued against *db=foo_db*, but create_table then runs against *dbName:default*. The permission error at the end happens because my Linux user has no write access on the default warehouse path.

I can work around this by using CREATE TEMPORARY TABLE, which writes no metadata to disk (see the P.S. below). Still, I would like to know whether I am doing anything wrong here, and whether there is any additional property I can set to force the database/metastore directory the table definition is written to.

Thanks,
Carlos
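P.S. For reference, the temporary-table workaround mentioned above is the same statement with the TEMPORARY keyword (same syntax as in the Databricks post linked at the top); it only registers the table for the current session, so nothing is ever written to the metastore or the warehouse directory:

    CREATE TEMPORARY TABLE mytable_parquet
    USING parquet
    OPTIONS (path "/user/foo/data.parquet");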