Hi, would anybody know how to get the following information from HiveContext, given a Hive table name?
- partition key(s)
- table directory
- input/output format

I am new to Spark, and I have a couple of tables created from Parquet data like:

```sql
CREATE EXTERNAL TABLE parquet_table (
  COL1 string,
  COL2 string,
  COL3 string
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
  OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '/user/foo/parquet_src';
```

Some of the tables are partitioned. In my Spark Java code, I am able to run queries through the HiveContext like:

```java
SparkConf sparkConf = new SparkConf().setAppName("example");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");
```

Can I also get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and (where applicable) the partition key(s) programmatically through the HiveContext? The only way I know of (pardon my ignorance) is to parse the SchemaRDD returned by hiveCtx.sql("describe extended parquet_table");.

If anybody could shed some light on a better way, I would appreciate it. Thanks :)

-BC
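For what it's worth, here is a minimal sketch of the parsing approach I mean. It assumes the "Detailed Table Information" row that DESCRIBE EXTENDED returns uses the common `field:value,` serialized layout (the exact format varies across Hive versions, so the regex and the sample row below are illustrative assumptions, not a guaranteed contract):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DescribeExtendedParser {

    // Pull a named field (e.g. "location" or "inputFormat") out of the
    // serialized "Detailed Table Information" string. The "field:value,"
    // layout matched here is an assumption based on typical Hive output.
    static String extractField(String detailedInfo, String field) {
        Pattern p = Pattern.compile(field + ":([^,)]+)");
        Matcher m = p.matcher(detailedInfo);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical, trimmed sample of what the detailed-info row
        // from "describe extended parquet_table" can look like.
        String row = "Table(tableName:parquet_table, dbName:default, "
                + "sd:StorageDescriptor(location:/user/foo/parquet_src, "
                + "inputFormat:parquet.hive.DeprecatedParquetInputFormat, "
                + "outputFormat:parquet.hive.DeprecatedParquetOutputFormat))";
        System.out.println(extractField(row, "location"));
        System.out.println(extractField(row, "inputFormat"));
    }
}
```

This is obviously brittle (it breaks if a value itself contains a comma or parenthesis), which is exactly why I'm hoping there is a proper API for this.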