Hi,

Would anybody know how to get the following information from HiveContext given 
a Hive table name?

- partition key(s)
- table directory
- input/output format

I am new to Spark, and I have a couple of tables created over Parquet data like:

CREATE EXTERNAL TABLE parquet_table (
COL1 string,
COL2 string,
COL3 string
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION '/user/foo/parquet_src';

and some of the tables have partitions. In my Spark Java code, I am able to run 
queries using the HiveContext like:

SparkConf sparkConf = new SparkConf().setAppName("example");
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");

Is there a way to get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and (where 
applicable) the partition key(s) programmatically through the HiveContext?

The only way I know of (pardon my ignorance) is to parse the SchemaRDD 
returned by hiveCtx.sql("describe extended parquet_table");
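For what it's worth, here is a minimal sketch of that parsing approach. It assumes the "Detailed Table Information" row of `describe extended` contains a Thrift-style dump with comma-separated `name:value` fields (e.g. `location:...`, `inputFormat:...`); the class and method names are my own, and the exact layout of that row may differ between Hive versions:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls a named field (e.g. "location", "inputFormat")
// out of the "Detailed Table Information" row returned by
// hiveCtx.sql("describe extended <table>").
public class DescribeExtendedParser {

    // Returns the value of "name:value" inside the dump, or null if absent.
    // The field layout ("name:value" separated by commas/parens) is an
    // assumption about the describe-extended output format.
    static String field(String detailedInfo, String name) {
        Matcher m = Pattern.compile(name + ":([^,)]+)").matcher(detailedInfo);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Abbreviated stand-in for a real "Detailed Table Information" row.
        String row = "Table(tableName:parquet_table, "
                + "location:hdfs://nn/user/foo/parquet_src, "
                + "inputFormat:parquet.hive.DeprecatedParquetInputFormat, "
                + "outputFormat:parquet.hive.DeprecatedParquetOutputFormat)";
        System.out.println(field(row, "location"));
        System.out.println(field(row, "inputFormat"));
    }
}
```

An alternative that avoids string parsing altogether might be to go to the Hive metastore directly, e.g. HiveMetaStoreClient.getTable(db, table) and then table.getSd().getLocation() / getInputFormat() / getOutputFormat() and table.getPartitionKeys() — though that means depending on the Hive metastore API rather than the HiveContext.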

If anybody could shed some light on a better way, I would appreciate that. 
Thanks :)

-BC
