Thanks Michael.

On Thursday, October 2, 2014 8:41 PM, Michael Armbrust <mich...@databricks.com> 
wrote:
 


We actually leave all the DDL commands up to Hive, so there is no programmatic 
way to access the things you are looking for.
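
One workaround, if you do need those attributes programmatically, is to go 
around the HiveContext and ask the Hive metastore directly. A minimal sketch 
in Java, not something Spark itself provides (it assumes the hive-site.xml 
for your metastore is on the classpath; the database name "default" and the 
table name "parquet_table" are placeholders):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.metastore.api.Table;

public class TableInfo {
    public static void main(String[] args) throws Exception {
        // Connects to the same metastore the HiveContext uses, via the
        // hive-site.xml found on the classpath.
        HiveMetaStoreClient client = new HiveMetaStoreClient(new HiveConf());
        Table table = client.getTable("default", "parquet_table");

        // Table directory, INPUTFORMAT, and OUTPUTFORMAT live on the storage
        // descriptor; partition keys are listed on the table itself.
        System.out.println("location:      " + table.getSd().getLocation());
        System.out.println("input format:  " + table.getSd().getInputFormat());
        System.out.println("output format: " + table.getSd().getOutputFormat());
        for (FieldSchema key : table.getPartitionKeys()) {
            System.out.println("partition key: " + key.getName());
        }
        client.close();
    }
}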


On Thu, Oct 2, 2014 at 5:17 PM, Banias <calvi...@yahoo.com.invalid> wrote:

Hi,
>
>
>Would anybody know how to get the following information from HiveContext given 
>a Hive table name?
>
>
>- partition key(s)
>- table directory
>- input/output format
>
>
>I am new to Spark, and I have a couple of tables created using Parquet data, like:
>
>
>CREATE EXTERNAL TABLE parquet_table (
>COL1 string,
>COL2 string,
>COL3 string
>)
>ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>STORED AS 
>INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>LOCATION '/user/foo/parquet_src';
>
>
>and some of the tables have partitions. In my Spark Java code, I am able to 
>run queries using the HiveContext like:
>
>
>import org.apache.spark.SparkConf;
>import org.apache.spark.api.java.JavaSparkContext;
>import org.apache.spark.sql.api.java.JavaSchemaRDD;
>import org.apache.spark.sql.hive.api.java.JavaHiveContext;
>
>SparkConf sparkConf = new SparkConf().setAppName("example");
>JavaSparkContext ctx = new JavaSparkContext(sparkConf);
>JavaHiveContext hiveCtx = new JavaHiveContext(ctx);
>JavaSchemaRDD rdd = hiveCtx.sql("select * from parquet_table");
>
>
>Is there a way to get the INPUTFORMAT, OUTPUTFORMAT, LOCATION, and, where 
>applicable, the partition key(s) programmatically through the HiveContext?
>
>
>The only way I know of (pardon my ignorance) is to parse the SchemaRDD 
>returned by hiveCtx.sql("describe extended parquet_table");
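>
>For reference, a rough sketch of that fallback (using the same hiveCtx as 
>above; each Row comes back as a line of the describe output, which would 
>then need string parsing):
>
>import org.apache.spark.sql.api.java.Row;
>
>JavaSchemaRDD desc = hiveCtx.sql("describe extended parquet_table");
>for (Row row : desc.collect()) {
>    // One row per describe line, including the "Detailed Table Information"
>    // entry that carries the location, input format, and output format.
>    System.out.println(row);
>}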
>
>
>If anybody could shed some light on a better way, I would appreciate that. 
>Thanks :)
>
>
>-BC
>
>
