Hi all!

I am using Spark SQL 1.2.0.
I created a Hive table with a custom Parquet input format and output format,
like this:
CREATE TABLE test(
  id string,
  msg string)
CLUSTERED BY (
  id)
SORTED BY (
  id ASC)
INTO 10 BUCKETS
ROW FORMAT SERDE
  'com.a.MyParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.a.MyParquetInputFormat'
OUTPUTFORMAT
  'com.a.MyParquetOutputFormat';

And in the spark shell, the physical plan of "select * from test" is:

[== Physical Plan ==]
[!OutputFaker [id#5,msg#6]]
[ *ParquetTableScan* [id#12,msg#13], (ParquetRelation
hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
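
(For reference, this is roughly how I get the plan in the spark shell, with the HiveContext built on the existing SparkContext sc:)

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
// print the physical plan Spark SQL picks for the query
println(sqlContext.sql("select * from test").queryExecution.executedPlan)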

*Not HiveTableScan*!!!
*So it doesn't execute my custom input format!*
Why? How can I make it use my custom input format?
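
A guess on my side (not verified): the conversion of metastore Parquet tables to the native ParquetTableScan seems to be controlled by the spark.sql.hive.convertMetastoreParquet setting, so maybe turning it off would bring back HiveTableScan and my SerDe? Something like:

// guess: disable the automatic conversion to Spark SQL's native ParquetTableScan,
// so the table goes through the Hive SerDe / input format instead (not verified)
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
println(sqlContext.sql("select * from test").queryExecution.executedPlan)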

Thanks!
