In Spark SQL, Parquet filter pushdown doesn’t cover |HiveTableScan| for
now. May I ask why you prefer |HiveTableScan| over
|ParquetTableScan|?
Cheng
On 1/19/15 5:02 PM, Xiaoyu Wang wrote:
I have turned on spark.sql.parquet.filterPushdown=true. But once
spark.sql.hive.convertMetastoreParquet is set to false, the first
parameter has no effect!
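For reference, here is a minimal sketch of the combination described above. It assumes the statements are run through the spark-sql CLI (when passed to HiveContext.sql instead, each statement goes in its own call without the trailing semicolon) against the test table from the original message. The value 'a' in the WHERE clause is only a placeholder and does not come from this thread.

```sql
-- Enable Parquet filter pushdown (reported in this thread as off by default).
SET spark.sql.parquet.filterPushdown=true;

-- Stop Spark SQL from converting metastore Parquet tables to its native
-- Parquet scan, so the table is read through HiveTableScan instead.
SET spark.sql.hive.convertMetastoreParquet=false;

-- Inspect the physical plan; 'a' is a placeholder value, not from the thread.
EXPLAIN SELECT * FROM test WHERE id = 'a';
```

With both settings in place, the plan would be expected to show HiveTableScan rather than ParquetTableScan. As noted at the top of the thread, filter pushdown does not currently cover that scan.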
2015-01-20 6:52 GMT+08:00 Yana Kadiyska <yana.kadiy...@gmail.com>:
If you're talking about filter pushdown for Parquet files, this
also has to be turned on explicitly. Try
spark.sql.parquet.filterPushdown=true. It's off by default.
On Mon, Jan 19, 2015 at 3:46 AM, Xiaoyu Wang <wangxy...@gmail.com> wrote:
Yes, it works!
But the filter can't be pushed down!
Is implementing the data source API the only option for a custom
Parquet input format?
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
2015-01-16 21:51 GMT+08:00 Xiaoyu Wang <wangxy...@gmail.com>:
Thanks yana!
I will try it!
On Jan 16, 2015, at 20:51, yana <yana.kadiy...@gmail.com> wrote:
I think you might need to set
spark.sql.hive.convertMetastoreParquet to false if I
understand that flag correctly
-------- Original message --------
From: Xiaoyu Wang
Date:01/16/2015 5:09 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Why custom parquet format hive table execute
"ParquetTableScan" physical plan, not "HiveTableScan"?
Hi all!
In Spark SQL 1.2.0, I created a Hive table with a custom Parquet
InputFormat and OutputFormat, like this:
CREATE TABLE test(
  id string,
  msg string)
CLUSTERED BY (id)
SORTED BY (id ASC)
INTO 10 BUCKETS
ROW FORMAT SERDE
  'com.a.MyParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.a.MyParquetInputFormat'
OUTPUTFORMAT
  'com.a.MyParquetOutputFormat';
In the Spark shell, the plan for "select * from test" is:
[== Physical Plan ==]
[!OutputFaker [id#5,msg#6]]
[ ParquetTableScan [id#12,msg#13], (ParquetRelation
hdfs://hadoop/user/hive/warehouse/test.db/test,
Some(Configuration: core-default.xml, core-site.xml,
mapred-default.xml, mapred-site.xml, yarn-default.xml,
yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
It is not HiveTableScan!
So it doesn't execute my custom InputFormat!
Why? How can I make it use my custom InputFormat?
Thanks!