Hi,
I am executing this query:
SELECT `event_date` AS `event_date`,
       sum(`bookings`) AS `bookings`,
       sum(`dealviews`) AS `dealviews`
FROM myTable
WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02'
GROUP BY `event_date`
LIMIT 20000
My table was created with something like:
CREATE TABLE myTable (
  bookings DOUBLE,
  dealviews INT
)
PARTITIONED BY (event_date STRING)
STORED AS ORC  -- or PARQUET
With Parquet, the query takes 9 seconds of cumulative CPU time.
With ORC, it takes 50 seconds of cumulative CPU time.
For ORC I tried setting
hiveContext.setConf("spark.sql.orc.filterPushdown", "true")
but it didn't change anything.
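A minimal sketch of how the flag could be set and the query timed, assuming PySpark on Spark 1.6 and an existing HiveContext. The helper name run_timed is hypothetical and not part of Spark. It measures wall-clock time on the driver, which is not the same as the cumulative CPU figures above. As far as I know, Spark SQL configuration keys are case-sensitive, so the key is written in its documented lowercase form.

```python
import time

# Query from this post; table and column names are used as written there.
QUERY = (
    "SELECT `event_date` AS `event_date`, "
    "sum(`bookings`) AS `bookings`, sum(`dealviews`) AS `dealviews` "
    "FROM myTable "
    "WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02' "
    "GROUP BY `event_date` LIMIT 20000"
)


def run_timed(hive_context, query=QUERY, pushdown=True):
    """Set the ORC pushdown flag, run the query, return (rows, wall seconds).

    hive_context is assumed to be a pyspark.sql.HiveContext (Spark 1.6).
    run_timed is a hypothetical helper for illustration only.
    """
    # Documented spelling of the key: all lowercase except "filterPushdown".
    flag = "true" if pushdown else "false"
    hive_context.setConf("spark.sql.orc.filterPushdown", flag)

    start = time.time()
    rows = hive_context.sql(query).collect()  # collect() forces execution
    return rows, time.time() - start
```

With a live context this could be called as rows, secs = run_timed(hiveContext), once with pushdown=True and once with pushdown=False, to compare the two runs on the ORC table.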
Am I missing something, or is Parquet simply better for this type of query?
I am using Spark 1.6.0 with Hive 1.1.0.
Thanks