Hi, I am executing this query:

SELECT `event_date` AS `event_date`, sum(`bookings`) AS `bookings`, sum(`dealviews`) AS `dealviews`
FROM myTable
WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02'
GROUP BY `event_date`
LIMIT 20000
My table was created something like this:

CREATE TABLE myTable (
  bookings DOUBLE,
  dealviews INT
)
PARTITIONED BY (event_date STRING)
STORED AS ORC  -- or PARQUET, for the second test

The Parquet version takes 9 seconds of cumulative CPU time, while the ORC version takes 50 seconds.

For ORC I tried hiveContext.setConf("spark.sql.orc.filterPushdown", "true"), but it didn't change anything.

Am I missing something, or is Parquet simply better for this type of query? I am using Spark 1.6.0 with Hive 1.1.0. Thanks.
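For completeness, this is roughly how I set the flag and run the query. It is only a minimal sketch, assuming a plain HiveContext on Spark 1.6 and the table/columns named above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OrcPushdownTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OrcPushdownTest"))
    val hiveContext = new HiveContext(sc)

    // Enable ORC predicate pushdown (the key is spark.sql.orc.filterPushdown,
    // set before the query is run).
    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

    // Same aggregation query as above, issued through the HiveContext.
    val df = hiveContext.sql(
      """SELECT event_date,
        |       sum(bookings)  AS bookings,
        |       sum(dealviews) AS dealviews
        |FROM myTable
        |WHERE event_date >= '2016-01-06' AND event_date <= '2016-04-02'
        |GROUP BY event_date
        |LIMIT 20000""".stripMargin)

    df.show(20000)
  }
}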