Have you analysed statistics on the ORC table? How many rows are there? Also send the output of:

desc formatted <TABLE_NAME>
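If statistics have never been gathered, something along these lines should populate them first (a HiveQL sketch using the table name and partition column from your DDL below; it can be run from the Hive CLI or via hiveContext.sql):

    -- gather basic statistics (row counts, sizes) for every partition
    ANALYZE TABLE myTable PARTITION (event_date) COMPUTE STATISTICS;
    -- gather column-level statistics as well (supported in Hive 0.14+)
    ANALYZE TABLE myTable PARTITION (event_date) COMPUTE STATISTICS FOR COLUMNS;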
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 16 April 2016 at 08:20, Maurin Lenglart <mau...@cuberonlabs.com> wrote:

> Hi,
>
> I am executing one query:
>
> SELECT `event_date` as `event_date`, sum(`bookings`) as `bookings`,
>        sum(`dealviews`) as `dealviews`
> FROM myTable
> WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02'
> GROUP BY `event_date`
> LIMIT 20000
>
> My table was created with something like:
>
> CREATE TABLE myTable (
>     bookings DOUBLE,
>     dealviews INT
> )
> PARTITIONED BY (event_date STRING)
> STORED AS ORC or PARQUET
>
> Parquet takes 9 seconds of cumulative CPU; ORC takes 50 seconds of
> cumulative CPU.
>
> For ORC I tried hiveContext.setConf("Spark.Sql.Orc.FilterPushdown", "true"),
> but it didn't change anything.
>
> Am I missing something, or is Parquet better for this type of query?
>
> I am using Spark 1.6.0 with Hive 1.1.0.
>
> Thanks
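PS. Spark configuration keys are case sensitive, so if the property really was set as "Spark.Sql.Orc.FilterPushdown" it will have had no effect. The usual form is:

    // enable ORC predicate pushdown (off by default in Spark 1.6)
    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

Note also that event_date is your partition column, so the WHERE clause above should be satisfied by partition pruning anyway; ORC filter pushdown mainly helps with predicates on non-partition columns.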