Have you analysed statistics on the ORC table? How many rows are there? Also send the output of:

desc formatted <TABLE_NAME>
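If statistics have never been gathered, something along these lines should populate them first (a HiveQL sketch using the table name and partition column from your DDL below; it can be run from the Hive CLI or via hiveContext.sql):

    -- gather basic statistics (row counts, sizes) for every partition
    ANALYZE TABLE myTable PARTITION (event_date) COMPUTE STATISTICS;
    -- gather column-level statistics as well (supported in Hive 0.14+)
    ANALYZE TABLE myTable PARTITION (event_date) COMPUTE STATISTICS FOR COLUMNS;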
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 16 April 2016 at 08:20, Maurin Lenglart <mau...@cuberonlabs.com> wrote:

> Hi,
>
> I am executing one query:
>
> SELECT `event_date` as `event_date`, sum(`bookings`) as `bookings`,
>        sum(`dealviews`) as `dealviews`
> FROM myTable
> WHERE `event_date` >= '2016-01-06' AND `event_date` <= '2016-04-02'
> GROUP BY `event_date`
> LIMIT 20000
>
> My table was created with something like:
>
> CREATE TABLE myTable (
>     bookings DOUBLE,
>     dealviews INT
> )
> PARTITIONED BY (event_date STRING)
> STORED AS ORC or PARQUET
>
> Parquet takes 9 seconds of cumulative CPU; ORC takes 50 seconds of
> cumulative CPU.
>
> For ORC I tried hiveContext.setConf("Spark.Sql.Orc.FilterPushdown", "true"),
> but it didn't change anything.
>
> Am I missing something, or is Parquet better for this type of query?
>
> I am using Spark 1.6.0 with Hive 1.1.0.
>
> Thanks
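PS. Spark configuration keys are case sensitive, so if the property really was set as "Spark.Sql.Orc.FilterPushdown" it will have had no effect. The usual form is:

    // enable ORC predicate pushdown (off by default in Spark 1.6)
    hiveContext.setConf("spark.sql.orc.filterPushdown", "true")

Note also that event_date is your partition column, so the WHERE clause above should be satisfied by partition pruning anyway; ORC filter pushdown mainly helps with predicates on non-partition columns.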