SQL group by on Parquet table slower when table cached

Manoj Samel Fri, 06 Feb 2015 16:42:16 -0800

Spark 1.2

The data is stored in a Parquet table (large number of rows).


Test 1 (table not cached)

select a, sum(b), sum(c) from table group by a

Test 2 (table cached)

sqlContext.cacheTable("table")
select a, sum(b), sum(c) from table group by a  - "seed cache" run. Slow the
first time, presumably because it is loading the cache?
select a, sum(b), sum(c) from table group by a  - Second run. This should be
faster, since it should read from the cache rather than HDFS, but it is slower
than Test 1.
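
For anyone who wants to reproduce this, the three runs can be timed roughly as in the sketch below. It assumes the Spark 1.2 Scala API (SQLContext.parquetFile, registerTempTable, cacheTable); the HDFS path, the temp table name "t" and the timeMs helper are placeholders made up for illustration, not my actual setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object CachedGroupByTiming {
  // Hypothetical helper: run a block and return elapsed wall-clock milliseconds.
  def timeMs(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("cached-group-by-timing"))
    val sqlContext = new SQLContext(sc)

    // Placeholder path and table name; substitute your own.
    sqlContext.parquetFile("hdfs:///path/to/table.parquet").registerTempTable("t")

    val query = "SELECT a, SUM(b), SUM(c) FROM t GROUP BY a"

    // Test 1: aggregate straight from the Parquet files.
    val uncachedMs = timeMs { sqlContext.sql(query).collect() }

    // Test 2: mark the table for in-memory caching, then run the query twice.
    sqlContext.cacheTable("t")
    val seedMs = timeMs { sqlContext.sql(query).collect() }   // "seed cache" run
    val cachedMs = timeMs { sqlContext.sql(query).collect() } // should read the cache

    println(s"uncached=${uncachedMs}ms seed=${seedMs}ms cached=${cachedMs}ms")

    sqlContext.uncacheTable("t")
    sc.stop()
  }
}
```

A single wall-clock measurement per run is noisy, so repeating each run several times and checking the Storage tab of the Spark UI (to see whether the table is fully cached) would make the comparison more trustworthy.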

Any thoughts? Should a different query be used to seed the cache?

Thanks,
