Spark 1.2

Data is stored in a Parquet table (large number of rows)

Test 1

select a, sum(b), sum(c) from table


Test 2 (with the table cached)

select a, sum(b), sum(c) from table  - "seed cache". The first run is slow,
presumably because it is loading the cache?
select a, sum(b), sum(c) from table  - The second run should be faster, since
it should be reading from the cache rather than HDFS. But it is slower than Test 1.
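
To compare the runs on equal terms, each one can be timed the same way. This is only a minimal sketch: `time_runs` and `run_query` are hypothetical names of mine, and `run_query` stands for whatever function executes the query and forces the result to be computed.

```python
import time


def time_runs(run_query, runs=3):
    """Call run_query() `runs` times; return the elapsed seconds of each call."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical callable: runs the query and materializes the result
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a cluster;
    # replace it with a function that runs the real query.
    def fake_query():
        sum(range(100000))

    for i, secs in enumerate(time_runs(fake_query), start=1):
        print("run %d: %.4f s" % (i, secs))
```

With PySpark, `run_query` could be something like `lambda: sqlContext.sql("...").collect()`, assuming an existing `sqlContext` with the table registered. The first timing would then include any cache loading, and the later ones would show the steady-state cost.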

Any thoughts? Should a different query be used to seed the cache?

