Hi, I want to make sure that the cache table indeed would accelerate sql queries. Here is one of my use case : impala table size : 24.59GB, no partitions, with about 1 billion+ rows. I use sqlContext.sql to run queries over this table and try to do cache and uncache command to see if there is any performance disparity. I ran the following query : select * from video1203 where id > 10 and id < 20 and added_year != 1989 I can see the following results :
1 If I did not run cache table and just ran sqlContext.sql(), I can see the above query run about 25 seconds. 2 If I firstly run sqlContext.cacheTable("video1203"), the query runs super slow and would cause driver OOM exception, but I can get final results with about running 9 minuts. Would any expert can explain this for me ? I can see that cacheTable cause OOM just because the in-memory columnar storage cannot hold the 24.59GB+ table size into memory. But why the performance is so different and even so bad ? Best, Sun. fightf...@163.com