Why doesn't Parquet predicate pushdown work?

Xuelin Cao Tue, 06 Jan 2015 22:19:39 -0800

Hi,
I'm testing the Parquet file format, and predicate pushdown is a very
useful feature for us.

However, it looks like predicate pushdown doesn't work after I set:

    sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")

Here is my SQL:

    sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect
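
For reference, the steps above can be put together as a single spark-shell session. This is only a sketch, assuming Spark 1.2 and a shell where `sc` is already defined. The Parquet path is a hypothetical placeholder, since the post does not say how the `ad` table was registered. Printing the executed plan is an extra step not in the original test; it may help show whether the filter reaches the Parquet scan.

```scala
// Minimal sketch for spark-shell (assumes Spark 1.2, where `sc` is predefined).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Hypothetical path: the real location of the Parquet data is not given in the post.
val ad = sqlContext.parquetFile("/path/to/ad.parquet")
ad.registerTempTable("ad")

// Same effect as the SET statement in the post.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

val query = sqlContext.sql("select adId, adTitle from ad where groupId=10113000")

// Extra diagnostic step (not in the original test): inspect the physical plan.
println(query.queryExecution.executedPlan)

query.collect()
```

Running the same session with the flag set to "false" and comparing the printed plans is one way to check whether the setting takes effect at all, independently of the input size shown in the web UI.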


Then I checked the amount of input data in the web UI, but it is ALWAYS
80.2M, regardless of whether I turn the spark.sql.parquet.filterPushdown
flag on or off.

I'm not sure whether there is anything I must do when generating the
Parquet file to make predicate pushdown available. (With ORC, for example,
I need to explicitly sort on the field that will be used for predicate
pushdown when creating the file.)

Does anyone have any idea?

Also, does anyone know the internal mechanism for Parquet predicate pushdown?

Thanks
