Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-19 Thread Jerry Lam
Hi guys, does this issue affect 1.2.0 only, or all previous releases as well? Best Regards, Jerry. On Thu, Jan 8, 2015 at 1:40 AM, Xuelin Cao wrote: > Yes, the problem is, I've turned the flag on. > One possible reason for this is, the parquet file supports "predicate pushdown" by settin

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-17 Thread Yana Kadiyska
Just wondering if you've made any progress on this -- I'm having the same issue. My attempts to help myself are documented here http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJ4HpHFVKvdNgKes41DvuFY=+f_nTJ2_RT41+tadhNZx=bc...@mail.gmail.com%3E . I don't believe I have the valu

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-07 Thread Xuelin Cao
Yes, the problem is that I've already turned the flag on. One possible reason for this is that the Parquet file supports "predicate pushdown" by storing min/max statistics for each column in each Parquet block. If, in my test, "groupID=10113000" is scattered across all Parquet blocks, then the predicate pushdown
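Xuelin's hypothesis can be illustrated with a toy model of min/max block pruning. This is a hedged sketch, not Spark's or the Parquet library's actual code: it assumes each block (row group) keeps only the min and max of the column, and that a reader skips a block only when the filter value falls outside that range. The function names, the data, and the block size are hypothetical. If every block happens to contain groupID=10113000, no block can be skipped even with the flag on; the same rows sorted by groupID before writing (an inference from this reasoning, not something confirmed in the thread) let almost every block be pruned.

```python
"""Toy model of Parquet-style block pruning via per-column min/max statistics.

Assumption: each block stores only the column's min and max, and a reader
may skip a block when an equality predicate's value lies outside [min, max].
Real readers keep more statistics; this only illustrates the idea.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroup:
    values: List[int]

    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def split_into_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Chop a column into consecutive fixed-size blocks (hypothetical writer)."""
    return [RowGroup(values[i:i + group_size]) for i in range(0, len(values), group_size)]


def groups_to_read(groups: List[RowGroup], target: int) -> List[int]:
    """Indices of blocks whose [min, max] range might contain `target`.

    Every other block is pruned without reading its rows.
    """
    return [i for i, g in enumerate(groups) if g.min_value <= target <= g.max_value]


if __name__ == "__main__":
    target = 10113000  # the groupID used in the thread
    # Hypothetical data: 8 distinct group IDs, 100 rows each (800 rows total).
    ids = [10113000 + k for k in range(8)]
    scattered = ids * 100          # round-robin: every block contains every ID
    clustered = sorted(scattered)  # same rows, sorted by groupID before writing

    for name, column in [("scattered", scattered), ("clustered", clustered)]:
        groups = split_into_row_groups(column, group_size=100)
        hits = groups_to_read(groups, target)
        print(f"{name}: read {len(hits)} of {len(groups)} row groups")
```

Running it prints `scattered: read 8 of 8 row groups` and `clustered: read 1 of 8 row groups`: with scattered values the statistics are correct but useless, which would look exactly like pushdown "not working".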

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-07 Thread Cody Koeninger
But Xuelin already posted in the original message that the code was using SET spark.sql.parquet.filterPushdown=true. On Wed, Jan 7, 2015 at 12:42 AM, Daniel Haviv wrote: > Quoting Michael: > Predicate push down into the input format is turned off by default because there is a bug in the curren

Re: Why Parquet Predicate Pushdown doesn't work?

2015-01-06 Thread Daniel Haviv
Quoting Michael: Predicate push down into the input format is turned off by default because there is a bug in the current parquet library that throws null pointer exceptions when there are full row groups that are null. https://issues.apache.org/jira/browse/SPARK-4258 You can turn it on if you want: http://spa

Why Parquet Predicate Pushdown doesn't work?

2015-01-06 Thread Xuelin Cao
Hi, I'm testing the parquet file format, and predicate pushdown is a very useful feature for us. However, it looks like predicate pushdown doesn't work after I set sqlContext.sql("SET spark.sql.parquet.filterPushdown=true"). Here is my sql: sqlContext.sql("