Question about pre-filtering Parquet file data

蔡自强(伏念) Tue, 27 Jan 2015 06:58:34 -0800

Hi dear Drill developers,

We are deploying Drill 0.7 for statistical analysis. I found that Parquet
files store column summary information in the page header (min, max, count,
and so on), but the data reader does not seem to use this information for
pre-filtering files. For example, when I search for records where
attribute_A = 10 and the column's (min, max) is (1, 9), skipping the scan of
that data seems the best choice. I would like to know whether Drill performs
this optimization during query processing.

By the way: in the TableStatsCalculator.getRegionSizeInBytes method, if
avgRowSizeInBytes is too large, the return value will overflow the int
range. So the code should be changed to something like
"return ((long)avgRowSizeInBytes)*1024L*1024L".
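
To make both points concrete, here is a minimal Java sketch. It is not
Drill's code: the class PreFilterSketch and the methods canSkip, scaledAsInt,
and scaledAsLong are hypothetical names, and scaledAsInt only assumes how the
multiplication might be written when no cast to long is present.

```java
public class PreFilterSketch {

    // Hypothetical helper: an equality predicate "column = target" cannot
    // match any row if target lies outside the column's [min, max] range,
    // so the data covered by those statistics can be skipped.
    static boolean canSkip(long min, long max, long target) {
        return target < min || target > max;
    }

    // Assumed shape of the problem: the product is computed in int
    // arithmetic and wraps around before it is widened to long.
    static long scaledAsInt(int avgRowSizeInBytes) {
        return avgRowSizeInBytes * 1024 * 1024;
    }

    // Proposed fix from this mail: widen to long before multiplying.
    static long scaledAsLong(int avgRowSizeInBytes) {
        return ((long) avgRowSizeInBytes) * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // attribute_A = 10 with (min, max) = (1, 9): nothing can match.
        System.out.println(canSkip(1, 9, 10));   // true
        // 2048 * 1024 * 1024 = 2^31, which does not fit in an int.
        System.out.println(scaledAsInt(2048));   // -2147483648
        System.out.println(scaledAsLong(2048));  // 2147483648
    }
}
```

The range check only proves that no row can match; a target inside
[min, max] still requires reading the data.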
                                                                                
Thanks & Regards
