Hi Navin,
I don’t think inline screenshots work on the mailing list, so they are not showing up for me.

I don’t think you have to do anything in Drill 1.17 to enable predicate pushdown for Parquet.

A 1 GB total dataset is really small. If that’s spread across multiple Parquet files, the row groups are going to be tiny and performance will be poor. How many files do you have now? I would aim for 1-2 GB row groups for best Parquet performance, or maybe 512 MB if the computers building them have low RAM.

Do all the Parquet files have a 100% identical schema? Can you post your query?

- Raf

From: Navin Bhawsar <[email protected]>
Sent: Wednesday, April 29, 2020 12:35 PM
To: [email protected]
Cc: [email protected]; Navin Bhawsar <[email protected]>
Subject: Parquet Predicate Push down not working

Hi,

We are trying to run a simple WHERE clause query with a predicate. The Parquet files are created using Python and stored on HDFS. The Apache Drill version used is 1.17.

The options below are set to the defaults required for predicate pushdown.

The Drill query is scanning a directory with multiple Parquet files (total size 1 GB). We expect that if predicate pushdown works, it will help reduce the scan time, which is currently 97%. If predicate pushdown works, the row group scan should fetch only 70,840 records instead of 14,162,187.

Minor Fragment NUM_ROWGROUPS ROWGROUPS_PRUNED NUM_DICT_PAGE_LOADS NUM_DATA_PAGE_lOADS NUM_DATA_PAGES_DECODED NUM_DICT_PAGES_DECOMPRESSED NUM_DATA_PAGES_DECOMPRESSED TOTAL_DICT_PAGE_READ_BYTES TOTAL_DATA_PAGE_READ_BYTES TOTAL_DICT_DECOMPRESSED_BYTES TOTAL_DATA_DECOMPRESSED_BYTES TIME_DICT_PAGE_LOADS TIME_DATA_PAGE_LOADS TIME_DATA_PAGE_DECODE TIME_DICT_PAGE_DECODE TIME_DICT_PAGES_DECOMPRESSED TIME_DATA_PAGES_DECOMPRESSED TIME_DISK_SCAN_WAIT TIME_DISK_SCAN TIME_FIXEDCOLUMN_READ TIME_VARCOLUMN_READ TIME_PROCESS
01-00-04 7 0 77 0 77 77 77 0 0 7,147,852 8,884,071 598,070 0 97,822 11,440,739 2,081,514 17,694,740 598,070 0 112,108,259 703,103,096 815,245,307
01-01-04 6 0 66 0 66 66 66 0 0 2,115,860 4,316,153 1,778,468 0 144,320 3,665,957 775,403 8,693,618 1,778,468 0 105,066,657 776,807,232 882,070,408
01-02-04 6 0 66 0 66 66 66 0 0 6,835,560 8,630,174 337,404 0 100,190 10,876,145 1,970,521 11,789,061 337,404 0 102,833,433 655,338,696 758,203,357
01-03-04 6 0 66 0 66 66 66 0 0 2,242,112 4,516,183 1,586,562 0 164,398 3,827,371 877,814 8,604,307 1,586,562 0 112,745,628 758,634,132 871,586,588
01-04-04 6 0 66 2 66 66 64 0 1,420 5,407,178 7,175,446 2,216,935 3,181 74,956 8,754,425 1,650,970 11,241,636 2,216,935 0 97,180,713 668,249,966 765,461,684
01-05-04 6 0 66 1 66 66 65 0 92 1,378,260 3,595,638 3,394,196 1,571 204,833 2,726,005 1,357,297 6,843,717 3,394,196 0 150,560,569 704,154,215 854,928,393
01-06-04 6 0 66 0 66 66 66 0 0 4,748,302 6,547,215 471,679 0 114,270 7,739,335 1,537,805 10,571,215 471,679 0 97,392,926 667,056,499 764,478,811
01-07-04 6 0 68 0 66 64 66 180 0 769,746 3,128,730 292,603 0 130,814 1,574,574 425,133 6,563,457 286,300 0 168,501,325 716,135,483 884,850,308
01-08-04 6 0 66 0 66 66 66 0 0 8,356,637 9,264,223 582,946 0 101,103 13,332,669 2,422,705 13,340,100 582,946 0 109,932,913 691,400,457 801,374,949
01-09-04 6 0 66 2 66 66 64 0 133 1,453,953 2,953,546 19,563,820 1,920 149,257 2,553,666 632,461 5,886,238 19,563,820 0 81,854,819 557,612,832 639,664,370
01-10-04 6 0 66 0 66 66 66 0 0 6,634,676 8,081,684

Please advise if any specific options are required to enable predicate pushdown.

Also, we expect the Filter operator to filter out records, but this is done later by the SELECTION_VECTOR_REMOVER operator. There are not enough details on the documentation site about when this operation is triggered.

Thanks,
Navin
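As background on what ROWGROUPS_PRUNED measures: Parquet row group pruning generally relies on per-row-group min/max column statistics, so a reader can skip a row group only when the predicate value falls outside that range. The sketch below is a stdlib-only Python model of that idea, not Drill's implementation. `RowGroupStats`, `build_row_groups`, and `count_pruned` are hypothetical names, and the data is synthetic. It illustrates one possible reason a pruned count can stay at 0 (the filtered values are spread across every row group), which may or may not be what is happening with the files in this thread.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column of one row group (hypothetical model)."""
    min_value: int
    max_value: int


def build_row_groups(values, rows_per_group):
    """Split values into consecutive row groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        groups.append(RowGroupStats(min(chunk), max(chunk)))
    return groups


def count_pruned(groups, target):
    """Count row groups that an equality predicate `col = target` could skip.

    A group can be skipped only if target lies outside its [min, max] range.
    """
    return sum(1 for g in groups if target < g.min_value or target > g.max_value)


# Synthetic column: 60,000 rows whose values cycle through 0..999 in a
# scrambled order, so every block of 10,000 rows contains the full range.
values = [(i * 7919) % 1000 for i in range(60_000)]

unsorted_groups = build_row_groups(values, rows_per_group=10_000)
sorted_groups = build_row_groups(sorted(values), rows_per_group=10_000)

print("pruned (unsorted):", count_pruned(unsorted_groups, 250), "of", len(unsorted_groups))
print("pruned (sorted):", count_pruned(sorted_groups, 250), "of", len(sorted_groups))
```

With this synthetic data, no row group is skipped while the values are interleaved, and five of the six are skipped once the column is sorted before the row groups are built. Whether sorting on the filter column helps a real dataset depends on how the files were written and on which statistics the writer stored.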
