[ https://issues.apache.org/jira/browse/IMPALA-11414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zoltán Borók-Nagy resolved IMPALA-11414.
----------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Off-by-one error in Parquet late materialization
> ------------------------------------------------
>
>                 Key: IMPALA-11414
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11414
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.2.0
>
>
> PARQUET_LATE_MATERIALIZATION sets the minimum number of consecutive
> filtered-out rows required before we skip materializing the corresponding
> rows in the other Parquet columns.
> E.g. if PARQUET_LATE_MATERIALIZATION is 10 and, in a filtered column, we find
> at least 10 consecutive rows that don't pass the predicates, we avoid
> materializing the corresponding rows in the other columns.
> But due to an off-by-one error we actually only need
> (PARQUET_LATE_MATERIALIZATION - 1) consecutive elements. This means that if
> we set PARQUET_LATE_MATERIALIZATION to one, we need zero consecutive
> filtered-out elements, which leads to a crash/DCHECK. The bug is in the
> GetMicroBatches() algorithm, which produces the micro batches based on the
> selected rows.
> Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it shouldn't
> be allowed.
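
The effect of the off-by-one can be illustrated with a small Python sketch. This is not Impala's GetMicroBatches() code: the function name, its arguments, and the off_by_one flag are hypothetical, and the sketch only assumes that a micro batch is a range of rows that gets materialized, split wherever enough consecutive rows were filtered out.

```python
def get_micro_batches(selected, threshold, off_by_one=False):
    """Group surviving rows into inclusive (start, end) micro batches.

    selected:   list of bools, True if the row passed the predicates.
    threshold:  minimum run of consecutive filtered-out rows worth skipping.
    off_by_one: hypothetical switch that reproduces the reported bug by
                requiring only (threshold - 1) filtered-out rows.
    """
    if threshold < 1:
        # A threshold of 0 has no sensible meaning, so reject it.
        raise ValueError("threshold must be at least 1")
    needed = threshold - 1 if off_by_one else threshold
    batches = []
    start = last = None
    for idx, keep in enumerate(selected):
        if not keep:
            continue
        if start is None:
            start = last = idx
            continue
        gap = idx - last - 1  # filtered-out rows between two surviving rows
        if gap >= needed:
            # The gap is long enough to skip: close the current batch.
            batches.append((start, last))
            start = idx
        last = idx
    if start is not None:
        batches.append((start, last))
    return batches


rows = [True, True, False, True, False, False, True]
print(get_micro_batches(rows, 2))                   # intended behaviour
print(get_micro_batches(rows, 1, off_by_one=True))  # splits adjacent rows
```

With a threshold of 1 and the off-by-one enabled, rows 0 and 1 end up in separate batches even though no filtered-out row lies between them, which is the "zero consecutive filtered-out elements" case from the description.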