[ https://issues.apache.org/jira/browse/IMPALA-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-9883:
----------------------------------
    Component/s: Frontend

> Fix stats extrapolation works for full ACID tables
> --------------------------------------------------
>
>                 Key: IMPALA-9883
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9883
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> Full ACID tables have _delta_ and _delete delta_ files. Delta files contain
> the inserted table data, while delete delta files contain tombstones that
> denote the deleted rows.
> Therefore the result of a SELECT contains the rows coming from the delta
> files minus the rows whose tombstone is present in the delete delta files.
> See Full ACID Milestone 4 for more details.
> Stats extrapolation uses file sampling. E.g. if the user issues COMPUTE STATS
> table TABLESAMPLE (10); then Impala will randomly select files whose
> aggregated byte size is at least 10% of the total byte size of the table
> files. Unfortunately for ACID tables this method doesn't estimate the stats
> correctly.
> To calculate the stats more precisely we need to change the sampling method
> this way:
> * select all delete delta files
> * select some percentage (provided by TABLESAMPLE) of the delta files
> * extrapolate stats using the total byte size of all delta files (not all
> table files since those include the delete deltas)
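
The three sampling steps proposed in the issue can be sketched roughly as follows. This is only an illustration in Python, not Impala's frontend code: the AcidFile type, the sample_acid_files function, and the seed parameter are hypothetical names made up for this sketch, and the shuffle-then-accumulate selection is an assumption about how "randomly select files whose aggregated byte size is at least N%" might be done.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AcidFile:
    """Hypothetical descriptor for one file of a full ACID table."""
    path: str
    size_bytes: int
    is_delete_delta: bool


def sample_acid_files(
    files: List[AcidFile], percent: float, seed: int = 0
) -> Tuple[List[AcidFile], float]:
    """Return (files to scan, extrapolation factor) for a TABLESAMPLE percent.

    Assumed behavior, following the three bullets in the issue:
      1. keep every delete delta file,
      2. randomly pick delta files until their aggregated size reaches
         at least `percent` of the total delta bytes,
      3. base the extrapolation factor on delta bytes only.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    delete_deltas = [f for f in files if f.is_delete_delta]
    deltas = [f for f in files if not f.is_delete_delta]

    # Delete deltas are deliberately excluded from the byte total.
    total_delta_bytes = sum(f.size_bytes for f in deltas)
    target_bytes = total_delta_bytes * percent / 100.0

    shuffled = list(deltas)
    random.Random(seed).shuffle(shuffled)

    sampled: List[AcidFile] = []
    sampled_bytes = 0
    for f in shuffled:
        if sampled_bytes >= target_bytes:
            break
        sampled.append(f)
        sampled_bytes += f.size_bytes

    # Ratio of all delta bytes to sampled delta bytes; 1.0 if nothing sampled.
    factor = total_delta_bytes / sampled_bytes if sampled_bytes else 1.0
    return delete_deltas + sampled, factor
```

How the factor is then applied to the stats computed from the sampled files is left out here, since the issue text does not spell it out.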