[ 
https://issues.apache.org/jira/browse/IMPALA-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-9883:
----------------------------------
    Component/s: Frontend

> Fix stats extrapolation works for full ACID tables
> --------------------------------------------------
>
>                 Key: IMPALA-9883
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9883
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>
> Full ACID tables have _delta_ and _delete delta_ files. Delta files contain 
> the inserted table data, while delete delta files contain tombstones that 
> denote the deleted rows.
> Therefore the result of a SELECT contains the rows coming from the delta 
> files minus the rows whose tombstone is present in the delete delta files. 
> See Full ACID Milestone 4 for more details.
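
The read semantics described above can be sketched as a set difference. This is only an illustration, not Impala's implementation: the function name and the row identifier, modeled here as a (write_id, bucket, row_id) tuple, are assumptions made for the example.

```python
def visible_rows(delta_rows, tombstones):
    """Return the rows a SELECT would see.

    delta_rows: dict mapping a row identifier to its row payload
                (rows read from delta files).
    tombstones: iterable of row identifiers read from delete delta files.
    Both shapes are hypothetical and chosen only for illustration.
    """
    deleted = set(tombstones)
    # Keep every inserted row that has no matching tombstone.
    return {rid: row for rid, row in delta_rows.items() if rid not in deleted}


if __name__ == "__main__":
    inserted = {(1, 0, 0): "a", (1, 0, 1): "b", (2, 0, 0): "c"}
    deleted = [(1, 0, 1)]
    print(sorted(visible_rows(inserted, deleted)))
```
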
> Stats extrapolation uses file sampling. For example, if the user issues 
> COMPUTE STATS table TABLESAMPLE SYSTEM(10); then Impala randomly selects 
> files whose aggregate byte size is at least 10% of the total byte size of 
> the table files. Unfortunately, for ACID tables this method doesn't estimate 
> the stats correctly, because both the sample and the total size mix delta 
> and delete delta files.
> To calculate the stats more precisely, we need to change the sampling method 
> as follows:
>  * select all delete delta files
>  * select some percentage (provided by TABLESAMPLE) of the delta files
>  * extrapolate stats using the total byte size of all delta files (not all 
> table files since those include the delete deltas)
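
The three steps listed in the description could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the Impala frontend code: `FileDesc`, `sample_acid_files`, and the `is_delete_delta` flag are hypothetical names, and every file that is not a delete delta is treated as a delta file.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDesc:
    """Hypothetical file descriptor; not Impala's actual class."""
    path: str
    size_bytes: int
    is_delete_delta: bool


def sample_acid_files(files, percent, seed=None):
    """Return (files_to_scan, scale_factor) for an ACID-aware sample."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    # Step 1: always select all delete delta files.
    delete_deltas = [f for f in files if f.is_delete_delta]
    deltas = [f for f in files if not f.is_delete_delta]

    # Step 2: randomly select delta files until their combined size
    # reaches the requested percentage of the delta bytes.
    total_delta_bytes = sum(f.size_bytes for f in deltas)
    target_bytes = total_delta_bytes * percent / 100.0
    shuffled = list(deltas)
    random.Random(seed).shuffle(shuffled)
    sampled, sampled_bytes = [], 0
    for f in shuffled:
        if sampled_bytes >= target_bytes:
            break
        sampled.append(f)
        sampled_bytes += f.size_bytes

    # Step 3: extrapolate against the delta bytes only, since the
    # delete deltas are read in full and must not inflate the total.
    scale = total_delta_bytes / sampled_bytes if sampled_bytes else 1.0
    return delete_deltas + sampled, scale
```

A caller would compute stats over the returned files and multiply the resulting row count by `scale`. Because tombstones are only applied to the sampled delta rows, this still yields an estimate rather than an exact count.
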



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
