[jira] [Work started] (IMPALA-11802) Optimize count(*) queries for Iceberg V2 tables

Li Penglin (Jira) Thu, 09 Feb 2023 02:59:05 -0800


     [ 
https://issues.apache.org/jira/browse/IMPALA-11802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Work on IMPALA-11802 started by Li Penglin.
-------------------------------------------
> Optimize count(*) queries for Iceberg V2 tables
> -----------------------------------------------
>
>                 Key: IMPALA-11802
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11802
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Li Penglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Simple {{SELECT count( * ) FROM ice_v2_tbl;}} could be optimized.
> At first we need to investigate if the following is true:
> If a V2 table only has position delete files, then the cardinality is
> {noformat}
> Cardinality(data files) - Cardinality(delete files)
> {noformat}
> If this is true, then we can answer count( * ) queries via a query rewrite 
> similarly to what we do for V1 tables: IMPALA-11279
> If the above is not true, we can still optimize count( * ) queries by:
> {noformat}
>         SUM
>          |
>      UNION ALL
>       /     \
>      /       \
>     /         \
> COUNT(*)     COUNT(*)
>   /                \
> SCAN             ANTI JOIN
> data files         /      \
> without           /        \
> deletes       SCAN         SCAN
>               data files   delete files
>               with deletes
> {noformat}
> The SCAN operator with "data files without deletes" could benefit from count( 
> * ) optimization (they would only need to read file metadata). In the common 
> case (when there are few deletes) this SCAN is in charge of scanning the vast 
> majority of data files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Work started] (IMPALA-11802) Optimize count(*) queries for Iceberg V2 tables

Reply via email to