[ 
https://issues.apache.org/jira/browse/IMPALA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha reassigned IMPALA-10755:
-----------------------------------

    Assignee: Aman Sinha

> Wrong results for a query with predicate on an analytic function
> ----------------------------------------------------------------
>
>                 Key: IMPALA-10755
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10755
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>              Labels: wrongresults
>
> In the master branch, the following query with equality predicate on analytic 
> function produces 13 rows whereas the correct result is 3 rows (as verified 
> with Hive):
> {noformat}
> use tpch;
> select *
>   FROM (
>     SELECT
>       l_quantity, l_linenumber,
>       nr_pvp,
>       max(nr_pvp) over (partition BY l_quantity) max_nr_pvp
>     FROM (
>       SELECT
>         l_quantity, l_linenumber,
>         count(1) AS nr_pvp
>       FROM
>         lineitem, partsupp
>         where l_partkey       = ps_partkey
>         and ps_suppkey = 10 and  l_quantity in (1, 2)
>         GROUP BY l_quantity, l_linenumber
>     ) a
>   ) b
>   WHERE nr_pvp = max_nr_pvp
>   order by 1, 2, 3, 4;
> {noformat}
> Wrong results:
> {noformat}
> +------------+--------------+--------+------------+                           
>                            
> | l_quantity | l_linenumber | nr_pvp | max_nr_pvp |
> +------------+--------------+--------+------------+
> | 1.00       | 1            | 11     | 12         |
> | 1.00       | 2            | 12     | 12         |
> | 1.00       | 3            | 10     | 12         |
> | 1.00       | 4            | 5      | 12         |
> | 1.00       | 5            | 4      | 12         |
> | 1.00       | 6            | 5      | 12         |
> | 1.00       | 7            | 2      | 12         |
> | 2.00       | 1            | 12     | 12         |
> | 2.00       | 2            | 12     | 12         |
> | 2.00       | 3            | 11     | 12         |
> | 2.00       | 4            | 8      | 12         |
> | 2.00       | 5            | 3      | 12         |
> | 2.00       | 6            | 2      | 12         |
> +------------+--------------+--------+------------+
> Fetched 13 row(s) in 1.36s
> {noformat}
> Expected results (verified with Hive):
> {noformat}
> +---------------+-----------------+-----------+---------------+
> | b.l_quantity  | b.l_linenumber  | b.nr_pvp  | b.max_nr_pvp  |
> +---------------+-----------------+-----------+---------------+
> | 1.00          | 2               | 12        | 12            |
> | 2.00          | 1               | 12        | 12            |
> | 2.00          | 2               | 12        | 12            |
> +---------------+-----------------+-----------+---------------+
> {noformat}
> Explain plan for Impala shows that the equality predicate 'nr_pvp = 
> max_nr_pvp' which is supposed to be applied on top of the analytic operator 
> is missing:
> {noformat}
> +---------------------------------------------------------------------------+
> | Explain String                                                            |
> +---------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=42.88MB Threads=7               |
> | Per-Host Resource Estimates: Memory=483MB                                 |
> |                                                                           |
> | PLAN-ROOT SINK                                                            |
> | |                                                                         |
> | 11:MERGING-EXCHANGE [UNPARTITIONED]                                       |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |                                                                         |
> | 06:SORT                                                                   |
> | |  order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 05:ANALYTIC                                                               |
> | |  functions: max(count(*))                                               |
> | |  partition by: l_quantity                                               |
> | |  row-size=28B cardinality=357                                           |
> | |                                                                         |
> | 04:SORT                                                                   |
> | |  order by: l_quantity ASC NULLS LAST                                    |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 10:EXCHANGE [HASH(l_quantity)]                                            |
> | |                                                                         |
> | 09:AGGREGATE [FINALIZE]                                                   |
> | |  output: count:merge(*)                                                 |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 08:EXCHANGE [HASH(l_quantity,l_linenumber)]                               |
> | |                                                                         |
> | 03:AGGREGATE [STREAMING]                                                  |
> | |  output: count(*)                                                       |
> | |  group by: l_quantity, l_linenumber                                     |
> | |  row-size=20B cardinality=357                                           |
> | |                                                                         |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                                      |
> | |  hash predicates: l_partkey = ps_partkey                                |
> | |  runtime filters: RF000 <- ps_partkey                                   |
> | |  row-size=36B cardinality=2.45K                                         |
> | |                                                                         |
> | |--07:EXCHANGE [BROADCAST]                                                |
> | |  |                                                                      |
> | |  01:SCAN HDFS [tpch.partsupp]                                           |
> | |     HDFS partitions=1/1 files=1 size=112.71MB                           |
> | |     predicates: ps_suppkey = 10                                         |
> | |     row-size=16B cardinality=82                                         |
> | |                                                                         |
> | 00:SCAN HDFS [tpch.lineitem]                                              |
> |    HDFS partitions=1/1 files=1 size=718.94MB                              |
> |    predicates: l_quantity IN (1, 2)                                       |
> |    runtime filters: RF000 -> l_partkey                                    |
> |    row-size=20B cardinality=235.34K                                       |
> +---------------------------------------------------------------------------+
> {noformat}
> This is likely a regression since based on some internal testing this was 
> working correctly on an earlier version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to