[ https://issues.apache.org/jira/browse/IMPALA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha reassigned IMPALA-10755: ----------------------------------- Assignee: Aman Sinha > Wrong results for a query with predicate on an analytic function > ---------------------------------------------------------------- > > Key: IMPALA-10755 > URL: https://issues.apache.org/jira/browse/IMPALA-10755 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 4.0 > Reporter: Aman Sinha > Assignee: Aman Sinha > Priority: Critical > Labels: wrongresults > > In the master branch, the following query with equality predicate on analytic > function produces 13 rows whereas the correct result is 3 rows (as verified > with Hive): > {noformat} > use tpch; > select * > FROM ( > SELECT > l_quantity, l_linenumber, > nr_pvp, > max(nr_pvp) over (partition BY l_quantity) max_nr_pvp > FROM ( > SELECT > l_quantity, l_linenumber, > count(1) AS nr_pvp > FROM > lineitem, partsupp > where l_partkey = ps_partkey > and ps_suppkey = 10 and l_quantity in (1, 2) > GROUP BY l_quantity, l_linenumber > ) a > ) b > WHERE nr_pvp = max_nr_pvp > order by 1, 2, 3, 4; > {noformat} > Wrong results: > {noformat} > +------------+--------------+--------+------------+ > > | l_quantity | l_linenumber | nr_pvp | max_nr_pvp | > +------------+--------------+--------+------------+ > | 1.00 | 1 | 11 | 12 | > | 1.00 | 2 | 12 | 12 | > | 1.00 | 3 | 10 | 12 | > | 1.00 | 4 | 5 | 12 | > | 1.00 | 5 | 4 | 12 | > | 1.00 | 6 | 5 | 12 | > | 1.00 | 7 | 2 | 12 | > | 2.00 | 1 | 12 | 12 | > | 2.00 | 2 | 12 | 12 | > | 2.00 | 3 | 11 | 12 | > | 2.00 | 4 | 8 | 12 | > | 2.00 | 5 | 3 | 12 | > | 2.00 | 6 | 2 | 12 | > +------------+--------------+--------+------------+ > Fetched 13 row(s) in 1.36s > {noformat} > Expected results (verified with Hive): > {noformat} > +---------------+-----------------+-----------+---------------+ > | b.l_quantity | b.l_linenumber | b.nr_pvp | b.max_nr_pvp | > +---------------+-----------------+-----------+---------------+ > | 1.00 | 2 | 12 | 12 | > | 2.00 | 1 | 12 | 12 | > | 2.00 | 2 | 12 | 12 | > +---------------+-----------------+-----------+---------------+ > {noformat} > Explain plan for Impala shows that the equality predicate 'nr_pvp = > max_nr_pvp' which is supposed to be applied on top of the analytic operator > is missing: > {noformat} > +---------------------------------------------------------------------------+ > | Explain String | > +---------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=42.88MB Threads=7 | > | Per-Host Resource Estimates: Memory=483MB | > | | > | PLAN-ROOT SINK | > | | | > | 11:MERGING-EXCHANGE [UNPARTITIONED] | > | | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC | > | | | > | 06:SORT | > | | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC | > | | row-size=28B cardinality=357 | > | | | > | 05:ANALYTIC | > | | functions: max(count(*)) | > | | partition by: l_quantity | > | | row-size=28B cardinality=357 | > | | | > | 04:SORT | > | | order by: l_quantity ASC NULLS LAST | > | | row-size=20B cardinality=357 | > | | | > | 10:EXCHANGE [HASH(l_quantity)] | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | group by: l_quantity, l_linenumber | > | | row-size=20B cardinality=357 | > | | | > | 08:EXCHANGE [HASH(l_quantity,l_linenumber)] | > | | | > | 03:AGGREGATE [STREAMING] | > | | output: count(*) | > | | group by: l_quantity, l_linenumber | > | | row-size=20B cardinality=357 | > | | | > | 02:HASH JOIN [INNER JOIN, BROADCAST] | > | | hash predicates: l_partkey = ps_partkey | > | | runtime filters: RF000 <- ps_partkey | > | | row-size=36B cardinality=2.45K | > | | | > | |--07:EXCHANGE [BROADCAST] | > | | | | > | | 01:SCAN HDFS [tpch.partsupp] | > | | HDFS partitions=1/1 files=1 size=112.71MB | > | | predicates: ps_suppkey = 10 | > | | row-size=16B cardinality=82 | > | | | > | 00:SCAN HDFS [tpch.lineitem] | > | HDFS partitions=1/1 files=1 size=718.94MB | > | predicates: l_quantity IN (1, 2) | > | runtime filters: RF000 -> l_partkey | > | row-size=20B cardinality=235.34K | > +---------------------------------------------------------------------------+ > {noformat} > This is likely a regression since based on some internal testing this was > working correctly on an earlier version. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org