[ https://issues.apache.org/jira/browse/HIVE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864552#comment-15864552 ]
Julian Hyde commented on HIVE-5873: ----------------------------------- This is fixed in Calcite as part of CALCITE-365. It's worth re-evaluating when Hive picks up Calcite 1.12. By the way, it's ambiguous in [~rhbutani]'s description above, but the query should return two rows (yes, including PNum=8). I confirmed on Postgres: {noformat} > create table Part (PNum int, OrderOnHand int); CREATE TABLE > insert into Part values (3,6),(10,1),(8,0); INSERT 0 3 > create table Supply (PNum int, Qty int); CREATE TABLE > insert into Supply values (3,4),(3,2),(10,1); INSERT 0 3 > select pnum from Part p where orderOnHand in (select count(*) from Supply s where s.pnum = p.pnum ); pnum ------ 10 8 (2 rows) {noformat} > SubQuery: In subquery Count Bug > ------------------------------- > > Key: HIVE-5873 > URL: https://issues.apache.org/jira/browse/HIVE-5873 > Project: Hive > Issue Type: Bug > Components: Query Processor > Reporter: Harish Butani > > This is from the Optimization of Nested SQl Queries Revisited paper: > http://dl.acm.org/citation.cfm?id=38723 > Consider Part table having: > {noformat} > PNum OrderOnHand > -------- ------------------ > 3 6 > 10 1 > 8 0 > {noformat} > Supply table having: > {noformat} > PNum Qty > 3 4 > 3 2 > 10 1 > {noformat} > The query: > {noformat} > select pnum > from parts p > where orderOnHand > in (select count(*) from supply s > where s.pnum = p.pnum > ) > {noformat} > should return the row with PNum=8. > But a transformation to a semi-join would eliminate this row, as there are no > rows in supply table with PNum=8. > AS shown in the paper the soln is to transform to: > {noformat} > select pnum > from parts p semijoin > (select p1.pnum, count(*) as c > from (select distinct pnum from parts) p1 join supply s > where s.pnum = p1.pnum > ) sq on p.pnum = sq.pnum and p.orderOnHand = sq.c > {noformat} > The additional distinct query within the SubQuery is to handle duplicates in > the outer query on the joining columns. -- This message was sent by Atlassian JIRA (v6.3.15#6346)